Introduction to Molecular Genetics and Genomics

132
C H A P T E R Introduction to Molecular Genetics and Genomics 1

Transcript of Introduction to Molecular Genetics and Genomics

Page 1: Introduction to Molecular Genetics and Genomics

C H A P T E R

Introduction to MolecularGenetics andGenomics

1

Page 2: Introduction to Molecular Genetics and Genomics

1

• Inherited traits are affected by genes.

• Genes are composed of the chemical deoxyribonucleic acid(DNA).

• DNA replicates to form copies of itself that are identical(except for rare mutations).

• DNA contains a genetic code specifying what types ofenzymes and other proteins are made in cells.

• DNA occasionally mutates, and the mutant forms specifyaltered proteins that have reduced activity or stability.

• A mutant enzyme is an “inborn error of metabolism” thatblocks one step in a biochemical pathway for the metabolismof small molecules.

• Traits are affected by environment as well as by genes.

• Organisms change genetically through generations in theprocess of biological evolution.

Shear MadnessAlfred D. Hershey and Martha Chase 1952Independent Functions of Viral Protein and Nucleic Acid in Growth of Bacteriophage

The Black Urine DiseaseArchibald E. Garrod 1908Inborn Errors of Metabolism

P R I N C I P L E S

C O N N E C T I O N S

C H A P T E R O U T L I N E1.1 DNA: The Genetic Material

Experimental Proof of the Genetic Function of DNA

Genetic Role of DNA in Bacteriophage

1.2 DNA Structure: The Double Helix

1.3 An Overview of DNA Replication

1.4 Genes and ProteinsInborn Errors of Metabolism as a

Cause of Hereditary DiseaseMutant Genes and Defective Proteins

1.5 Gene Expression: The Central DogmaTranscriptionTranslationThe Genetic Code

1.6 MutationProtein Folding and Stability

1.7 Genes and Environment

1.8 Evolution: From Genes to Genomes, fromProteins to ProteomesThe Molecular Unity of LifeNatural Selection and Diversity

Page 3: Introduction to Molecular Genetics and Genomics

Each species of living organism has aunique set of inherited characteristicsthat makes it different from other

species. Each species has its own develop-mental plan—often described as a sort of“blueprint” for building the organism—which is encoded in the DNA molecules pre-sent in its cells. This developmental plandetermines the characteristics that are in-herited. Because organisms in the samespecies share the same developmental plan,organisms that are members of the samespecies usually resemble one another, al-though some notable exceptions usually aredifferences between males and females. Forexample, it is easy to distinguish a humanbeing from a chimpanzee or a gorilla. A hu-man being habitually stands upright and haslong legs, relatively little body hair, a largebrain, and a flat face with a prominent nose,jutting chin, distinct lips, and small teeth.All of these traits are inherited—part of ourdevelopmental plan—and help set us apartas Homo sapiens.

But human beings are by no meansidentical. Many traits, or observable charac-teristics, differ from one person to another.There is a great deal of variation in haircolor, eye color, skin color, height, weight,personality traits, and other characteristics.Some human traits are transmitted biologi-cally, others culturally. The color of oureyes results from biological inheritance, butthe native language we learned as a childresults from cultural inheritance. Manytraits are influenced jointly by biological in-heritance and environmental factors. Forexample, weight is determined in part byinheritance but also in part by environ-ment: how much food we eat, its nutri-tional content, our exercise regimen, and soforth. Genetics is the study of biologicallyinherited traits, including traits that are in-fluenced in part by the environment.

The fundamental concept of genetics isthat:

Inherited traits are determined by the ele-ments of heredity that are transmitted fromparents to offspring in reproduction; theseelements of heredity are called genes.

The existence of genes and the rulesgoverning their transmission from gen-eration to generation were first articulatedby Gregor Mendel in 1866 (Chapter 3).Mendel’s formulation of inheritance was in

terms of the abstract rules by which heredi-tary elements (he called them “factors”) aretransmitted from parents to offspring. Hisobjects of study were garden peas, withvariable traits like pea color and plantheight. At one time genetics could be stud-ied only through the progeny producedfrom matings. Genetic differences betweenspecies were impossible to define, becauseorganisms of different species usually do notmate, or they produce hybrid progeny thatdie or are sterile. This approach to the studyof genetics is often referred to as classical ge-netics, or organismic or morphological ge-netics. Given the advances of molecular, ormodern, genetics, it is possible to study dif-ferences between species through the com-parison and analysis of the DNA itself. Thereis no fundamental distinction between clas-sical and molecular genetics. They are dif-ferent and complementary ways of studyingthe same thing: the function of the geneticmaterial. In this book we include many ex-amples showing how molecular and classi-cal genetics can be used in combination toenhance the power of genetic analysis.

The foundation of genetics as a molecu-lar science dates back to 1869, just threeyears after Mendel reported his exper-iments. It was in 1869 that FriedrichMiescher discovered a new type of weakacid, abundant in the nuclei of white bloodcells. Miescher’s weak acid turned out to bethe chemical substance we now call DNA(deoxyribonucleic acid). For many yearsthe biological function of DNA was un-known, and no role in heredity was as-cribed to it. This first section shows howDNA was eventually isolated and identifiedas the material that genes are made of.

1.1 DNA: The Genetic MaterialThat the cell nucleus plays a key role in in-heritance was recognized in the 1870s bythe observation that the nuclei of male andfemale reproductive cells undergo fusion inthe process of fertilization. Soon thereafter,chromosomes were first observed insidethe nucleus as thread-like objects thatbecome visible in the light microscopewhen the cell is stained with certain dyes.Chromosomes were found to exhibit acharacteristic “splitting” behavior in whicheach daughter cell formed by cell division

2 Chapter 1 Introduction to Molecular Genetics and Genomics

Page 4: Introduction to Molecular Genetics and Genomics

receives an identical complement of chro-mosomes (Chapter 4). Further evidence forthe importance of chromosomes was pro-vided by the observation that, whereas thenumber of chromosomes in each cell maydiffer among biological species, the numberof chromosomes is nearly always constantwithin the cells of any particular species.These features of chromosomes were wellunderstood by about 1900, and they madeit seem likely that chromosomes were thecarriers of the genes.

By the 1920s, several lines of indirectevidence began to suggest a close relation-ship between chromosomes and DNA.Microscopic studies with special stainsshowed that DNA is present in chromo-somes. Chromosomes also contain varioustypes of proteins, but the amount and kindsof chromosomal proteins differ greatly fromone cell type to another, whereas theamount of DNA per cell is constant.Furthermore, nearly all of the DNA presentin cells of higher organisms is present in thechromosomes. These arguments for DNA asthe genetic material were unconvincing,however, because crude chemical analyseshad suggested (erroneously, as it turnedout) that DNA lacks the chemical diversityneeded in a genetic substance. The favoredcandidate for the genetic material was pro-tein, because proteins were known to be anexceedingly diverse collection of molecules.Proteins therefore became widely accepted

as the genetic material, and DNA was as-sumed to function merely as the structuralframework of the chromosomes. The ex-periments described below finally demon-strated that DNA is the genetic material.

Experimental Proof of the GeneticFunction of DNAAn important first step was taken byFrederick Griffith in 1928 when he demon-strated that a physical trait can be passedfrom one cell to another. He was workingwith two strains of the bacteriumStreptococcus pneumoniae identified as S andR. When a bacterial cell is grown on solidmedium, it undergoes repeated cell divi-sions to form a visible clump of cells called acolony. The S type of S. pneumoniae synthe-sizes a gelatinous capsule composed ofcomplex carbohydrate (polysaccharide).The enveloping capsule makes each colonylarge and gives it a glistening or smooth (S)appearance. This capsule also enables thebacterium to cause pneumonia by protect-ing it from the defense mechanisms of aninfected animal. The R strains of S. pneumo-niae are unable to synthesize the capsularpolysaccharide; they form small coloniesthat have a rough (R) surface (Figure 1.1).This strain of the bacterium does not causepneumonia, because without the capsulethe bacteria are inactivated by the immunesystem of the host. Both types of bacteria

1.1 DNA: The Genetic Material 3

FPO

S strainR strain

Figure 1.1 Colonies of rough (R, the small colonies) and smooth (S, the large colonies) strains ofStreptococcus pneumoniae. The S colonies are larger because of the gelatinous capsule on the S cells.[Photograph from O. T. Avery, C. M. MacLeod, and M. McCarty. Reproduced from the Journal ofExperimental Medicine, 1944, vol. 79, p. 137 by copyright permission of The Rockefeller UniversityPress.]

Page 5: Introduction to Molecular Genetics and Genomics

“breed true” in the sense that the progenyformed by cell division have the capsulartype of the parent, either S or R.

Mice injected with living S cells getpneumonia. Mice injected either with livingR cells or with heat-killed S cells remainhealthy. Here is Griffith’s critical finding:mice injected with a mixture of living R cellsand heat-killed S cells contract the disease—they often die of pneumonia (Figure 1.2).Bacteria isolated from blood samples ofthese dead mice produce S cultures with acapsule typical of the injected S cells, eventhough the injected S cells had been killedby heat. Evidently, the injected materialfrom the dead S cells includes a substancethat can be transferred to living R cells andconfer the ability to resist the immunologi-cal system of the mouse and cause pneumo-nia. In other words, the R bacteria can bechanged—or undergo transformation—into S bacteria. Furthermore, the new char-acteristics are inherited by descendants ofthe transformed bacteria.

Transformation in Streptococcus was orig-inally discovered in 1928, but it was not

until 1944 that the chemical substance re-sponsible for changing the R cells into Scells was identified. In a milestone experi-ment, Oswald Avery, Colin MacLeod, andMaclyn McCarty showed that the sub-stance causing the transformation of R cellsinto S cells was DNA. In doing these exper-iments, they first had to develop chemicalprocedures for isolating almost pure DNAfrom cells, which had never been done be-fore. When they added DNA isolated fromS cells to growing cultures of R cells, theyobserved transformation: A few cells oftype S cells were produced. Although theDNA preparations contained traces of pro-tein and RNA (ribonucleic acid, an abun-dant cellular macromolecule chemicallyrelated to DNA), the transforming activitywas not altered by treatments that de-stroyed either protein or RNA. However,treatments that destroyed DNA eliminatedthe transforming activity (Figure 1.3). Theseexperiments implied that the substance re-sponsible for genetic transformation wasthe DNA of the cell—hence that DNA is thegenetic material.

4 Chapter 1 Introduction to Molecular Genetics and Genomics

LivingS cells

LivingR cells

Heat-killedS cells

Living R cells plusheat-killed S cells

Mouse contractspneumonia

Mouse contractspneumonia

Mouse remainshealthy

Mouse remainshealthy

S colonies isolatedfrom tissue of dead mouse

R and S colonies isolatedfrom tissue of dead mouse

R colonies isolatedfrom tissue

No colonies isolatedfrom tissue

Figure 1.2 The Griffith's experiment demonstrating bacterial transformation. A mouse remainshealthy if injected with either the nonvirulent R strain of S. pneumoniae or heat-killed cell fragmentsof the usually virulent S strain. R cells in the presence of heat-killed S cells are transformed into thevirulent S strain, causing pneumonia in the mouse.

Page 6: Introduction to Molecular Genetics and Genomics

1.1 DNA: The Genetic Material 5

Culture of S cells

S cell extract

Protease or RNase

Culture of R cells

Cells killed by heat

S cell extract(contains mostlyDNA with a littleprotein and RNA) Culture of R cells

R colonies anda few S colonies

R colonies anda few S colonies

(A) The transforming activity in S cells is not destroyed by heat.

(B) The transforming activity is not destroyed by either protease or RNase.

R colonies only

(C) The transforming activity is destroyed by DNase.

DNase

Plate on agar medium

Plate on agar medium

Plate on agar medium

S cell extract

Culture of R cells

Conclusion: Transforming activity most likely DNA

Conclusion: Transforming activity not protein or RNA

Figure 1.3 A diagram of the Avery–MacLeod–McCarty experiment thatdemonstrated that DNA is the active material in bacterial transformation. (A) Purified DNA extracted from heat-killed S cells can convert some living R cells into S cells, but the material may still contain undetectable traces ofprotein and/or RNA. (B) The transforming activity is not destroyed by eitherprotease or RNase. (C) The transforming activity is destroyed by DNase and soprobably consists of DNA.

Page 7: Introduction to Molecular Genetics and Genomics

Genetic Role of DNA in BacteriophageA second pivotal finding was reported byAlfred Hershey and Martha Chase in 1952.They studied cells of the intestinalbacterium Escherichia coli after infection bythe virus T2. A virus that attacks bacterialcells is called a bacteriophage, a term of-ten shortened to phage. Bacteriophagemeans “bacteria-eater.” The structure of abacteriophage T2 particle is illustrated inFigure 1.4. It is exceedingly small, yet it has acomplex structure composed of head(which contains the phage DNA), collar,tail, and tail fibers. (The head of a humansperm is about 30–50 times larger in bothlength and width than the head of T2.)Hershey and Chase were already awarethat T2 infection proceeds via the attach-ment of a phage particle by the tip of its tailto the bacterial cell wall, entry of phage ma-terial into the cell, multiplication of thismaterial to form a hundred or more prog-eny phage, and release of the progenyphage by bursting (lysis) of the bacterialhost cell. They also knew that T2 particleswere composed of DNA and protein in ap-proximately equal amounts.

Because DNA contains phosphorus butno sulfur, whereas most proteins containsulfur but no phosphorus, it is possible to la-bel DNA and proteins differentially by using

6 Chapter 1 Introduction to Molecular Genetics and Genomics

(A) (B)

Protein

DNA

Head(proteinand DNA)

Tail(proteinonly)

Figure 1.4 (A) Drawing of E. coli phage T2, showing various components.The DNA is confined to the interior of the head. (B) An electron micro-graph of phage T4, a closely related phage. [Electron micrograph courtesyof Robley Williams.]

Figure 1.5 (on facing page) The Hershey–Chase(“blender”) experiment demonstrating thatDNA, not protein, is responsible for directingthe reproduction of phage T2 in infected E. colicells. (A) Radioactive DNA is transmitted toprogeny phage in substantial amounts. (B) Radioactive protein is transmitted toprogeny phage in negligible amounts.

radioactive isotopes of the two elements.Hershey and Chase produced particles con-taining radioactive DNA by infecting E. colicells that had been grown for several gen-erations in a medium that included 32P (aradioactive isotope of phosphorus) and thencollecting the phage progeny. Other parti-cles containing labeled proteins were ob-tained in the same way, by using mediumthat included 35S (a radioactive isotope ofsulfur).

In the experiments summarized in Figure1.5, nonradioactive E. coli cells were infectedwith phage labeled with either 32P (part A)or 35S (part B) in order to follow the DNAand proteins individually. Infected cellswere separated from unattached phage par-ticles by centrifugation, resuspended infresh medium, and then swirled violently ina kitchen blender to shear attached phagematerial from the cell surfaces. This treat-ment was found to have no effect on thesubsequent course of the infection, whichimplies that the phage genetic materialmust enter the infected cells very soon afterphage attachment. The kitchen blenderturned out to be the critical piece of equip-ment. Other methods had been tried to tearthe phage heads from the bacterial cell sur-face, but nothing had worked reliably.Hershey later explained, “We tried variousgrinding arrangements, with results thatweren’t very encouraging. When MargaretMcDonald loaned us her kitchen blender,the experiment promptly succeeded.”

After the phage heads were removed bythe blender treatment, the infected bacteriawere examined. Most of the radioactivityfrom 32P-labeled phage was found to be as-sociated with the bacteria, whereas only asmall fraction of the 35S radioactivity waspresent in the infected cells. The retentionof most of the labeled DNA, contrasted withthe loss of most of the labeled protein, im-plied that a T2 phage transfers most of itsDNA, but very little of its protein, to the cellit infects. The critical finding (Figure 1.5)

Page 8: Introduction to Molecular Genetics and Genomics

1.1 DNA: The Genetic Material 7

Infection withnonradioactive T2 phage

Infection withnonradioactive T2 phage

E. coli cells grown

in 32P-containing

medium (labels DNA)

E. coli cells grown

in 35S-containing

medium (labels protein)

Phage reproduction; cell lysis releases DNA-labeled progeny phage

Phage reproduction;cell lysis releases protein-labeled progeny phage

DNA-labeled phage used to infect nonradioactive cells

Protein-labeled phage used to infect nonradioactive cells

Infected cell Infected cell

Phage reproduction; cell lysisreleases progeny phage that contain some 32P-labeled DNAfrom the parental phage DNA

Phage reproduction; cell lysis releases progeny phage that contain almost no 35S-labeled protein

Conclusion: DNA from an infecting parental phage is inherited in the progeny phage

After infection, part of phageremaining attached to cells isremoved by violent agitationin a kitchen blender

After infection, part of phageremaining attached to cells isremoved by violent agitationin a kitchen blender

Infectinglabeled DNA

Infectingnonlabeled DNA

(A) (B)

Page 9: Introduction to Molecular Genetics and Genomics

was that about 50 percent of the transferred32P-labeled DNA, but less than 1 percent ofthe transferred 35S-labeled protein, was in-herited by the progeny phage particles.Hershey and Chase interpreted this result tomean that the genetic material in T2 phageis DNA.

The experiments of Avery, MacLeod,and McCarty and those of Hershey and

Chase are regarded as classics in the demon-stration that genes consist of DNA. At thepresent time, the equivalent of the transfor-mation experiment is carried out daily inmany research laboratories throughout theworld, usually with bacteria, yeast, or ani-mal or plant cells grown in culture. Theseexperiments indicate that DNA is the ge-netic material in these organisms as well as

8 Chapter 1 Introduction to Molecular Genetics and Genomics

Shear Madness

Alfred D. Hershey and Martha Chase 1952Cold Spring Harbor Laboratories, Cold Spring Harbor, New YorkIndependent Functions of Viral Proteinand Nucleic Acid in Growth ofBacteriophage

Published a full eight years after the paperof Avery, MacLeod, and McCarty, the ex-periments of Hershey and Chase get equalbilling. Why? Some historians of sciencesuggest that the Avery et al. experimentswere “ahead of their time.” Others sug-gest that Hershey had special standing be-cause he was a member of the “in group”of phage molecular geneticists. MaxDelbrück was the acknowledged leader ofthis group, with Salvador Luria close be-hind. (Delbrück, Luria, and Hersheyshared a 1969 Nobel Prize.) Another pos-sible reason is that whereas the experi-ments of Avery et al. were feats of strengthin biochemistry, those of Hershey andChase were quintessentially genetic.Which macromolecule gets into the hered-itary action, and which does not? Buriedin the middle of this paper, and retainedin the excerpt, is a sentence admittingthat an earlier publication by the re-searchers was a misinterpretation of theirpreliminary results. This shows that evenfirst-rate scientists, then and now, aresometimes misled by their preliminarydata. Hershey later explained, “We triedvarious grinding arrangements, with re-sults that weren´t very encouraging. When

Margaret McDonald loaned us herkitchen blender the experiment promptlysucceeded.”

The work [of others] has shown thatbacteriophages T2, T3,and T4 multiply in thebacterial cell in a non-in-fective [immature] form.Little else is known aboutthe vegetative [growth]phase of these viruses.The experiments reportedin this paper show thatone of the first steps in thegrowth of T2 is the releasefrom its protein coat ofthe nucleic acid of the virus particle,after which the bulk of the sulfur-con-taining protein has no further func-tion. . . . Anderson has obtainedelectron micrographs indicating thatphage T2 attaches to bacteria by itstail. . . . It ought to be a simple matter tobreak the empty phage coats off the in-fected bacteria, leaving the phage DNAinside the cells. . . . When a suspensionof cells with 35S- or 32P-labeled phagewas spun in a blender at 10,000 revolu-tions per minute, . . . 75 to 80 percent ofthe phage sulfur can be stripped fromthe infected cells. . . . These facts showthat the bulk of the phage sulfur re-mains at the cell surface during infec-tion. . . . Little or no 35S is contained inthe mature phage progeny. . . . Identicalexperiments starting with phage labeledwith 32P show that phosphorus is trans-

ferred from parental to progeny phageat yields of about 30 phage per infectedbacterium. . . . [Incomplete separationof phage heads] explains a mistakenpreliminary report of the transfer of 35S

from parental to progenyphage. . . . The followingquestions remain unan-swered. (1) Does any sul-fur-free phage materialother than DNA enter thecell? (2) If so, is it trans-ferred to the phage prog-eny? (3) Is the transfer ofphosphorus to progenydirect or indirect? . . . Ourexperiments show clearly

that a physical separation of the phageT2 into genetic and nongenetic parts ispossible. The chemical identification ofthe genetic part must wait until some ofthe questions above have been an-swered. . . . The sulfur-containingprotein of resting phage particles is con-fined to a protective coat that is respon-sible for the adsorption to bacteria, andfunctions as an instrument for the injec-tion of the phage DNA into the cell. Thisprotein probably has no function in thegrowth of the intracellular phage. TheDNA has some function. Further chemi-cal inferences should not be drawn fromthe experiments presented.

Source: Journal of General Physiology 36:39–56

Our experimentsshow clearly that

a physicalseparation of the

phage T2 intogenetic and

nongenetic parts is possible.

Page 10: Introduction to Molecular Genetics and Genomics

in phage T2. Although there are no knownexceptions to the generalization that DNA isthe genetic material in all cellular organismsand many viruses, in a few types of virusesthe genetic material consists of RNA.

1.2 DNA Structure: The Double Helix

The inference that DNA is the genetic mate-rial still left many questions unanswered.How is the DNA in a gene duplicated whena cell divides? How does the DNA in a genecontrol a hereditary trait? What happens tothe DNA when a mutation (a change in theDNA) takes place in a gene? In the early1950s, a number of researchers began to tryto understand the detailed molecular struc-ture of DNA in hopes that the structurealone would suggest answers to these ques-tions. In 1953 James Watson and FrancisCrick at Cambridge University proposed thefirst essentially correct three-dimensionalstructure of the DNA molecule. The struc-ture was dazzling in its elegance and revo-lutionary in suggesting how DNA duplicatesitself, controls hereditary traits, and under-goes mutation. Even while their tin-and-wire model of the DNA molecule was stillincomplete, Crick would visit his favoritepub and exclaim “we have discovered thesecret of life.”

In the Watson–Crick structure, DNAconsists of two long chains of subunits, eachtwisted around the other to form a double-stranded helix. The double helix is right-handed, which means that as one looksalong the barrel, each chain follows a clock-wise path as it progresses. You can visualizethe right-handed coiling in part A of Figure1.6 if you imagine yourself looking up intothe structure from the bottom. The darkspheres outline the “backbone” of each in-dividual strand, and they coil in a clockwisedirection. The subunits of each strand arenucleotides, each of which contains anyone of four chemical constituents calledbases attached to a phosphorylated mole-cule of the 5-carbon sugar deoxyribose.The four bases in DNA are

• Adenine (A) • Guanine (G)• Thymine (T) • Cytosine (C)

The chemical structures of the nucleotidesand bases need not concern us at this time.

They are examined in Chapter 2. A keypoint for our present purposes is that thebases in the double helix are paired asshown in Figure 1.6B. That is:

At any position on the paired strands of a DNAmolecule, if one strand has an A, then thepartner strand has a T; and if one strand has aG, then the partner strand has a C.

The pairing between A and T and be-tween G and C is said to be comple-mentary; the complement of A is T, andthe complement of G is C. The complemen-tary pairing means that each base along onestrand of the DNA is matched with a base inthe opposite position on the other strand.Furthermore:

Nothing restricts the sequence of bases in asingle strand, so any sequence could bepresent along one strand.

This principle explains how only four basesin DNA can code for the huge amount of in-formation needed to make an organism. It

1.2 DNA Structure: The Double Helix 9

(B)

T

T

A

A

G C

C GC G

T AG C

G C

5’3’

T A

A TG C

T

T

A

A

G C

A T

T A

G C

C G

5’ 3’

Pairednucleotides

(A)

Figure 1.6 Molecular structure of the DNA double helix in the standard “B form.” (A) A space-filling model, in which each atom is depicted as asphere. (B) A diagram highlighting the helical strands around the outsideof the molecule and the A�T and G�C base pairs inside.

Page 11: Introduction to Molecular Genetics and Genomics

is the sequence of bases along the DNA thatencodes the genetic information, and thesequence is completely unrestricted.

The complementary pairing is also calledWatson–Crick pairing. In the three-dimensional structure in Figure 1.6A, thebase pairs are represented by the lighterspheres filling the interior of the doublehelix. The base pairs lie almost flat, stackedon top of one another perpendicular to thelong axis of the double helix, like pennies ina roll. When discussing a DNA molecule,biologists frequently refer to the individualstrands as single-stranded DNA and tothe double helix as double-strandedDNA or duplex DNA.

Each DNA strand has a polarity, or di-rectionality, like a chain of circus elephantslinked trunk to tail. In this analogy, eachelephant corresponds to one nucleotidealong the DNA strand. The polarity is deter-mined by the direction in which the nu-cleotides are pointing. The “trunk” end ofthe strand is called the 3' end of the strand,and the “tail” end is called the 5' end. Indouble-stranded DNA, the paired strandsare oriented in opposite directions, the 5'end of one strand aligned with the 3' end ofthe other. The molecular basis of the polar-ity, and the reason for the opposite orienta-tion of the strands in duplex DNA, areexplained in Chapter 2. In illustrating DNAmolecules in this book, we use an arrow-like ribbon to represent the backbone, andwe use tabs jutting off the ribbon to repre-sent the nucleotides. The polarity of a DNAstrand is indicated by the direction of thearrow-like ribbon. The tail of the arrow rep-resents the 5' end of the DNA strand, thehead the 3' end.

Beyond the most optimistic hopes,knowledge of the structure of DNA imme-diately gave clues to its function:

1. The sequence of bases in DNA could becopied by using each of the separate“partner” strands as a pattern for thecreation of a new partner strand with acomplementary sequence of bases.

2. The DNA could contain genetic informationin coded form in the sequence of bases,analogous to letters printed on a strip ofpaper.

3. Changes in genetic information (mutations)could result from errors in copying in whichthe base sequence of the DNA becamealtered.

In the remainder of this chapter, we discusssome of the implications of these clues.

1.3 An Overview of DNA Replication

Watson and Crick noted that the structureof DNA itself suggested a mechanism for itsreplication. “It has not escaped our notice,”they wrote, “that the specific base pairingwe have postulated immediately suggests acopying mechanism.” The copying processin which a single DNA molecule becomestwo identical molecules is called repli-cation. The replication mechanism thatWatson and Crick had in mind is illustratedin Figure 1.7.

As shown in part A of Figure 1.7, thestrands of the original (parent) duplex sep-arate, and each individual strand serves as apattern, or template, for the synthesis of anew strand (replica). The replica strands aresynthesized by the addition of successivenucleotides in such a way that each base inthe replica is complementary (in theWatson–Crick pairing sense) to the baseacross the way in the template strand(Figure 1.7B). Although the mechanism inFigure 1.7 is simple in principle, it is a com-plex process that is fraught with geometri-cal problems and requires a variety ofenzymes and other proteins. The details areexamined in Chapter 6. The end result ofreplication is that a single double-strandedmolecule becomes replicated into twocopies with identical sequences:

5'-ACGCTTGC-3'3'-TGCGAACG-5'

5'-ACGCTTGC-3' 5'-ACGCTTGC-3'3'-TGCGAACG-5' 3'-TGCGAACG-5'

Here the bases in the newly synthesizedstrands are shown in red. In the duplex onthe left, the top strand is the template fromthe parental molecule and the bottomstrand is newly synthesized; in the duplexon the right, the bottom strand is the tem-plate from the parental molecule and thetop strand is newly synthesized. Note inFigure 1.7B that in the synthesis of eachnew strand, new nucleotides are addedonly to the 3' end of the growing chain:

The obligatory elongation of a DNA strandonly at the 3' end is an essential feature ofDNA replication.

10 Chapter 1 Introduction to Molecular Genetics and Genomics

Page 12: Introduction to Molecular Genetics and Genomics

1.4 Genes and ProteinsNow that we have some basic understand-ing of the structural makeup of the geneticblueprint, how does this developmentalplan become a complex living organism? Ifthe code is thought of as a string of letterson a sheet of paper, then the genes aremade up of distinct words that form sen-tences and paragraphs that give meaning tothe pattern of letters. What is created from

the complex and diverse DNA codes is pro-tein, a class of macromolecules that carriesout most of the activities in the cell. Cellsare largely made up of proteins: structuralproteins that give the cell rigidity and mo-bility, proteins that form pores in the cellmembrane to control the traffic of smallmolecules into and out of the cell, and re-ceptor proteins that regulate cellular activi-ties in response to molecular signals fromthe growth medium or from other cells.

1.4 Genes and Proteins 11

A

Parent duplex

Daughter

duplex

Replicastrands

(A)

T

T

A

A

G C

C GC G

T AC G

C G

T A

A TC G

A T

T

T A

T

T

A

A

G C

C G

C

T AC G

C G

A T

T A

T

T

A

A

G CC G

G

T A

C G

C G

A TT A

Templatestrands

5’ 3’

5’

A C G C T T G CG

5’

3’ 5’

AT G C G A A C G

5’ 3’

A C G C T T G C

G

A

A C GT G C G A A C GA C G

5’

3’ 5’

A C G C T T G C

5’ 3’

5’

Template strand

Parent molecule of DNA

Template strand

Complement of“T” adds “A”

Complement of“G” adds “C”

Complement of“G” adds “C”

And so forth

Daughter molecules of DNA

Complement of“C” adds “G”

(B)

5’ 3’

3’ 5’

A C G CT G C G

T T G CA A C G

A C G CT G C G

T T G CA A C G

5’ 3’

3’ 5’

A C G CT G C G

T T G CA A C G

5’ 3’

3’ 5’

3’ 5’

T G C G A A C G

C

C

Figure 1.7 Replication of DNA. (A) Replication of a DNA duplex as originally envisioned by Watsonand Crick. As the parental strands separate, each parental strand serves as a template for the formation of a new daughter strand by means of A�T and G�C base pairing. (B) Greater detailshowing how each of the parental strands serves as a template for the production of a comple-mentary daughter strand, which grows in length by the successive addition of single nucleotides tothe 3' end.

Page 13: Introduction to Molecular Genetics and Genomics

Proteins are also responsible for most of themetabolic activities of cells. They are essen-tial for the synthesis and breakdown of or-ganic molecules and for generating thechemical energy needed for cellular activi-ties. In 1878 the term enzyme was intro-duced to refer to the biological catalysts thataccelerate biochemical reactions in cells. By1900, thanks largely to the work of theGerman biochemist Emil Fischer, enzymeswere shown to be proteins. As often hap-pens in science, nature’s “mistakes” provideclues as to how things work. Such was thecase in establishing a relationship betweengenes and disease, because a “mistake” in agene (a mutation) can result in a “mistake”(lack of function) in the corresponding pro-tein. This provided a fruitful avenue of re-search for the study of genetics.

Inborn Errors of Metabolism as aCause of Hereditary DiseaseIt was at the turn of the twentieth centurythat the British physician Archibald Garrodrealized that certain heritable diseases fol-lowed the rules of transmission thatMendel had described for his garden peas.In 1908 Garrod gave a series of lectures inwhich he proposed a fundamental hypoth-esis about the relationship between hered-ity, enzymes, and disease:

Any hereditary disease in which cellularmetabolism is abnormal results from aninherited defect in an enzyme.

Such diseases became known as inbornerrors of metabolism, a term still in usetoday.

Garrod studied a number of inborn er-rors of metabolism in which the patientsexcreted abnormal substances in the urine.One of these was alkaptonuria. In thiscase, the abnormal substance excreted ishomogentisic acid:

An early name for homogentisic acid wasalkapton, hence the name alkaptonuria forthe disease. Even though alkaptonuria is

CCH2

O

CH

OH

HO

rare, with an incidence of about one in200,000 people, it was well known even be-fore Garrod studied it. The disease itself isrelatively mild, but it has one striking symp-tom: The urine of the patient turns blackbecause of the oxidation of homogentisicacid (Figure 1.8). This is why alkaptonuria isalso called black urine disease. An early casewas described in the year 1649:

The patient was a boy who passed black urine and who, at the age of fourteen years,was submitted to a drastic course of treatmentthat had for its aim the subduing of the fieryheat of his viscera, which was supposed to bring about the condition in question bycharring and blackening his bile. Among the measures prescribed were bleedings,purgation, baths, a cold and watery diet, anddrugs galore. None of these had any obviouseffect, and eventually the patient, who tired of the futile and superfluous therapy, resolvedto let things take their natural course. None ofthe predicted evils ensued. He married, begat a large family, and lived a long and healthylife, always passing urine black as ink.(Recounted by Garrod, 1908.)

Garrod was primarily interested in thebiochemistry of alkaptonuria, but he tooknote of family studies that indicated that thedisease was inherited as though it were dueto a defect in a single gene. As to the bio-chemistry, he deduced that the problem inalkaptonuria was the patients’ inability tobreak down the phenyl ring of six carbonsthat is present in homogentisic acid. Wheredoes this ring come from? Most animals

12 Chapter 1 Introduction to Molecular Genetics and Genomics

Figure 1.8 Urine from a person with alkapto-nuria turns black because of the oxidation of thehomogentisic acid that it contains. [Courtesy ofDaniel De Aguiar.]

Page 14: Introduction to Molecular Genetics and Genomics

obtain it from foods in their diet. Garrodproposed that homogentisic acid originatesas a breakdown product of two amino acids,phenylalanine and tyrosine, which alsocontain a phenyl ring. An amino acid isone of the “building blocks” from whichproteins are made. Phenylalanine andtyrosine are constituents of normal pro-teins. The scheme that illustrates the rela-tionship between the molecules is shown inFigure 1.9. Any such sequence of biochemicalreactions is called a biochemical pathwayor a metabolic pathway. Each arrow inthe pathway represents a single step depict-ing the transition from the “input” orsubstrate molecule, shown at the head ofthe arrow, to the “output” or productmolecule, shown at the tip. Biochemicalpathways are usually oriented either verti-cally with the arrows pointing down, as inFigure 1.9, or horizontally, with the arrowspointing from left to right. Garrod did notknow all of the details of the pathway inFigure 1.9, but he did understand that thekey step in the breakdown of homogentisicacid is the breaking open of the phenyl ringand that the phenyl ring in homogentisicacid comes from dietary phenylalanine andtyrosine.

What allows each step in a biochemicalpathway to occur? Garrod correctly sur-mised that each step requires a specific en-zyme to catalyze the reaction for thechemical transformation. Persons with aninborn error of metabolism, such as alkap-tonuria, have a defect in a single step of ametabolic pathway because they lack afunctional enzyme for that step. When anenzyme in a pathway is defective, the path-way is said to have a block at that step.One frequent result of a blocked pathway isthat the substrate of the defective enzymeaccumulates. Observing the accumulationof homogentisic acid in patients with alkap-tonuria, Garrod proposed that there mustbe an enzyme whose function is to openthe phenyl ring of homogentisic acid andthat this enzyme is missing in these pa-tients. Isolation of the enzyme that opensthe phenyl ring of homogentisic acid wasnot actually achieved until 50 years afterGarrod’s lectures. In normal people it isfound in cells of the liver, and just as Garrodhad predicted, the enzyme is defective inpatients with alkaptonuria.

1.4 Genes and Proteins 13

C

CH2C C

H

NH2CC

C

Phenylalanine (a normal amino acid)

C

C

C

CC

C

C

C

C

C

C

CH2

C

C

CC

C

C

C

H

NH2

Tyrosine (a normal amino acid)

Further breakdown

CH2C

C

C

CC

C

C

C

C

H

O

Homogentisic acid (formerly known as alkapton)

4-Maleylacetoacetic acid

C

OH

O

CH2

CH2CH2CH CCC

O

CH

4-Hydroxyphenylpyruvic acid

Benzene ring

Each arrowrepresents onestep in thebiochemicalpathway.

OH

HO

O

OH

O

OH

HO

O

OH

O

3

1

X4

OH

OH

OHO

O

This is the stepthat is blockedin alkaptonuria;homogentisicacid accumulates.

2

In the next stepthe benzene ringis opened at thisposition.

Figure 1.9 Metabolic pathway for the breakdown ofphenylalanine and tyrosine. Each step in the pathway,represented by an arrow, requires a specific enzyme tocatalyze the reaction. The key step in the breakdown ofhomogentisic acid is the breaking open of the phenyl ring.

Page 15: Introduction to Molecular Genetics and Genomics

The pathway for the breakdown ofphenylalanine and tyrosine, as it is under-stood today, is shown in Figure 1.10. In thisfigure the emphasis is on the enzymesrather than on the structures of themetabolites, or small molecules, on whichthe enzymes act. Each step in the pathwayrequires the presence of a particular en-zyme that catalyzes that step. AlthoughGarrod knew only about alkaptonuria, inwhich the defective enzyme is homogentisicacid 1,2 dioxygenase, we now know the

clinical consequences of defects in the otherenzymes. Unlike alkaptonuria, which is arelatively benign inherited disease, the oth-ers are very serious. The condition knownas phenylketonuria (PKU) results fromthe absence of (or a defect in) the enzymephenylalanine hydroxylase (PAH).When this step in the pathway is blocked,phenylalanine accumulates. The excessphenylalanine is broken down into harmfulmetabolites that cause defects in myelin for-mation that damage a child’s developing

14 Chapter 1 Introduction to Molecular Genetics and Genomics

The Black Urine Disease

Archibald E. Garrod 1908St. Bartholomew’s Hospital, London, EnglandInborn Errors of Metabolism

Although he was a distinguished physi-cian, Garrod’s lectures on the relationshipbetween heredity and congenital defects inmetabolism had no impact when theywere delivered. The important conceptthat one gene corresponds to one enzyme(the “one gene–one enzyme hypothesis”)was developed independently in the 1940sby George W. Beadle and Edward L.Tatum, who used the bread moldNeurospora crassa as their experimentalorganism. When Beadle finally becameaware of Inborn Errors of Metabolism,he was generous in praising it. This excerptshows Garrod at his best, interweavinghistory, clinical medicine, heredity, andbiochemistry in his account of alkap-tonuria. The excerpt also illustrates howthe severity of a genetic disease dependson its social context. Garrod writes asthough alkaptonuria were a harmlesscuriosity. This is indeed largely true whenthe life expectancy is short. With today’slonger life span, alkaptonuria patients ac-cumulate the dark pigment in their carti-lage and joints and eventually developsevere arthritis.

To students of heredity the inborn errorsof metabolism offer a promising field ofinvestigation. . . . It was pointed out [byothers] that the mode of incidence ofalkaptonuria finds a ready explanation ifthe anomaly be regarded as arare recessive character inthe Mendelian sense. . . . Ofthe cases of alkaptonuria avery large proportion havebeen in the children of firstcousin marriages. . . . It isalso noteworthy that, if onetakes families with five ormore children [with both par-ents normal and at least onechild affected with alkap-tonuria], the totals work outin strict conformity toMendel’s law, i.e. 57 [normalchildren] : 19 [affected chil-dren] in the proportions 3 : 1. . . . Of in-born errors of metabolism, alkaptonuriais that of which we know most. In itself itis a trifling matter, inconvenient ratherthan harmful. . . . Indications of theanomaly may be detected in early med-ical writings, such as that in 1584 of aschoolboy who, although he enjoyedgood health, continuously excretedblack urine; and that in 1609 of a monkwho exhibited a similar peculiarity andstated that he had done so all his life. . . .There are no sufficient grounds [fordoubting that the blackening substance

in the urine originally called alkapton] ishomogentisic acid, the excretion ofwhich is the essential feature of thealkaptonuric. . . . Homogentisic acid is aproduct of normal metabolism. . . . The

most likely sources of thebenzene ring in homo-gentisic acid are phenyl-alanine and tyrosine,[because when theseamino acids are adminis-tered to an alkaptonuric]they cause a very con-spicuous increase in theoutput of homogentisicacid. . . . Where the al-kaptonuric differs fromthe normal individual isin having no power ofdestroying homogentisicacid when formed—in

other words of breaking up the benzenering of that compound. . . . We may fur-ther conceive that the splitting of thebenzene ring in normal metabolism isthe work of a special enzyme and that incongenital alkaptonuria this enzyme iswanting.

Source: Originally published in London,England, by the Oxford University Press.Excerpts from the reprinted edition in HarryHarris. 1963. Garrod’s Inborn Errors ofMetabolism. London, England: OxfordUniversity Press.

We may furtherconceive that

the splitting of the benzene ring in

normalmetabolism is thework of a special

enzyme and that incongenital

alkaptonuria thisenzyme is wanting.

Page 16: Introduction to Molecular Genetics and Genomics

nervous system and lead to severe mentalretardation.

However, if PKU is diagnosed in childrensoon enough after birth, they can be placedon a specially formulated diet low in pheny-lalanine. The child is allowed only as muchphenylalanine as can be used in the synthe-sis of proteins, so excess phenylalanine doesnot accumulate. The special diet is verystrict. It excludes meat, poultry, fish, eggs,milk and milk products, legumes, nuts, andbakery goods manufactured with regularflour. These foods are replaced by an expen-sive synthetic formula. With the specialdiet, however, the detrimental effects of ex-cess phenylalanine on mental developmentcan largely be avoided, although in adultwomen with PKU who are pregnant, the fe-tus is at risk. In many countries, includingthe United States, all newborn babies havetheir blood tested for chemical signs of PKU.Routine screening is cost-effective becausePKU is relatively common. In the UnitedStates, the incidence is about 1 in 8000among Caucasian births. The disease is lesscommon in other ethnic groups.

In the metabolic pathway in Figure1.10, defects in the breakdown of tyrosineor of 4-hydroxyphenylpyruvic acid lead totypes of tyrosinemia. These are also severediseases. Type II is associated with skin le-sions and mental retardation, Type III withsevere liver dysfunction.

Mutant Genes and Defective ProteinsIt follows from Garrod’s work that a defec-tive enzyme results from a mutant gene, buthow? Garrod did not speculate. For all heknew, genes were enzymes. This would havebeen a logical hypothesis at the time. Wenow know that the relationship betweengenes and enzymes is somewhat indirect.With a few exceptions, each enzyme is en-coded in a particular sequence of nucleotidespresent in a region of DNA. The DNA regionthat codes for the enzyme, as well as adja-cent regions that regulate when and inwhich cells the enzyme is produced, makeup the “gene” that encodes the enzyme.

The genes for the enzymes in the bio-chemical pathway in Figure 1.10 have allbeen identified and the nucleotide se-quence of the DNA determined. In the fol-lowing list, and throughout this book, we

use the standard typographical conventionthat genes are written in italic type, whereasgene products are not printed in italics. Thisconvention is convenient, because it meansthat the protein product of a gene can berepresented with the same symbol as thegene itself, but whereas the gene symbol isin italics, the protein symbol is not.

• The gene PAH on the long arm of chromo-some 12 encodes phenylalanine hydroxylase(PAH).

• The gene TAT on the long arm of chromo-some 16 encodes tyrosine aminotransferase(TAT).

• The gene HPD on the long arm of chromo-some 12 encodes 4-hydroxyphenylpyruvicacid dioxygenase (HPD).

1.4 Genes and Proteins 15

Phenylalaninehydroxylase

1

Phenylalanine

Tyrosine

Tyrosineaminotransferase

2

Homogentisic acid

Homogentisic acid1,2-dioxygenase

4

4-Maleylacetoacetic acid

Further breakdown

4-Hydroxyphenyl-pyruvic acid

4-Hydroxyphenyl-pyruvic aciddioxygenase

3

A defect in thisenzyme leads toaccumulation ofphenylalanine andto phenylketonuria.

Each step in ametabolic pathwayrequires a differentenzyme.

Each enzyme isencoded in adifferent gene.

A defect in thisenzyme leads toaccumulation oftyrosine and totyrosinemia type II.

A defect in thisenzyme leads toaccumulation of4-hydroxyphenyl-pyruvic acid and totyrosinemia type III.

A defect in thisenzyme leads toaccumulation ofhomogentisic acidand to alkaptonuria.

Figure 1.10 Inborn errors of metabolism thataffect the breakdown of phenylalanine andtyrosine. An inherited disease results when anyof the enzymes is missing or defective. Alkapto-nuria results from a mutant homogentisic acid1,2 dioxygenase phenylketonuria results from amutant phenylalanine hydroxylase.

Page 17: Introduction to Molecular Genetics and Genomics

• The gene HGD on the long arm of chromo-some 3 encodes homogentisic acid 1,2dioxygenase (HGD).

Next we turn to the issue of how genes codefor enzymes and other proteins.

1.5 Gene Expression: The Central Dogma

Watson and Crick were correct in proposingthat the genetic information in DNA is con-tained in the sequence of bases in a manneranalogous to letters printed on a strip of pa-per. In a region of DNA that directs the syn-thesis of a protein, the genetic code for theprotein is contained in only one strand, andit is decoded in a linear order. A typical pro-tein is made up of one or more polypeptide

chains; each polypeptide chain consistsof a linear sequence of amino acids con-nected end to end. For example, the en-zyme PAH consists of four identicalpolypeptide chains, each 452 amino acidsin length. In the decoding of DNA, eachsuccessive “code word” in the DNA specifiesthe next amino acid to be added to thepolypeptide chain as it is being made. Theamount of DNA required to code for thepolypeptide chain of PAH is therefore452 � 3 � 1356 nucleotide pairs. The en-tire gene is very much longer—about90,000 nucleotide pairs. Only 1.5 percent ofthe gene is devoted to coding for the aminoacids. The noncoding part includes some se-quences that control the activity of thegene, but it is not known how much of thegene is involved in regulation.

There are 20 different amino acids. Onlyfour bases code for these 20 amino acids,with each “word” in the genetic code con-sisting of three adjacent bases. For example,the base sequence ATG specifies the aminoacid methionine (Met), TCC specifies serine(Ser), ACT specifies threonine (Thr), andGCG specifies alanine (Ala). There are 64possible three-base combinations but only20 amino acids because some combinationscode for the same amino acid. For example,TCT, TCC, TCA, TCG, AGT, and AGC all codefor serine (Ser), and CTT, CTC, CTA, CTG,TTA, and TTG all code for leucine (Leu). Anexample of the relationship between thebase sequence in a DNA duplex and theamino acid sequence of the correspondingprotein is shown in Figure 1.11. This particularDNA duplex is the human sequence thatcodes for the first seven amino acids in thepolypeptide chain of PAH.

The scheme outlined in Figure 1.11 in-dicates that DNA codes for protein not di-rectly but indirectly through the processesof transcription and translation. The indirectroute of information transfer,

DNA � RNA � Protein

is known as the central dogma of molecu-lar genetics. The term dogma means “set ofbeliefs”; it dates from the time the idea wasput forward first as a theory. Since then the“dogma” has been confirmed experimen-tally, but the term persists. The centraldogma is shown in Figure 1.12. The mainconcept in the central dogma is that DNA

16 Chapter 1 Introduction to Molecular Genetics and Genomics

T A C A G G T G A C G C C A G G A C C T TA T G T C C A C T G C G G T C C T G G A A

A T G T C C A C T G C G G T C C T G G A A

Nucleotide sequence

in DNA molecule

Two-step decodingprocess synthesizesa polypeptide.

Amino acid sequence

in polypeptide chain

DNA triplets encoding each amino acid

Ala Val Leu GluMet Ser Thr

An RNA intermediate

plays the role of

”messenger“

TRANSCRIPTION

TRANSLATION

Figure 1.11 DNA sequence coding for the firstseven amino acids in a polypeptide chain. TheDNA sequence specifies the amino acidsequence through a molecule of RNA that servesas an intermediary “messenger.” Although thedecoding process is indirect, the net result is thateach amino acid in the polypeptide chain isspecified by a group of three adjacent bases inthe DNA. In this example, the polypeptide chainis that of phenylalanine hydroxylase (PAH).

Page 18: Introduction to Molecular Genetics and Genomics

does not code for protein directly but ratheracts through an intermediary molecule ofribonucleic acid (RNA). The structure ofRNA is similar to, but not identical with,that of DNA. There is a difference in thesugar (RNA contains the sugar riboseinstead of deoxyribose), RNA is usuallysingle-stranded (not a duplex), and RNAcontains the base uracil (U) instead ofthymine (T), which is present in DNA.Actually, three types of RNA take part inthe synthesis of proteins:

• A molecule of messenger RNA (mRNA),which carries the genetic information fromDNA and is used as a template for polypep-tide synthesis. In most mRNA molecules,there is a high proportion of nucleotides thatactually code for amino acids. For example,the mRNA for PAH is 2400 nucleotides inlength and codes for a polypeptide of 452amino acids; in this case, more than 50percent of the length of the mRNA codes foramino acids.

• Several types of ribosomal RNA (rRNA),which are major constituents of the cellularparticles called ribosomes on whichpolypeptide synthesis takes place.

• A set of transfer RNA (tRNA) molecules,each of which carries a particular amino acidas well as a three-base recognition regionthat base-pairs with a group of three adjacentbases in the mRNA. As each tRNA partici-pates in translation, its amino acid becomesthe terminal subunit added to the length ofthe growing polypeptide chain. The tRNAthat carries methionine is denoted tRNAMet,that which carries serine is denoted tRNASer,and so forth.

The central dogma is the fundamentalprinciple of molecular genetics because itsummarizes how the genetic information inDNA becomes expressed in the amino acidsequence in a polypeptide chain:

The sequence of nucleotides in a gene specifiesthe sequence of nucleotides in a molecule ofmessenger RNA; in turn, the sequence ofnucleotides in the messenger RNA specifiesthe sequence of amino acids in the polypeptidechain.

Given a process as conceptually simpleas DNA coding for protein, what might ac-count for the additional complexity of RNAintermediaries? One possible reason is thatan RNA intermediate gives another levelfor control, for example, by degrading the

mRNA for an unneeded protein. Anotherpossible reason may be historical. RNAstructure is unique in having both an infor-mational content present in its sequence ofbases and a complex, folded three-dimen-sional structure that endows some RNAmolecules with catalytic activities. Manyscientists believe that in the earliest formsof life, RNA served both for genetic infor-mation and catalysis. As evolution pro-ceeded, the informational role wastransferred to DNA and the catalytic role toprotein. However, RNA became locked intoits central location as a go-between in theprocesses of information transfer and pro-tein synthesis. This hypothesis implies thatthe participation of RNA in protein synthe-sis is a relic of the earliest stages of evolu-tion—a “molecular fossil.” The hypothesisis supported by a variety of observations.For example, (1) DNA replication requiresan RNA molecule in order to get started(Chapter 6), (2) an RNA molecule is essen-tial in the synthesis of the tips of the chro-mosomes (Chapter 8), and (3) some RNAmolecules act to catalyze key reactions inprotein synthesis (Chapter 11).

1.5 Gene Expression: The Central Dogma 17

mRNA

(messenger)

Ribosome

Protein

DNA

TRANSCRIPTION

TRANSLATION

rRNA

(ribosomal)

tRNA

(transfer)

Figure 1.12 The “central dogma” of moleculargenetics: DNA codes for RNA, and RNA codes forprotein. The DNA � RNA step is transcription,and the RNA � protein step is translation.

Page 19: Introduction to Molecular Genetics and Genomics

TranscriptionThe manner in which genetic information istransferred from DNA to RNA is shown inFigure 1.13. The DNA opens up, and one of thestrands is used as a template for the synthe-sis of a complementary strand of RNA.(How the template strand is chosen is dis-cussed in Chapter 11.) The process of mak-ing an RNA strand from a DNA template istranscription, and the RNA molecule thatis made is the transcript. The base se-quence in the RNA is complementary (in

the Watson–Crick pairing sense) to that inthe DNA template, except that U (whichpairs with A) is present in the RNA in placeof T. The rules of base pairing between DNAand RNA are summarized in Figure 1.14. EachRNA strand has a polarity—a 5' end and a 3'end—and, as in the synthesis of DNA, nu-cleotides are added only to the 3' end of agrowing RNA strand. Hence the 5' end ofthe RNA transcript is synthesized first, andtranscription proceeds along the templateDNA strand in the 3'-to-5' direction. Eachgene includes nucleotide sequences that ini-tiate and terminate transcription. The RNAtranscript made from any gene begins at theinitiation site in the template strand, whichis located “upstream” from the aminoacid–coding region, and ends at the termi-nation site, which is located “downstream”from the amino acid–coding region. For anygene, the length of the RNA transcript isvery much smaller than the length of theDNA in the chromosome. For example, thetranscript of the PAH gene for phenyl-alanine hydroxylase is about 90,000 nu-cleotides in length, but the DNA inchromosome 12 is about 130,000,000 nu-cleotide pairs. In this case, the length of thePAH transcript is less than 0.1 percent of thelength of the DNA in the chromosome. Adifferent gene in chromosome 12 would betranscribed from a different region of theDNA molecule in chromosome 12, and per-haps from the opposite strand, but the tran-scribed region would again be small incomparison with the total length of theDNA in the chromosome.

TranslationThe synthesis of a polypeptide under the di-rection of an mRNA molecule is known astranslation. Although the sequence ofbases in the mRNA codes for the sequenceof amino acids in a polypeptide, the mole-cules that actually do the “translating” arethe tRNA molecules. The mRNA moleculeis translated in nonoverlapping groups ofthree bases called codons. For each codonin the mRNA that specifies an amino acid,there is one tRNA molecule containing acomplementary group of three adjacentbases that can pair with those in that codon.The correct amino acid is attached to theother end of the tRNA, and when the tRNAcomes into line, the amino acid to which it

18 Chapter 1 Introduction to Molecular Genetics and Genomics

A

T

T

A

A

G C

C GC G

T AC G

C G

T A

A TC G

A T

T

T A

T

U

A

A

G C

C G

C

U AC G

C G

A U

T A

T

AC

G

G

T

C

C

TA

Direction ofgrowth ofRNA strand

G

*

*

*

RNA transcript

DNA strandbeing transcribed

5’3’

3’

5’

5’

3’

U in RNA pairswith A in DNA

Figure 1.13 Transcription is the production of an RNA strand that iscomplementary in base sequence to a DNA strand. In this example, theDNA strand at the bottom is being transcribed into a strand of RNA. Notethat in an RNA molecule, the base U (uracil) plays the role of T (thymine)in that it pairs with A (adenine). Each A�U pair is marked.

Page 20: Introduction to Molecular Genetics and Genomics

is attached becomes the most recent addi-tion to the growing end of the polypeptidechain.

The role of tRNA in translation is illus-trated in Figure 1.15 and can be described asfollows:

The mRNA is read codon by codon. Eachcodon that specifies an amino acid matcheswith a complementary group of three adjacentbases in a single tRNA molecule. One end ofthe tRNA is attached to the correct amino acid,so the correct amino acid is brought into line.

The tRNA molecules used in translationdo not line up along the mRNA simultane-ously as shown in Figure 1.15. The processof translation takes place on a ribosome,

which combines with a single mRNA andmoves along it from one end to the other insteps, three nucleotides at a time (codon bycodon). As each new codon comes intoplace, the next tRNA binds with the ribo-some. Then the growing end of the poly-peptide chain becomes attached to theamino acid on the tRNA. In this way, eachtRNA in turn serves temporarily to hold thepolypeptide chain as it is being synthesized.As the polypeptide chain is transferred fromeach tRNA to the next in line, the tRNA thatpreviously held the polypeptide is releasedfrom the ribosome. The polypeptide chainelongates one amino acid at each step untilany one of three particular codons specify-ing “stop” is encountered. At this point,synthesis of the chain of amino acids is fin-ished, and the polypeptide chain is releasedfrom the ribosome. (This brief description oftranslation glosses over many of the detailsthat are presented in Chapter 11.)

The Genetic CodeFigure 1.15 indicates that the mRNAcodon AUG specifies methionine (Met) inthe polypeptide chain, UCC specifies Ser(serine), ACU specifies Thr (threonine),and so on. The complete decoding table is

1.5 Gene Expression: The Central Dogma 19

AU

Adenine

Base in DNA template

Base in RNA transcript

Uracil

TA

Thymine

Adenine

GC

Guanine

Cytosine

CG

Cytosine

Guanine

Figure 1.14 Pairing between bases in DNA and inRNA. The DNA bases A, T, G, and C pair withthe RNA bases U, A, C, and G, respectively.

AU

MessengerRNA (mRNA)

Basesin themRNA

U G UA C

A

C C A C

G GU G

U G C G

AC G C

G U C C U G G A A

Ala

C A G

Val

G A C

Leu

C U U

Glu

MetSer

Thr

Transfer RNA (tRNA)

[

The coding sequence of bases in mRNA specifies the amino acid sequence of a polypeptide chain.

Each group of three adjacent bases is a codon.

The mRNA is translated codon by codon by means of tRNA molecules.

Each tRNA has a different base sequence but about the same overall shape.

Each tRNA carries an amino acid to be added to the polypeptide chain.

Figure 1.15 The role of messenger RNA in translation is to carry the information contained in asequence of DNA bases to a ribosome, where it is translated into a polypeptide chain. Translation ismediated by transfer RNA (tRNA) molecules, each of which can base-pair with a group of threeadjacent bases in the mRNA. Each tRNA also carries an amino acid. As each tRNA, in turn, isbrought to the ribosome, the growing polypeptide chain is elongated.

Page 21: Introduction to Molecular Genetics and Genomics

called the genetic code, and it is shown inTable 1.1. For any codon, the column on theleft corresponds to the first nucleotide inthe codon (reading from the 5' end), therow across the top corresponds to the sec-ond nucleotide, and the column on theright corresponds to the third nucleotide.The complete codon is given in the body ofthe table, along with the amino acid (ortranslational “stop”) that the codon speci-fies. Each amino acid is designated by itsfull name and by a three-letter abbreviationas well as a single-letter abbreviation. Bothtypes of abbreviations are used in moleculargenetics. The code in Table 1.1 is the “stan-dard” genetic code used in translation inthe cells of nearly all organisms. In Chapter11 we examine general features of the stan-dard genetic code and the minor differencesfound in the genetic codes of certain organ-isms and cellular organelles. At this point,we are interested mainly in understandinghow the genetic code is used to translatethe codons in mRNA into the amino acidsin a polypeptide chain.

In addition to the 61 codons that codeonly for amino acids, there are four codonsthat have specialized functions:

• The codon AUG, which specifies Met(methionine), is also the “start” codon forpolypeptide synthesis. The positioning of atRNAMet bound to AUG is one of the firststeps in the initiation of polypeptide synthe-sis, so all polypeptide chains begin with Met.(Many polypeptides have the initial Metcleaved off after translation is complete.) Inmost organisms, the tRNAMet used for initia-tion of translation is the same tRNAMet usedto specify methionine at internal positions ina polypeptide chain.

• The codons UAA, UAG, and UGA are each a “stop” that specifies the termination oftranslation and results in release of thecompleted polypeptide chain from theribosome. These codons do not have tRNAmolecules that recognize them but areinstead recognized by protein factors thatterminate translation.

How the genetic code table is used to in-fer the amino acid sequence of a polypep-

20 Chapter 1 Introduction to Molecular Genetics and Genomics

UUU Phe F Phenylalanine UCU Ser S Serine UAU Tyr Y Tyrosine UGU Cys C Cysteine

UUC Phe F Phenylalanine UCC Ser S Serine UAC Tyr Y Tyrosine UGC Cys C Cysteine

UUA Leu L Leucine UCA Ser S Serine UAA Termination UGA Termination

UUG Leu L Leucine UCG Ser S Serine UAG Termination UGG Trp W Tryptophan

CUU Leu L Leucine CCU Pro P Proline CAU His H Histidine CGU Arg R Arginine

CUC Leu L Leucine CCC Pro P Proline CAC His H Histidine CGC Arg R Arginine

CUA Leu L Leucine CCA Pro P Proline CAA Gln Q Glutamine CGA Arg R Arginine

CUG Leu L Leucine CCG Pro P Proline CAG Gln Q Glutamine CGG Arg R Arginine

AUU Ile I Isoleucine ACU Thr T Threonine AAU Asn N Asparagine AGU Ser S Serine

AUC Ile I Isoleucine ACC Thr T Threonine AAC Asn N Asparagine AGC Ser S Serine

AUA Ile I Isoleucine ACA Thr T Threonine AAA Lys K Lysine AGA Arg R Arginine

AUG Met M Methionine ACG Thr T Threonine AAG Lys K Lysine AGG Arg R Arginine

GUU Val V Valine GCU Ala A Alanine GAU Asp D Aspartic acid GGU Gly G Glycine

GUC Val V Valine GCC Ala A Alanine GAC Asp D Aspartic acid GGC Gly G Glycine

GUA Val V Valine GCA Ala A Alanine GAA Glu E Glutamic acid GGA Gly G Glycine

GUG Val V Valine GCG Ala A Alanine GAG Glu E Glutamic acid GGG Gly G Glycine

U

C

A

G

U C A G

Firs

t nuc

leot

ide

in c

odon

(5’ e

nd)

Codon Three-letter and single-letter abbreviations

U

C

A

G

U

C

A

G

U

C

A

G

U

C

A

G

Third nucleotide in codon (3’ end)Table 1.1 The standard genetic code

Second nucleotide in codon

Page 22: Introduction to Molecular Genetics and Genomics

tide chain can be illustrated by using PAHagain, in particular the DNA sequence cod-ing for amino acids 1 through 7. The DNAsequence is

5'-ATGTCCACTGCGGTCCTGGAA-3'3'-TACAGGTGACGCCAGGACCTT-5'

This region is transcribed into RNA in a left-to-right direction, and because RNA growsby the addition of successive nucleotides tothe 3' end (Figure 1.13), it is the bottomstrand that is transcribed. The nucleotidesequence of the RNA is that of the topstrand of the DNA, except that U replaces T,so the mRNA for amino acids 1 through 7 is

5'-AUGUCCACUGCGGUCCUGGAA-3'

The codons are read from left to right ac-cording to the genetic code shown in Table1.1. Codon AUG codes for Met (methion-ine), UCC codes for Ser (serine), and so on.Altogether, the amino acid sequence of thisregion of the polypeptide is

5'-AUG UCC ACU GCG GUC CUG GAA-3'Met SerThr Ala ValLeu Glu

or, in terms of the single-letter abbre-viations,

5'-AUG UCC ACU GCG GUC CUG GAA-3'M S T A V L E

The full decoding operation for this regionof the PH gene is shown in Figure 1.16. In thisfigure, the initiation codon AUG is high-lighted because some patients with PKUhave a mutation in this particular codon. Asmight be expected from the fact that me-thionine is the initiation codon for polypep-tide synthesis, cells in patients with thisparticular mutation fail to produce any ofthe PAH polypeptide. Mutation and its con-sequences are considered next.

1.6 MutationThe term mutation refers to any heritablechange in a gene (or, more generally, in thegenetic material) or to the process by whichsuch a change takes place. One type of mu-tation results in a change in the sequenceof bases in DNA. The change may be sim-ple, such as the substitution of one pair ofbases in a duplex molecule for a differentpair of bases. For example, a C�G pair in a

1.6 Mutation 21

AU

UG UA C

A

C C A C

G GU G

U G C G

AC G C

G U C C U G G A A

A T G T C C A

1 2 3 4 5 6 7

1 2 3 4 5 6 7

C T G C G G T C C T G G A AT A C A G G T G A C G C C A G G A C C T T

Ala

C A G

Val

G A C

Leu

C U U

Glu

MetSer

Thr

DNA

mRNA

Polypeptide

Codon numberin PAH gene

Amino acid number in PAH polypeptide

TRANSCRIPTION

TRANSLATION

Figure 1.16 The central dogma in action. The DNA that encodes PAHserves as a template for the production of a messenger RNA, and themRNA serves to specify the sequence of amino acids in the PAHpolypeptide chain through interactions with the ribosome and tRNAmolecules.

duplex molecule may mutate to T�A, A�T,or G�C. The change in base sequence mayalso be more complex, such as the deletionor addition of base pairs. These and othertypes of mutations are considered inChapter 7. Geneticists also use the termmutant, which refers to the result of amutation. A mutation yields a mutantgene, which in turn produces a mutantmRNA, a mutant protein, and finally a mu-tant organism that exhibits the effects ofthe mutation—for example, an inborn er-ror of metabolism.

DNA from patients from all over theworld who have phenylketonuria has beenstudied to determine what types of muta-tions are responsible for the inborn error.There are a large variety of mutant types.More than 400 different mutations havebeen described in the gene for PAH. Insome cases part of the gene is missing, sothe genetic information to make a completePAH enzyme is absent. In other cases thegenetic defect is more subtle, but the resultis still either the failure to produce a PAH

Page 23: Introduction to Molecular Genetics and Genomics

changes the normal codon AUG (Met) usedfor the initiation of translation into thecodon GUG, which normally specifies va-line (Val) and cannot be used as a “start”codon. The result is that translation of thePAH mRNA cannot occur, and so no PAHpolypeptide is made. This mutant is desig-nated M1V because the codon for M (me-thionine) at amino acid position 1 in thePAH polypeptide has been changed to acodon for V (valine). Although the M1Vmutant is quite rare worldwide, it is com-mon in some localities, such as QuébecProvince in Canada.

One PAH mutant that is quite commonis designated R408W, which means thatcodon 408 in the PAH polypeptide chainhas been changed from one coding for argi-nine (R) to one coding for tryptophan (W).This mutation is one of the four most com-mon among European Caucasians withPKU. The molecular basis of the mutant isshown in Figure 1.18. In this case, the firstbase pair in codon 408 is changed from aC�G base pair into a T�A base pair. The re-sult is that the PAH mRNA has a mutantcodon at position 408; specifically, it hasUGG instead of CGG. Translation does oc-cur in this mutant because everything elseabout the mRNA is normal, but the result isthat the mutant PAH carries a tryptophan(Trp) instead of an arginine (Arg) at posi-

22 Chapter 1 Introduction to Molecular Genetics and Genomics

GUG U C C A C U G C G G U C C U G G A A

T G T C C A

1 2 3 4 5 6 7

C T G C G G T C C T G G A AA C A G G T G A C G C C A G G A C C T TC

G

C A C

Val

DNA

mRNA

tRNAVal

Codon 1 inPAH gene

Mutation of

TRANSLATION

XX

AT

GC

Mutant initiation codonin PAH mRNA

No PAH polypeptide is produced because tRNAVal cannot be used to initiate polypeptide synthesis.

TRANSCRIPTION

Figure 1.17 The M1V mutant in the PAH gene.The methionine codon needed for initiationmutates into a codon for valine. Translationcannot be initiated, and no PAH polypeptide isproduced.

The women in the wedding photograph are sisters. Both are homozygous for the same mutant PAHgene. The bride is the younger of the two. She was diagnosed just three days after birth and put onthe PKU diet soon after. Her older sister, the maid of honor, was diagnosed too late to begin the dietand is mentally retarded. The two-year old pictured in the photo at the right is the daughter of themarried couple. They planned the pregnancy: dietary control was strict from conception to deliveryto avoid the hazards of excess phenylalanine harming the fetus. Their daughter has passed alldevelopmental milestones with distinction. [Courtesy of Charles R. Scriver.]

protein or the production of a PAH proteinthat is inactive. In the mutation shown inFigure 1.17, substitution of a G�C base pairfor the normal A�T base pair at the veryfirst position in the coding sequence

Page 24: Introduction to Molecular Genetics and Genomics

tion 408 in the polypeptide chain. The con-sequence of the seemingly minor change ofone amino acid is very drastic. Althoughthe R408W polypeptide is complete, theenzyme has less than 3 percent of the activ-ity of the normal enzyme.

Protein Folding and StabilityMore than 400 different mutations in thePAH gene have been identified in patientswith PKU throughout the world. Many ofthe mutations affect the level of expressionof the gene or processing of the RNA tran-script, and some mutations are deletions inwhich part of the gene is missing. But morethan 240 of the mutations are simple aminoacid replacements resulting from single nu-cleotide substitutions in the DNA. Surpris-ingly, only a minority of amino acidreplacements result in a normal amount ofPAH protein with a reduced enzyme activity.As a result of most mutations the amount ofPAH is reduced, sometimes drastically, andin some other mutations the enzyme activ-ity of the PAH protein that remains is virtu-ally normal. Yet in all these cases the level ofexpression of the gene, and the amount ofmRNA, are within the normal range.

The reason why so many amino acidreplacements reduce the amount of proteinis that they cause problems in protein fold-

ing, or in the coming together of the pro-tein subunits, or in the stability of thefolded protein. Protein folding is the com-plex process by which polypeptide chainsattain a stable three-dimensional structurethrough short-range chemical interactionsbetween nearby amino acids and long-range interactions between amino acids indifferent parts of the molecule. Folding nor-mally occurs as the polypeptide is beingsynthesized on the ribosome, and theprocess is facilitated by a class of proteinsknown as chaperones. During the foldingprocess, the polypeptide chain twists andbends until it achieves a minimum energystate that maximizes the stability of the re-sulting structure, which is referred to as thenative conformation. For example, oneaspect of protein folding is that hydropho-bic amino acids, which have low affinity forwater molecules, tend to move toward eachother and to form a relatively hydrophobiccenter, or core, in the native conformation.For a polypeptide of realistic length, thereare so many short-range and long-range in-teractions, and so many possible foldedconformations, that even the fastest com-puters cannot calculate and compare alltheir energy levels. Computer simulation ofprotein folding has yielded some insights,but the reliable prediction of protein foldingis still a major challenge.

1.6 Mutation 23

GC

C C AG G

U

C A A U

G UU A

A C C U

UG G A

UGG C C C U U C U C A G U U C G C

G C C A C A A

4 0 8

T A C T GG C C C T T C TC G G T G T T A T G

CG A C C G G G A A G A

C A G T T C G CG T C A A G C G

Pro

A C C

Trp

G G G

Pro

A A G

Phe

A G U

Ser

C A A

Val

G C G

Arg

AlaThr

Ile

DNA

mRNA

Codon 408 inPAH gene

Polypeptide

Mutation of CGTA

TA

Mutant amino acid in PAH polypeptide

Mutant codonin PAH mRNA

TRANSCRIPTION

TRANSLATION

Figure 1.18 The R408W mutant in the PAH gene. Codon 408 for arginine(R) is mutated into a codon for trypto-phan (W). The result is that position408 in the mutant PAH polypeptide isoccupied by tryptophan rather than byarginine. The mutant protein has noPAH enzyme activity.

Page 25: Introduction to Molecular Genetics and Genomics

Hypothetical pathways of protein fold-ing in the case of PAH are illustrated inFigure 1.19. The normal pathway is shown inpart A, including some of the (typicallyshort-lived) intermediates in the foldingprocess. The native conformation of a singlePAH polypeptide constitutes the PAHmonomer. Like many other polypeptides,the PAH monomer contains short regions,called oligomerization domains (oligomeans “a small number”), through whichPAH polypeptides undergo stable binding toone another. In the case of PAH, the activeform of the PAH enzyme is a tetramer,consisting of four identical polypeptidechains held together by interactions be-tween the tetramerization domains. Notethat the folding and tetramerization proc-

esses are reversible, so any amino acid re-placement that decreases the stability of thetetramer or any of the intermediates willcause more of the PAH polypeptides to foldaccording to pathway B.

Pathway B in the figure is a misfoldingpathway in which the folded monomers areprone to undergo irreversible aggregationwith each other. These aggregates aretargeted for enzymatic breakdown intotheir constituent amino acids, first by be-coming covalently bound with a 76–aminoacid polypeptide called ubiquitin, which isattached through the activity of severalproteins, including a ubiquitin-conjugatingenzyme. The tagged protein is then degradedby the proteasome, which is a large multi-protein complex containing proteins with

24 Chapter 1 Introduction to Molecular Genetics and Genomics

Proteasome

(A)–Normal folding pathway (B)–Aberrant folding pathway

(subunits prone to aggregation)

Binding of oligomerization domains

Tetramer

(active form in cells)

Aggregate

Abnormal intermediateFolding intermediate

Folding intermediate

UnfoldedPAH polypeptide

Oligomerization domains through which monomers interact

Active site for substrate cleavage

Degradation via ubiquitin-dependent proteasomal pathway and other routes

Folded monomer

Aggregate

Figure 1.19 Someamino acid replace-ments perturb theability of a proteinto fold properly.(A) Normal foldingin phenylalaninehydroxylase formsthe active tetramer.(B) Abnormal foldingof a mutant poly-peptide chain resultsin the formation ofpolypeptide aggre-gates, which areprogressively cleavedinto the constituentamino acids through aubiquitin-dependentproteosomal degrada-tion pathway.

Page 26: Introduction to Molecular Genetics and Genomics

ubiquitin-binding, protease, and other ac-tivities. The ubiquitin/proteasome pathwayis required for the degradation of specificproteins during the cell cycle and develop-ment, and it is also used to degrade certainproteins that are intrinsically unstable orthat become unfolded in response to stress.In the case of PAH, many amino acid re-placements decrease the stability of the pro-tein to such an extent that the protein isrouted through the misfolding pathway inFigure 1.19B and targeted for degradation.A destabilizing amino acid replacement canbe located at virtually any position alongthe polypeptide chain.

1.7 Genes and EnvironmentInborn errors of metabolism illustrate thegeneral principle that genes code for pro-teins and that mutant genes code for mu-tant proteins. In cases such as PKU, mutantproteins cause such a drastic change inmetabolism that a severe genetic defectresults. But biology is not necessarily des-tiny. Organisms are also affected by the en-vironment. PKU serves as an example ofthis principle, because patients who adhereto a diet restricted in the amount of phenyl-alanine develop mental capacities withinthe normal range. What is true in this ex-

ample is true in general. Most traits are de-termined by the interaction of genes andenvironment.

It is also true that most traits are affectedby multiple genes. No one knows howmany genes are involved in the develop-ment and maturation of the brain and ner-vous system, but the number must be in thethousands. This number is in addition to thegenes that are required in all cells to carryout metabolism and other basic life func-tions. It is easy to lose sight of the multiplic-ity of genes when considering extremeexamples, such as PKU, in which a singlemutation can have such a drastic effect onmental development. The situation is thesame as that with any complex machine. Anairplane can function if thousands of partsare working together in harmony, but onlyone defective part, if that part affects a vitalsystem, can bring it down. Likewise, the de-velopment and functioning of every trait re-quire a large number of genes working inharmony, but in some cases a single mutantgene can have catastrophic consequences.

In other words, the relationship be-tween a gene and a trait is not necessarily asimple one. The biochemistry of organismsis a complex branching network in whichdifferent enzymes may share substrates,yield the same products, or be responsive tothe same regulatory elements. The result is

1.7 Genes and Environment 25

Three-dimensional structure of two of the four subunits in the active tetramer of phenylalaninehydroxylase. The chains of amino acids are represented as a sequence of curls, loops, and flatarrows, which represent different types of local structure described in Chapter 11. The oligomeriza-tion (in this case, tetramerization) domain is shown in green, and the catalytic domain containingthe active site is shown in gold. Compare this with the simplified diagram in Figure 1.19. [Courtesyof R. C. Stevens, T. Flatmark, and H. Erlandsen. From H. Erlandsen, F. Fusetti, A Martinez, E.Hough, T. Flatmark, and R. C. Stevens. 1997. Nature Structural Biology 4: 995.]

Au: Please con-firm that caption isOK as set.

Page 27: Introduction to Molecular Genetics and Genomics

that most visible traits of organisms are thenet result of many genes acting together andin combination with environmental factors.PKU affords examples of each of three prin-ciples governing these interactions:

1. One gene can affect more than one trait.Children with extreme forms of PKU oftenhave blond hair and reduced body pigment.This is because the absence of PAH is ametabolic block that prevents conversion ofphenylalanine into tyrosine, which is theprecursor of the pigment melanin. The rela-tionship between severe mental retardationand decreased pigmentation in PKU makessense only if one knows the metabolic con-nections among phenylalanine, tyrosine,and melanin. If these connections were notknown, the traits would seem completelyunrelated. PKU is not unusual in this re-gard. Many mutant genes affect multipletraits through their secondary or indirecteffects. The various, sometimes seeminglyunrelated, effects of a mutant gene arecalled pleiotropic effects, and the phe-nomenon itself is known as pleiotropy.Figure 1.20 shows a cat with white fur andblue eyes, a pattern of pigmentation that isoften (about 40 percent of the time) associ-ated with deafness. Hence deafness can beregarded as a pleiotropic effect of white coatand blue eye color. The developmental ba-sis of this pleiotropy is unknown.

2. Any trait can be affected by more thanone gene. We discussed this principle earlierin connection with the large number ofgenes that are required for the normal de-velopment and functioning of the brain andnervous system. Among these are genesthat affect the function of the blood–brainbarrier, which consists of specialized glialcells wrapped around tight capillary walls inthe brain, forming an impediment to thepassage of most water-soluble moleculesfrom the blood to the brain. The blood–brainbarrier therefore affects the extent to whichexcess free phenylalanine in the blood canenter the brain itself. Because the effective-ness of the blood–brain barrier differsamong individuals, PKU patients with verysimilar levels of blood phenylalanine canhave dramatically different levels of cogni-tive development. This also explains in partwhy adherence to a controlled-phenylala-nine diet is critically important in childrenbut less so in adults; the blood–brain barrieris less well developed in children and istherefore less effective in blocking the ex-cess phenylalanine.

Multiple genes affect even simpler meta-bolic traits. Phenylalanine breakdown andexcretion serve as a convenient example.The metabolic pathway is illustrated inFigure 1.10. Four enzymes in the pathwayare indicated, but even more enzymes areinvolved at the stage labeled “further break-down.” Because differences in the activityof any of these enzymes can affect the rateat which phenylalanine can be brokendown and excreted, all of the enzymes inthe pathway are important in determiningthe amount of excess phenylalanine in theblood of patients with PKU.

3. Most traits are affected by environmen-tal factors as well as by genes. Here wecome back to the low-phenylalanine diet.Children with PKU are not doomed to se-vere mental deficiency. Their capabilitiescan be brought into the normal range by di-etary treatment. PKU serves as an exampleof what motivates geneticists to try to dis-cover the molecular basis of inherited dis-ease. The hope is that knowing themetabolic basis of the disease will eventu-ally make it possible to develop methods forclinical intervention through diet, medica-tion, or other treatments that will amelio-rate the severity of the disease.

26 Chapter 1 Introduction to Molecular Genetics and Genomics

Figure 1.20 Among cats with white fur and blue eyes,about 40 percent are born deaf. We do not know whythere is defective hearing nor why it is so often associatedwith coat and eye color. This form of deafness can beregarded as a pleiotropic effect of white fur and blue eyes.

Page 28: Introduction to Molecular Genetics and Genomics

1.8 Evolution: From Genes to Genomes, from Proteins to Proteomes

The pathway for the breakdown and excre-tion of phenylalanine is by no meansunique to human beings. One of the re-markable generalizations to have emergedfrom molecular genetics is that organismsthat are very distinct—for example, plantsand animals—share many features in theirgenetics and biochemistry. These similari-ties indicate a fundamental “unity of life”:

All creatures on Earth share many features ofthe genetic apparatus, including geneticinformation encoded in the sequence of basesin DNA, transcription into RNA, and transla-tion into protein on ribosomes with the use oftransfer RNAs. All creatures also share certaincharacteristics in their biochemistry, includingmany enzymes and other proteins that aresimilar in amino acid sequence, three-dimensional structure, and function.

The totality of DNA in a single cell is calledthe genome of the organism. In sexual or-ganisms, the genome is usually regarded asthe DNA present in a reproductive cell. Thehuman genome, which is contained in thechromosomes of a sperm or egg, includesapproximately 3 billion nucleotide pairs ofDNA. The complete set of proteins encodedin the genome is known as the proteome.The study of genomes constitutes genomics;the study of proteomes constitutes pro-teomics. The fundamental unity of life can beseen in the similarity of proteins in the pro-teomes of diverse types of organisms. Forexample, the proteome of the fruit flyDrosophila melanogaster includes 13,601 pro-teins; these can be grouped into 8065 dif-ferent families of proteins that are similar inamino acid sequence. For comparison, theproteome of the nematode wormCaenorhabditis elegans contains 18,424 pro-teins that can be grouped into 9453 fami-lies. These two proteomes share about 5000proteins that are sufficiently similar to beregarded as having a common function.Among the protein families that are com-mon to flies and worms, about 3000 arealso found in the proteome of the yeastSaccharomyces cerevisiae. Based on these datafrom these three completely sequencedcomplex genomes, it seems likely that allorganisms whose cells have a nucleus and

chromosomes will turn out to share severalthousand protein families. Furthermore,about a thousand of these protein familiesare shared with organisms as distantly re-lated as bacteria.

The Molecular Unity of LifeWhy do organisms share a common set ofsimilar genes and proteins? Because allcreatures share a common origin. Theprocess of evolution takes place when apopulation of organisms descended from acommon ancestor gradually changes in ge-netic composition through time. From anevolutionary perspective, the unity of fun-damental molecular processes is derived byinheritance from a distant common ances-tor in which the molecular mechanismswere already in place.

Not only the unity of life but also manyother features of living organisms becomecomprehensible from an evolutionary per-spective. For example, the interposition ofan RNA intermediate in the basic flow of

1.8 Evolution from Genes to Genomes, from Proteins to Proteomes 27

Photocaptionphotocaptionphotocaption-photocaptionphoto-captionphotocaptionphotocaptionphotocap-tionphotocaptionpho-tocaptionphotocaptionphotocaptionphoto-captionphotocaption-photocaptionphotocaptionphotocaptionpho-tocaptionphotocap-tionphotocaption

09131_01_1677P

09131_01_1678P

Page 29: Introduction to Molecular Genetics and Genomics

28 Chapter 1 Introduction to Molecular Genetics and Genomics

genetic information from DNA to RNA toprotein makes sense if the earliest forms oflife used RNA for both genetic informationand enzyme catalysis. The importance ofthe evolutionary perspective in undestand-ing aspects of biology that seem pointless orneedlessly complex is summed up in afamous aphorism of the evolutionary biolo-gist Theodosius Dobzhansky: “Nothing inbiology makes sense except in the light ofevolution.”

One indication of the common ancestryamong Earth’s creatures is illustrated inFigure 1.21. The tree of relationships was in-ferred from similarities in nucleotide se-quence in an RNA molecule found in thesmall subunit of the ribosome. Three majorkingdoms of organisms are distinguished:

1. The kingdom Bacteria This group includesmost bacteria and cyanobacteria (formerlycalled blue-green algae). Cells of theseorganisms lack a membrane-boundednucleus and mitochondria, are surroundedby a cell wall, and divide by binary fission.

2. The kingdom Archaea This group wasinitially discovered among microorganismsthat produce methane gas or that live inextreme environments, such as hot springsor high salt concentrations. They are widelydistributed in more normal environments aswell. Like those of bacteria, the cells ofarchaeans lack internal membranes. DNAsequence analysis indicates that the machin-ery for DNA replication and transcription inarchaeans resembles that of eukaryans,whereas their metabolism strongly

resembles that of bacteria. About half of thegenes found in the kingdom Archaea areunique to this group.

3. The kingdom Eukarya This group includesall organisms whose cells contain anelaborate network of internal membranes, amembrane-bounded nucleus, andmitochondria. Their DNA is organized intotrue chromosomes, and cell division takesplace by means of mitosis (discussed inChapter 4). The eukaryotes include plantsand animals as well as fungi and manysingle-celled organisms, such as amoebaeand ciliated protozoa.

The members of the kingdoms Bacteriaand Archaea are often grouped together intoa larger assemblage called prokaryotes,which literally means “before [the evolutionof] the nucleus.” This terminology is conve-nient for designating prokaryotes as a groupin contrast with eukaryotes, which literallymeans “good [well-formed] nucleus.”

Natural Selection and DiversityAlthough Figure 1.21 illustrates the unityof life, it also illustrates life’s diversity. Frogsare different from fungi, and beetles are dif-ferent from bacteria. As a human being, it issobering to consider that complex, multi-cellular organisms are relatively recent ar-

Figure 1.21 Evolutionary relation-ships among the major life formsas inferred from similarities innucleotide sequence in anRNA molecule found inthe small subunit of theribosome. The threemajor kingdoms—Bacteria, Archaea, andEukarya—are apparent.Plants, animals, andfungi are more closelyrelated to each otherthan to members of eitherof the other kingdoms.Among eukaryotes, the time of branching of the diverse groupsof undifferentiated, relatively simpleorganisms (trichomonads, microsporidians,euglenoids, and so forth) is probably not asancient as suggested by the diagram. [Courtesyof Mitchell L. Sogin.]

BACTERIA

ARCHAEA

Plant ChloroplastsCyanobacteria

Plant Mitochondria

Enterobacteria

HalobacteriaMethanobacteria

Mycoplasma

Agrobacterium

Sulfolobus

Thermoplasma

Stramenopiles

EUKARYA

Plantae

Alveolates

Animalia

Red algae

Fungi

Slime molds

EntamoebaeHeterolobosea

Physarum

Kinetoplastids

Euglenoids

Microsporidians

TrichomonadsDiplomonads

Page 30: Introduction to Molecular Genetics and Genomics

Chapter Summary 29

rivals to the evolutionary scene of life onEarth. Animals came later still and primatesvery late indeed. What about human evo-lution? In the time scale of Earth history,human evolution is a matter of a few mil-lion years—barely a snap of the fingers.

If common ancestry is the source of theunity of life, what is the source of its diver-sity? Because differences among species areinherited, the original source of the differ-ences must be mutation. However, muta-tions alone are not sufficient to explainwhy organisms are adapted to living intheir environments—why ocean mammalshave special adaptations for swimming anddiving, why desert mammals have specialadaptations that enable them to survive onminimal amounts of water. Mutations arechance events not directed toward any par-ticular adaptive goal, such as longer furamong mammals living in the Arctic. The

process that accounts for adaptation wasdescribed by Charles Darwin in his 1859book On the Origin of Species. Darwin pro-posed that adaptation is the result of nat-ural selection: Individual organismscarrying particular mutations or combina-tions of mutations that enable them to sur-vive or reproduce more effectively in theprevailing environment leave more off-spring than other organisms and socontribute their favorable genes dispro-portionately to future generations. If thisprocess is repeated throughout the courseof many generations, the entire species be-comes genetically transformed—that is, itevolves—because a gradually increasingproportion of the population inherits thefavorable mutations. Mutation, natural se-lection, and other features of the field ofpopulation genetics (also called evolutionary ge-netics) are discussed in Chapter 17.

Chapter SummaryOrganisms of the same species have some traits (charac-teristics) in common but may differ from each other in in-numerable other traits. Many of the differences betweenorganisms result from genetic differences, the effects ofthe environment, or both. Genetics is the study of inher-ited traits, including those influenced in part by the envi-ronment. The elements of heredity consist of genes, whichare transmitted from parents to offspring in reproduction.Although the sorting of genes in successive generationswas first expressed numerically by Mendel, the chemicalbasis of genes was discovered by Miescher in the form of aweak acid, deoxyribonucleic acid (DNA). However, exper-imental proof that DNA is the genetic material did notcome until about the middle of the twentieth century.

The first convincing evidence of the role of DNA inheredity came from the experiments of Avery, MacLeod,and McCarty, who showed that genetic characteristics inbacteria could be altered from one type to another bytreatment with purified DNA. In studies of Streptococcuspneumoniae, they transformed mutant cells unable tocause pneumonia into cells that could do so by treatingthem with pure DNA from disease-causing forms. A sec-ond important line of evidence was the Hershey–Chaseexperiment. Hershey and Chase showed that the T2 bac-terial virus injects primarily DNA into the host bacterium(Escherichia coli) and that a much higher proportion ofparental DNA, compared with parental protein, is foundamong the progeny phage.

The three-dimensional structure of DNA, proposed in1953 by Watson and Crick, gave many clues to the man-ner in which DNA functions as the genetic material. Amolecule of DNA consists of two long chains of nu-

cleotide subunits twisted around each other to form aright-handed helix. Each nucleotide subunit contains anyone of four bases: A (adenine), T (thymine), G (guanine),or C (cytosine). The bases are paired in the two strands ofa DNA molecule; wherever one strand has an A, the part-ner strand has a T, and wherever one strand has a G, thepartner strand has a C. The base pairing means that thetwo paired strands in a DNA duplex molecule have com-plementary base sequences along their lengths. Thestructure of the DNA molecule suggested that genetic in-formation could be coded in DNA in the sequence ofbases. Mutations—changes in the genetic material—re-sult from changes in the sequence of bases, such as thesubstitution of one nucleotide for another or the insertionor deletion of one or more nucleotides. The structure ofDNA also suggested a mode of replication: The twostrands of the parental DNA molecule separate, and eachindividual strand serves as a template for the synthesis ofa new, complementary strand.

Most genes code for proteins. More precisely stated,most genes specify the sequence of amino acids in apolypeptide chain. The transfer of genetic informationfrom DNA into protein is a multistep process that includesseveral types of RNA (ribonucleic acid). Structurally, anRNA strand is similar to a DNA strand except that the“backbone” contains a different sugar (ribose instead ofdeoxyribose) and RNA contains the base uracil (U) in-stead of thymine (T). Also, RNA is usually present in cellsin the form of single, unpaired strands. The initial step ingene expression is transcription, in which a molecule ofRNA is synthesized that is complementary in basesequence to whichever DNA strand is being transcribed.

Page 31: Introduction to Molecular Genetics and Genomics

30 Chapter 1 Introduction to Molecular Genetics and Genomics

In polypeptide synthesis, which takes place on a ribo-some, the base sequence in the RNA transcript is trans-lated in groups of three adjacent bases (codons). Thecodons are recognized by different types of transfer RNA(tRNA) through base pairing. Each type of tRNA is at-tached to a particular amino acid, and when a tRNA base-pairs with the proper codon on the ribosome, the growingend of the polypeptide chain is transferred to the aminoacid on the tRNA. The table of all codons and the aminoacids they specify is called the genetic code. Special codonsspecify the “start” (AUG, Met) and “stop” (UAA, UAG,and UGA) of polypeptide synthesis. The reason why vari-ous types of RNA are an intimate part of transcription andtranslation is probably that the earliest forms of life usedRNA for both genetic information and enzyme catalysis.

A mutation that alters one or more codons in a genecan change the amino acid sequence of the resultingpolypeptide chain synthesized in the cell. Often the al-tered protein is functionally defective, so an inborn errorof metabolism results. One of the first inborn errors ofmetabolism studied was alkaptonuria; it results from theabsence of an enzyme for cleaving homogentisic acid,which accumulates and is excreted in the urine, turningblack upon oxidation. Phenylketonuria (PKU) is an in-born error of metabolism that affects the same metabolicpathway. The enzyme defect in PKU results in an inabilityto convert phenylalanine to tyrosine. Phenylalanine ac-cumulation has catastrophic effects on the developmentof the brain. Children with the disease have severe men-tal deficits unless they are treated with a special diet lowin phenylalanine.

Most visible traits of organisms result from manygenes acting together in combination with environmentalfactors. The relationship between genes and traits is oftencomplex because (1) every gene potentially affects manytraits (pleiotropy), (2) every trait is potentially affected bymany genes, and (3) many traits are significantly affectedby environmental factors as well as by genes.

All living creatures are united by sharing many fea-tures of the genetic apparatus (for example, transcriptionand translation) and many aspects of metabolism. Theyshare many similar genes in the genome and many simi-lar proteins in the proteome. The unity of life results fromall life being of common ancestry and provides evidencefor evolution. There is also great diversity among livingcreatures. The three major kingdoms of organisms are thekingdoms Bacteria (whose members lack a membrane-bounded nucleus), Archaea (whose members share fea-tures with both bacteria and eukaryans but form adistinct group), and Eukarya (which includes all “higher”organisms whose cells have a membrane-bounded nu-cleus that contains DNA organized into discrete chromo-somes). Members of the kingdoms Bacteria and Archaeaare often collectively called prokaryotes. The ultimatesource of diversity among organisms is mutation, but nat-ural selection is the process by which mutations that en-hance survival and reproduction are retained andmutations that are harmful are eliminated. Natural selec-tion, first proposed by Darwin, is therefore the primarymechanism by which organisms become progressivelybetter adapted to their environments.

Key Termsadenine (A)alkaptonuriaamino acidArchaeaBacteriabacteriophagebase (in DNA or RNA)biochemical pathwayblock (in a biochemical pathway)central dogmachaperonechromosomecodoncolonycomplementary base sequencecytosine (C)deoxyribosedeoxyribonucleic acid (DNA)double-stranded DNAduplex DNAenzymeEukaryaeukaryote

evolutiongenegenetic codegeneticsgenomeguanine (G)homogentisic acidinborn error of metabolismmessenger RNA (mRNA)metabolic pathwaymetabolitemonomermutantmutationnative conformationnatural selectionnucleotideoligomerization domainphagephenylalanine hydroxylase (PAH)phenylketonuria (PKU)pleiotropic effectpleiotropy

polarity (of DNA or RNA)polypeptide chainproduct moleculeprokaryoteprotein foldingproteomereplicationribonucleic acid (RNA)riboseribosomal RNA (rRNA)ribosomesingle-stranded DNAsubstrate moleculetemplatetetramerthymine (T)transcripttranscriptiontransfer RNA (tRNA)transformationtranslationuracil (U)Watson–Crick base pairing

Page 32: Introduction to Molecular Genetics and Genomics

Guide to Problem Solving 31

Review the Basics• What were the key experiments showing that DNA is

the genetic material?

• How did understanding the molecular structure ofDNA give clues to its ability to replicate, to code forproteins, and to undergo mutation?

• Why is pairing of complementary bases a key featureof DNA replication?

• What is the process of transcription and in what waysdoes it differ from DNA replication?

• What three types of RNA participate in protein syn-thesis, and what is the role of each type of RNA?

• What is the “genetic code,” and how is it relevant tothe translation of a polypeptide chain from a mole-cule of messenger RNA?

• What is an inborn error of metabolism? How did thisconcept serve as a bridge between genetics and bio-chemistry?

Guide to Problem SolvingProblem 1 In the human gene for the � chain of hemoglo-bin (the oxygen-carrying protein in the red blood cells),the first 30 nucleotides in the amino-acid–coding regionhave the sequence

3'-TACCACGTGGACTGAGGACTCCTCTTCAGA-5'

What is the sequence of the partner strand?

Answer The base pairing between the strands is A with Tand G with C, but it is equally important that the strandsin a DNA duplex have opposite polarity. The partnerstrand is therefore oriented with its 5' end at the left, andthe base sequence is

5'-ATGGTGCACCTGACTCCTGAGGAGAAGTCT-3'

Problem 2 If the DNA duplex for the � chain of hemoglo-bin in Problem 1 were transcribed from left to right, de-duce the base sequence of the RNA in this coding region.

Answer To deduce the RNA sequence, we must applythree concepts. First, in the transcription of RNA, the basepairing is such that an A, T, G, or C in the DNA templatestrand is transcribed as U, A, C, or G, respectively, in theRNA strand. Second, the RNA transcript and the DNAtemplate strand have opposite polarity. Third (and criti-cally for this problem), the RNA transcript is always tran-scribed in the 5'-to-3' direction, so the 5' end of the RNAis the end synthesized first. This being the case, and con-sidering the opposite polarity, the 3' end of the templatestrand must be transcribed first. Because we are told thattranscription takes place from left to right, we can deduce

• How does the “central dogma” explain Garrod’s dis-covery that nonfunctional enzymes result from mu-tant genes?

• Explain why many mutant forms of phenylalaninehydroxylase have a simple amino acid replacement,yet the mutant polypeptide chains are absent or pre-sent in very small amounts.?

• What is a pleiotropic effect of a gene mutation? Givean example.

• What are some of the major differences in cellular or-ganization among Bacteria, Archaea, and Eukarya?

• What process was Charles Darwin describing whenhe wrote the following statement? “As more individ-uals are produced than can possibly survive, theremust in every case be a struggle for existence, eitherone individual with another of the same species, orwith the individuals of distinct species, or with thephysical conditions of life.”

that the transcribed strand is that given in Problem 1. TheRNA transcript therefore has the base sequence

5'-AUGGUGCACCUGACUCCUGAGGAGAAGUCU-3'

Problem 3 Given the RNA sequence coding for part of hu-man � hemoglobin deduced in Problem 2, what is theamino acid sequence in this part of the � polypeptidechain?

Answer The polypeptide chain is translated in successivegroups of three nucleotides (each group constituting acodon), starting at the 5' end of the coding sequence andmoving in the 5'-to-3' direction. The amino acid corre-sponding to each codon can be found in the genetic codetable. The first ten amino acids in the polypeptide chainare therefore

5'-AUG GUG CAC CUG ACU CCU GAG GAG AAG UCU-3'Met Val His Leu Thr Pro Glu Glu Lys Ser

Problem 4 A very important mutation in human hemoglo-bin occurs in the DNA sequence shown in Problem 1. Inthis mutation, the T at nucleotide position 20 is replacedwith an A, where the numbering is from the left in thestrand shown at the top in Problem 1. The mutant hemo-globin is called sickle-cell hemoglobin, and it is associatedwith a severe anemia known as sickle-cell anemia. Severe asthe genetic disease is, the mutant gene is present at rela-tively high frequency in some human populations be-cause carriers of the gene, who have only a mild anemia,are more resistant to falciparum malaria than are noncar-riers. What is the nucleotide sequence of this region of the

Page 33: Introduction to Molecular Genetics and Genomics

32 Chapter 1 Introduction to Molecular Genetics and Genomics

C H A P T E R X Xsee my father. . . . She had asked many doctors for help, whichnone had been able to give. But this woman was unusuallypersistent and would not accept the situation without expla-nation. She had also noticed that a peculiar smell alwaysclung to her children. . . . This woman was advised to seekhelp from my father [who held a professorship of nutrient re-search at the University Hospital in Norway]. He of coursehad no real hope of being able to help her. But he did notwant to reject her, and he agreed to examine the children. Onclinical examination he found nothing of importance, exceptthe [severe mental retardation], which was beyond doubt.[Urine analysis] was part of his thorough routine examina-tion. On adding ferric chloride [to normal urine] the color nor-mally stays brownish, [but in the case of these children] adeep green color developed. He had not seen this reactionbefore . . . He concluded that two mentally retarded childrenexcreted a substance not found in normal urine. But whichsubstance?” Consult this keyword site for more informationon how Asbjörn Fölling finally tracked down the cause of thedisease.

• With proper dietary control of blood phenylalanine, pa-tients with PKU can develop normally and lead normal lives.

GeNETics on the Web will introduce you to some of the mostimportant sites for finding genetic information on theInternet. To explore these sites, visit the Jones and Bartletthome page at

http://www.jbpub.com/genetics

For the book Genetics: Analysis of Genes and Genomes, choosethe link that says Enter GeNETics on the Web. You will be pre-sented with a chapter-by -chapter list of highlighted keywords.Select any highlighted keyword and you will be linked to a Website containing genetic information related to the keyword.

• James D. Watson once said that he and Francis Crick hadno doubt that their proposed DNA structure was essentiallycorrect, because the structure was so beautiful it had to betrue! At an internet site accessed by the keyword DNA, youcan view a large collection of different types of models of DNAstructure. Some models highlight the sugar–phosphate back-bones, others the A�T and G�C base pairs, still others thehelical structure of double-stranded DNA.

• How was PKU discovered? The story is told by theNorwegian physician Ivar Fölling: “The stage is set in 1934. Amother with two severely mentally retarded children came to

DNA duplex in sickle-cell hemoglobin (both strands) andthat of the messenger RNA, and what is the amino acid re-placement that results in sickle-cell hemoglobin?

Answer The mutation is already given as a T � A substitu-tion at position 20. The sequence of the DNA duplex is ob-tained as in Problem 1, that of the RNA as in Problem 2,

and that of the mutant polypeptide chain as in Problem 3,except that at each step there is one nucleotide (or oneamino acid) that differs from the nonmutant. The DNA,RNA, and polypeptide in this region of sickle-cell hemo-globin are as follows, where the differences from the non-mutant gene are in red. The amino acid replacement isglutamic acid � valine.

DNA (transcribed strand) 3'-TAC CAC GTG GAC TGA GGA CAC CTC TTC AGA-5'DNA (nontranscribed strand) 5'-ATG GTG CAC CTG ACT CCT GTG GAG AAG TCT-3'

RNA coding region 5'-AUG GUG CAC CUG ACU CCU GUG GAG AAG UCU-3'Polypeptide chain Met Val His Leu Thr Pro Val Glu Lys Ser

Problem 4 Answer—Nucleotide sequences

Analysis and Applications1.1 Classify each of the following statements as true or false.

(a) Each gene is responsible for only one visible trait.(b) Every trait is potentially affected by many genes.(c) The sequence of nucleotides in a gene specifies the

sequence of amino acids in a protein encoded by thegene.

(d) There is one-to-one correspondence between the setof codons in the genetic code and the set of aminoacids encoded.

1.2 From their examination of the structure of DNA,what were Watson and Crick able to infer about the prob-able mechanisms of DNA replication, coding capability,and mutation?

Page 34: Introduction to Molecular Genetics and Genomics

Analysis and Applications 33

1.3 What does it mean to say that each strand of a duplexDNA molecule has a polarity? What does it mean to saythat the paired strands in a duplex molecule have oppo-site polarity?

1.4 What is the end result of replication of a duplex DNAmolecule?

1.5 What is the role of the messenger RNA in transla-tion? What is the role of the ribosome? What is the role oftransfer RNA? Is there more than one type of ribosome?Is there more than one type of transfer RNA?

1.6 What important observation about S and R strains ofStreptococcus pneumoniae prompted Avery, MacLeod, andMcCarty to study this organism?

1.7 In the transformation experiments of Avery,MacLeod, and McCarty, what was the strongest evidencethat the substance responsible for the transformation wasDNA rather than protein?

1.8 A chemical called phenyl (carbolic acid) destroys pro-teins but not nucleic acids, and a strong alkali such assodium hydroxide destroys both proteins and nucleicacids. In the transformation experiments with Streptococcuspneumoniae, what result would be expected if the S-strainextract had been treated with phenyl? What would be ex-pected if it had been treated with a strong alkali?

1.9 What feature of the physical organization of bacterio-phage T2 made it suitable for use in the Hershey–Chaseexperiments?

1.10 Like DNA, molecules of RNA contain large amountsof phosphorus. When Hershey and Chase grew their T2phage in bacterial cells that had grown in the presence ofradioactive phosphorus, the RNA must also have incor-porated the labeled phosphorus, and yet the experimen-tal result was not compromised. Why not?

1.11 Although the Hershey–Chase experiments werewidely accepted as proof that DNA is the genetic material,the results were not completely conclusive. Why not?

1.12 The DNA extracted from a bacteriophage contains28 percent A, 28 percent T, 22 percent G, and 22 percentC. What can you conclude about the structure of thisDNA molecule?

1.13 The DNA extracted from a bacteriophage consists of24 percent A, 30 percent T, 20 percent G, and 26 percentC. What is unusual about this DNA? What can you con-clude about its structure?

1.14 A double-stranded DNA molecule is separated intoits constituent strands, and the strands are separated in anultracentrifuge. In one of the strands the base composi-tion is 24 percent A, 28 percent T, 22 percent G, and 26

When dietary control is relaxed, however, blood phenylala-nine returns to high levels. This situation is extremely dan-gerous for a developing fetus, resulting in high risk ofcongenital heart disease, small head size, mental retarda-tion, and slow growth. Affected children are said to havematernal PKU. They are affected, not because of their own in-ability to metabolize phenylalanine, but because of high lev-els of phenyalanine in their mothers’ blood. The risk can bereduced, but not entirely eliminated, if PKU mothers plantheir pregnancies and adhere to a strict dietary regimen priorto and during pregnancy. To learn more about this unantici-pated consequence of dietary treatment of PKU, log onto thephenylalanine hydroxylase knowledge database at the key-word site.

• Perhaps surprisingly, the history of the bacteriophage T2that figures so prominently in the experiments of Hershey andChase is shrouded in mystery. Indeed, the time, place, andsource material for the original isolation of phages T2, T4,and T6 (known as the “T-even phages”) may never be knownwith certainty. Use the keyword T2 to learn what is knownabout the origin of the bacteriophage and what sleuthing wasrequired to discover what is known.

• The Mutable Site changes frequently. Each new update in-cludes a different site that highlights genetics resources avail-able on the World Wide Web. Select the Mutable Site forChapter 1 and you will be linked automatically.

• The Pic Site showcases some of the most visually appeal-ing genetics sites on the World Wide Web. To visit the geneticsWeb site pictured below, select the PIC Site for Chapter 1.

Au: Phenol changed to phenyltwo times in prob 1.8. OK?

Page 35: Introduction to Molecular Genetics and Genomics

34 Chapter 1 Introduction to Molecular Genetics and Genomics

percent C. What is the base composition of the otherstrand?

1.15 While studying sewage, you discover a new type ofbacteriophage that infects E. coli. Chemical analysis re-veals protein and RNA but no DNA. Is this possible?

1.16 One strand of a DNA duplex has the base sequence 5'-ATCGTATGCACTTTACCCGG-3'. What is the base se-quence of the complementary strand?

1.17 A region along one strand of a double-stranded DNAmolecule consists of tandem repeats of the trinucleotide5'-TCG-3', so the sequence in this strand is

5'-TCGTCGTCGTCGTCG . . . -3'

What is the sequence in the other strand?

1.18 A duplex DNA molecule contains a random se-quence of the four nucleotides with equal proportions ofeach. What is the average spacing between consecutiveoccurrences of the sequence 5'-GGCC-3'? Between con-secutive occurrences of the sequence 5'-GAATTC-3'?

1.19 A region along a DNA strand that is transcribed con-tains no A. What base will be missing in the correspond-ing region of the RNA?

1.20 The duplex nucleic acid molecule shown here con-sists of a strand of DNA paired with a complementarystrand of RNA. Is the RNA the top or the bottom strand?One of the base pairs is mismatched. Which pair is it?

5'-AUCGGUUACAUUCCGACUGA-3'3'-TAGCCAATGTAAGGGTGACT-5'

1.21 The sequence of an RNA transcript that is initiallysynthesized is 5'-UAGCUAC-3', and successive nu-cleotides are added to the 3' end. This transcript is pro-duced from a DNA strand with the sequence

3'-AAGTCGCATATCGATGCTAGCGCAACCT-5'

What is the sequence of the RNA transcript when synthe-sis is complete?

1.22 An RNA molecule folds back upon itself to form a“hairpin”structure held together by a region of base pair-ing. One segment of the molecule in the paired region hasthe base sequence 5'-AUACGAUA-3'. What is the basesequence with which this segment is paired?

1.23 A synthetic mRNA molecule consists of the repeat-ing base sequence

5'-UUUUUUUUUUUU . . . -3'

When this molecule is translated in vitro using ribosomes,transfer RNAs, and other necessary constituents from E.coli, the result is a polypeptide chain consisting of the re-

peating amino acid Phe�Phe�Phe�Phe . . . . If you as-sume that the genetic code is a triplet code, what does thisresult imply about the codon for phenylalanine (Phe)?

1.24 A synthetic mRNA molecule consisting of the re-peating base sequence

5'-UUUUUUUUUUUU . . . -3'

is terminated by the addition, to the right-hand end, of asingle nucleotide bearing A. When translated in vitro, theresulting polypeptide consists of a repeating sequence ofphenylalanines terminated by a single leucine. What doesthis result imply about the codon for leucine?

1.25 With in vitro translation of an RNA into a polypep-tide chain, the translation can begin anywhere along theRNA molecule. A synthetic RNA molecule has thesequence

5'-CGCUUACCACAUGUCGCGAACUCG-3'

How many reading frames are possible if this molecule istranslated in vitro? How many reading frames are possibleif this molecule is translated in vivo, in which translationstarts with the codon AUG?

1.26 You have sequenced both strands of a double-stranded DNA molecule. To inspect the potential aminoacid coding content of this molecule, you conceptuallytranscribe it into RNA and then conceptually translate theRNA into a polypeptide chain. How many reading frameswill you have to examine?

1.27 A synthetic mRNA molecule consists of the repeat-ing base sequence 5'-UCUCUCUCUCUCUCUC . . . -3'.When this molecule is translated in vitro, the result is apolypeptide chain consisting of the alternating aminoacids Ser�Leu�Ser�Leu�Ser�Leu . . . . Why do theamino acids alternate? What does this result imply aboutthe codons for serine (Ser) and leucine (Leu)?

1.28 A synthetic mRNA molecule consists of the repeat-ing base sequence 5'-AUCAUCAUCAUCAUCAUC . . . -3'.When this molecule is translated in vitro, the result is amixture of three different polypeptide chains. One con-sists of repeating isoleucines (Ile�Ile�Ile�Ile . . .), an-other of repeating serines (Ser�Ser�Ser�Ser . . .), andthe third of repeating histidines (His�His�His�His . . .).What does this result imply about the manner in whichan mRNA is translated?

1.29 How is it possible for a gene with a mutation in thecoding region to encode a polypeptide with the sameamino acid sequence as the nonmutant gene?

1.30 A polymer is made that has a random sequence con-sisting of 75 percent G’s and 25 percent U’s. Among theamino acids in the polypeptide chains resulting from invitro translation, what is the expected frequency of Trp?of Val? of Phe?

Page 36: Introduction to Molecular Genetics and Genomics

Further Reading 35

Challenge Problems1.31 The coding sequence in the messenger RNA foramino acids 1 through 10 of human phenylalaninehydroxylase is

5'-AUGUCCACUGCGGUCCUGGAAAACCCAGGC-3'

(a) What are the first 10 amino acids?(b) What sequence would result from a mutant RNA in

which the red A was changed to G? (c) What sequence would result from a mutant RNA in

which the red C was changed to G?(d) What sequence would result from a mutant RNA in

which the red U was changed to C?(e) What sequence would result from a mutant RNA in

which the red G was changed to U?

1.32 A “frameshift” mutation is a mutation in whichsome number of base pairs, not a multiple of three, is in-serted into or deleted from a coding region of DNA. Theresult is that, at the point of the frameshift mutation, thereading frame of protein translation is shifted with re-spect to the nonmutant gene. To see the consequence,

consider that the coding sequence in the messenger RNAfor the first 10 amino acids in human beta hemoglobin(part of the oxygen carrying protein in the blood) is

5'–AUGGUGCACCUGACUCCUGAGGAGAAGUCU · · · –3'

(a) What is the amino acid sequence in this part of thepolypeptide chain?

(b) What would be the consequence of a frameshift mu-tation resulting in an RNA missing the red U?

(c) What would be the consequence of a frameshift mu-tation resulting in an RNA with a G inserted immedi-ately in front of the red U?

1.31 With regard to the wildtype and mutant RNA mole-cules described in the previous problem, deduce the basesequence in both strands of the corresponding double-stranded DNA for:(a) The wildtype sequence(b) The single-base deletion(c) The single-base insertion

Further ReadingAbedon, S. T. 2000. The murky origin of Snow White

and her T-even dwarfs. Genetics 155: 481.Bearn, A. G. 1994. Archibald Edward Garrod, the

reluctant geneticist. Genetics 137: 1.Calladine, C. R. 1997. Understanding DNA: The Molecule

and How It Works. New York: Academic Press.Ciechanover, A., A. Orian, and A. L. Schwartz. 2000.

Ubiquitin-mediated proteolysis: Biological regulationvia destruction. Bioessays 22: 442.

Doolittle, W. F. 2000. Uprooting the tree of life. ScientificAmerican, February.

Gehrig, A., S. R. Schmidt, C. R. Muller, S. Srsen, K.Srsnova, and W. Kress. 1997. Molecular defects inalkaptonuria. Cytogenetics & Cell Genetics 76: 14.

Gould, S. J. 1994. The evolution of life on the earth.Scientific American, October.

Horowitz, N. H. 1996. The sixtieth anniversary ofbiochemical genetics. Genetics 143: 1.

Judson, H. F. 1996. The Eighth Day of Creation: The Makersof the Revolution in Biology. Cold Spring Harbor, NY: ColdSpring Harbor Laboratory Press.

Mirsky, A. 1968. The discovery of DNA. ScientificAmerican, June.

Olson, G. J., and C. R. Woese. 1997. Archaeal genomics:An overview. Cell 89: 991.

Radman, M., and R. Wagner. 1988. The high fidelity ofDNA duplication. Scientific American, August.

Rennie, J. 1993. DNA’s new twists. Scientific American,March.

Scazzocchio, C. 1997. Alkaptonuria: From humans tomoulds and back. Trends in Genetics 13: 125.

Scriver, C. R., and P. J. Waters. 1999. Monogenic traitsare not simple: Lessons from phenylketonuria. Trendsin Genetics 15: 267.

Stadler, D. 1997. Ultraviolet-induced mutation and thechemical nature of the gene. Genetics 145: 863.

Susman, M. 1995. The Cold Spring Harbor phage course(1945–1970): A 50th anniversary remembrance.Genetics 139: 1101.

Watson, J. D. 1968. The Double Helix. New York:Atheneum.

Zhang, J. Z. 2000. Protein-length distributions for thethree domains of life. Trends in Genetics 16: 107.

Page 37: Introduction to Molecular Genetics and Genomics

36

C H A P T E R

DNA Structureand DNAManipulation

2

Page 38: Introduction to Molecular Genetics and Genomics

37

• A DNA strand is a polymer of A, T, G and Cdeoxyribonucleotides joined 3'-to-5' by phosphodiester bonds.

• The two DNA strands in a duplex are held together byhydrogen bonding between the A�T and G�C base pairs andby base stacking of the paired bases.

• Each type of restriction endonuclease enzyme cleaves double-stranded DNA at a particular sequence of bases usually four orsix nucleotides in length.

• The DNA fragments produced by a restriction enzyme can beseparated by electrophoresis, isolated, sequenced, andmanipulated in other ways.

• Separated strands of DNA or RNA that are complementary innucleotide sequence can come together (hybridize)spontaneously to form duplexes.

• DNA replication takes place only by elongation of the growingstrand in the 5'-to-3' direction through the addition ofsuccessive nucleotides to the 3' end.

• In the polymerase chain reaction, short oligonucleotideprimers are used in successive cycles of DNA replication toamplify selectively a particular region of a DNA duplex.

• Genetic markers in DNA provide a large number of easilyaccessed sites in the genome that can be used to identify thechromosomal locations of disease genes, for DNA typing inindividual identification, for the genetic improvement ofcultivated plants and domesticated animals, and for manyother applications.

The Double HelixJames D. Watson and Francis H. C. Crick 1953A Structure for Deoxyribose Nucleic Acid

Origin of the Human Genetic Linkage MapDavid Botstein, Raymond L. White, Mark Skolnick, and Ronald W.Davis 1980Construction of a Genetic Linkage Map in Man Using RestrictionFragment Length Polymorphisms

P R I N C I P L E S

C O N N E C T I O N S

C H A P T E R O U T L I N E2.1 Genomes and Genetic Differences Among

IndividualsDNA Markers as Landmarks in

Chromosomes

2.2 The Molecular Structure of DNAPolynucleotide ChainsBase Pairing and Base StackingAntiparallel StrandsDNA Structure as Related to

Function

2.3 The Separation and Identificatin ofGenomic DNA FragmentsRestriction Enzymes and Site-Specific

DNA CleavageGel ElectrophoresisNucleic Acid HybridizationThe Southern Blot

2.4 Selective Replication of Genomic DNAFragmentsConstraints on DNA Replication:

Primers and 5'-to-3' Strand Elongation

The Polymerase Chain Reaction

2.5 The Terminology of Genetic Analysis

2.6 Types of DNA Markers Present inGenomic DNASingle Nucleotide Polymorphisms

(SNPs)Restriction Fragment Length

Polymorphisms (RFLPs)Random Amplified Polymorphic

DNA (RAPD)Amplified Fragment Length

Polymorphisms (AFLPs)Simple Tandem Repeat

Polymorphisms (STRPs)

2.7 Applications of DNA MarkersGenetic Markers, Genetic Mapping,

and “Disease Genes”Other Uses for DNA Markers

Page 39: Introduction to Molecular Genetics and Genomics

38 Chapter 2 DNA Structure and DNA Manipulation

In Chapter 1, we reviewed the experimen-tal evidence demonstrating that thegenetic material is DNA. We saw how,

through the unique structure of the DNAmolecule, genetic information can be tran-scribed and translated into proteins thataffect the inherited characteristics of organ-isms. When a mutant gene encodes a non-functional protein that results in somephysical or physiological abnormality—forexample, an “inborn error of metabo-lism”—the expression of that abnormalitycan be used to trace the transmission of themutant gene from one generation to thenext in a pedigree (family history). As aconsequence, until recently, the first step ingenetic analysis was the identification of or-ganisms with such abnormal traits, such aspeas with wrinkled seeds instead of roundseeds and fruit flies with white eyes insteadof red. These traits were studied by meansof controlled crosses so that the parentageof each individual could be traced. Large-scale genetic studies were typically limitedto one of a small number of model organ-isms especially favorable for isolating andidentifying mutant genes, such as the bud-ding yeast Saccharomyces cerevisiae, the nem-atode worm Caenorhabditis elegans, or thefruit fly Drosophila melanogaster.

Since the mid-1970s, studies in geneticshave undergone a revolution based on theuse of increasingly sophisticated ways toisolate and identify specific fragments ofDNA. The culmination of these techniqueswas large-scale genomic sequencing—theability to determine the correct sequence ofthe base pairs that make up the DNA in anentire genome and to identify the se-quences associated with genes. Becausemany of the model organisms used in ge-netics have relatively small genomes, thesesequences were completed first, in the late1990s (Figure 2.1) The techniques used to se-quence these simpler genomes were thenscaled up to sequence the human genome.The initial “rough draft” of the humangenome was announced in June 2000; thisrepresents an important milestone in theHuman Genome Project, whose goals in-

Figure 2.1 Timeline of large-scale genomic DNAsequencing.

09131_01_1651A

Page 40: Introduction to Molecular Genetics and Genomics

tions that are responsible for genetic dis-eases such as phenylketonuria and otherinborn errors of metabolism, as well as themutations that increase the risk of morecomplex diseases such as heart disease,breast cancer, and diabetes.

Fortunately, only a small proportion ofall differences in DNA sequence are asso-ciated with disease. Some of the others areassociated with inherited differences inheight, weight, hair color, eye color, facialfeatures, and other traits. Most of thegenetic differences between people arecompletely harmless. Many have no de-tectable effects on appearance or health.Such differences can be studied onlythrough direct examination of the DNA it-self. These harmless differences are never-theless important, because they serve asgenetic markers.

DNA Markers as Landmarks in ChromosomesIn genetics, a genetic marker is any differ-ence in DNA, no matter how it is detected,whose pattern of transmission from genera-tion to generation can be tracked. Each in-dividual who carries the marker also carriesa length of chromosome on either side of it,so it marks a particular region of thegenome. A mutant gene, or some portion ofa mutant gene, can serve as a geneticmarker. In the “classical” approach to ge-netics, it is the outward expression of agene (or lack of expression) that forms thebasis of genetic analysis. For example, amutation causing wrinkled peas is a geneticmarker, which can be identified through itseffects on pea shape. In modern geneticanalysis, any difference in DNA sequencebetween two individuals can serve as a ge-netic marker. And although these geneticmarkers are often harmless in themselves,they allow the positions of disease genes tobe located and their DNA isolated, identi-fied, and studied.

Genetic markers that are detected by di-rect analysis of the DNA are often calledDNA markers. DNA markers are impor-tant in genetics because they serve as land-marks in long DNA molecules, such as thosefound in chromosomes, which allow ge-netic differences among individuals to be

2.1 Genomes and Genetic Differences Among Individuals 39

clude determining the sequence, and iden-tifying the function, of all human genes.

2.1 Genomes and GeneticDifferences Among Individuals

The numbers associated with the genome ofeven a simple organism can be intimidating.The sequenced genomes of D. melanogasterand C. elegans, both approximately 100 mil-lion base pairs in length, encode 13,601proteins and 18,424 proteins, respectively.The human genome is considerably larger.As found in a human reproductive cell, thehuman genome consists of 3 billion basepairs organized into 23 distinct chromo-somes (each chromosome contains a singlemolecule of duplex DNA). A typical chro-mosome can contain several hundred toseveral thousand genes, arranged in linearorder along the DNA molecule present inthe chromosome. The sequences that makeup the protein-coding part of these genesactually account for only about 4 percent ofthe entire genome. The other 96 percent ofthe sequences do not code for proteins.Some noncoding sequences are genetic“chaff” that gets separated from the protein-coding “wheat” when genes are transcribedand the RNA transcript is processed intomessenger RNA. Other noncoding se-quences are relatively short sequences thatare found in hundreds or thousands ofcopies scattered throughout the genome.Still other noncoding sequences are rem-nants of genes called pseudogenes. As mightbe expected, identifying the protein-codinggenes from among the large background ofnoncoding DNA in the human genome is achallenge in itself.

Geneticists often speak of the nucleotidesequence of “the” human genome because99.9 percent of the DNA sequences in anytwo individuals are the same. This is ourevolutionary legacy; it contains the geneticinformation that makes us human beings.In reality, however, there are many differ-ent human genomes. Geneticists have greatinterest in the 0.1 percent of the humanDNA sequence—3 million base pairs—thatdiffers from one genome to the next, be-cause these differences include the muta-

Au: In 1st para in1st column, proof-reader asks if“hundreds orthousands ofcopies” is correct?Or should phrasebe “hundreds ofthousands ofcopies”?

Page 41: Introduction to Molecular Genetics and Genomics

tracked. They are like signposts along ahighway. Using DNA markers as landmarks,the geneticist can identify the positions ofnormal genes, mutant genes, breaks inchromosomes, and other features impor-tant in genetic analysis (Figure 2.2).

The detection of DNA markers usuallyrequires that the genomic DNA (the totalDNA extracted from cells of an organism)be fragmented into pieces of manageablesize (usually a few thousand nucleotidepairs) that can be manipulated in labora-tory experiments. In the following sec-tions, we examine some of the principalways in which DNA is manipulated to re-veal genetic differences among individuals,whether or not these differences find out-

ward expression. Use of these methodsbroadens the scope of genetics, making itpossible to carry out genetic analysis in anyorganism. This means that detailed geneticanalysis is no longer restricted to humanbeings, domesticated animals, cultivatedplants, and the relatively small number ofmodel organisms favorable for geneticstudies. Direct study of DNA eliminates theneed for prior identification of genetic dif-ferences between individuals; it even elim-inates the need for controlled crosses. Themethods of molecular analysis discussed inthis chapter have transformed genetics:

The manipulation of DNA is the basic experi-mental operation in modern genetics.

40 Chapter 2 DNA Structure and DNA Manipulation

A B

C

DNA fragment

DNA marker

Double-strandedDNA molecule

Replication inbacterial cells

Cleavage

D

D

D

Nucleus

Chromosomes

HUMAN CELL

BACTERIALCELL

CLONED DNA

Chromosomes are located in the cell nucleus.

Each chromosome contains one long molecule of duplex (double-stranded) DNA.

DNA markers are used to identify particular regions along the DNA in chromosomes.

The fragments can be transferred into bacterial cells, where they can replicate. (This is the procedure of cloning.)

Large quantities of the cloned human DNA fragment can be isolated from the bacterial cells.

Single DNA fragments can be cleaved from the molecule.

A DNA marker, in this case D, serves to identify bacterial cells containing a particular DNA fragment of interest.

Figure 2.2 DNA markers serve as landmarks that identify physical positions along a DNA molecule,such as DNA from a chromosome. As shown at the right, a DNA marker can also be used to identifybacterial cells into which a particular fragment of DNA has been introduced. The procedure of DNAcloning is not quite as simple as indicated here; it is discussed further in Chapter 13.

Au: Proofreaderasks that you lcon-firm Chapter 13reference in Fig 2.2caption is correct.

Page 42: Introduction to Molecular Genetics and Genomics

These methods are the principal techniquesused in virtually every modern geneticslaboratory.

2.2 The Molecular Structure of DNA

Modern experimental methods for the ma-nipulation and analysis of DNA grew out ofa detailed understanding of its molecularstructure and replication. Therefore, to un-derstand these methods, one needs toknow something about the molecularstructure of DNA. We saw in Chapter 1 thatDNA is a helix of two paired, complemen-tary strands, each composed of an orderedstring of nucleotides, each bearing one of thebases A (adenine), T (thymine), G (gua-nine), or cytosine (C). Watson–Crick basepairing between A and T and between Gand C in the complementary strands holdsthe strands together. The complementarystrands also hold the key to replication, be-cause each strand can serve as a templatefor the synthesis of a new complementarystrand. We will now take a closer look atDNA structure and at the key features of itsreplication.

Polynucleotide ChainsIn terms of biochemistry, a DNA strand is apolymer—a large molecule built fromrepeating units. The units in DNA are com-posed of 2'-deoxyribose (a five-carbon su-

gar), phosphoric acid, and the four nitro-gen-containing bases denoted A, T, G, andC. The chemical structures of the bases areshown in Figure 2.3. Note that two of thebases have a double-ring structure; theseare called purines. The other two baseshave a single-ring structure; these arecalled pyrimidines.

• The purine bases are adenine (A) andguanine (G).

• The pyrimidine bases are thymine (T) andcytosine (C).

2.2 The Molecular Structure of DNA 41

H

C

CC

H

N

C N

N

5

4

67

892

1

3

H

N

Adenine Guanine

Deoxyribose

O

CH3

H

TC

CC

NNC

16 4

3

5

2

Thymine Cytosine

N

A

N

C

CCN

C NN

N

H

H

H

G

Deoxyribose Deoxyribose

O

NH

H

H

O

Deoxyribose

H

H

CC

CC

NC

HCHC

O

N

Purines Pyrimidines

Figure 2.3 Chemical structures of the four nitrogen-containing bases in DNA: adenine, thymine,guanine, and cytosine. The nitrogen atom linked to the deoxyribose sugar is indicated. The atomsshown in red participate in hydrogen bonding between the DNA base pairs.

Photocaptionphotocaptionphotocaptionphotocaptionphotocaptionphotocaptionphotocaption-photocaptionphotocaptionphotocaptionphoto-caption

09131_01_1745P

Page 43: Introduction to Molecular Genetics and Genomics

In DNA, each base is chemically linkedto one molecule of the sugar deoxyribose,forming a compound called a nucleoside.When a phosphate group is also attachedto the sugar, the nucleoside becomes anucleotide (Figure 2.4). Thus a nucleotideis a nucleoside plus a phosphate. In theconventional numbering of the carbonatoms in the sugar in Figure 2.4, the car-bon atom to which the base is attached isthe 1' carbon. (The atoms in the sugar aregiven primed numbers to distinguish themfrom atoms in the bases.) The nomencla-ture of the nucleoside and nucleotide de-rivatives of the DNA bases is summarized

in Table 2.1. Most of these terms are notneeded in this book; they are included be-cause they are likely to be encountered infurther reading.

In nucleic acids, such as DNA and RNA,the nucleotides are joined to form apolynucleotide chain, in which the phos-phate attached to the 5' carbon of one sugaris linked to the hydroxyl group attached tothe 3' carbon of the next sugar in line(Figure 2.5). The chemical bonds by whichthe sugar components of adjacent nu-cleotides are linked through the phosphategroups are called phosphodiester bonds.The 5'–3'–5'–3' orientation of these linkages

42 Chapter 2 DNA Structure and DNA Manipulation

3 2

14

This group is OHin RNA.

5

OH

P

Phosphate

O

OH

OHOHOCH2 O

H HH H

H3 2

14

SugarSugar

Base

A, G, T, or C

Base

A, G, T, or C 5

OH

CH2 O

H HH H

H

This group is OHin RNA.

Nucleoside Nucleotide

Figure 2.4 A typical nucleotide, showing the three major components (phosphate, sugar, and base),the difference between DNA and RNA, and the distinction between a nucleoside (no phosphategroup) and a nucleotide (with phosphate). Nucleotides are monophosphates (with one phosphategroup). Nucleoside diphosphates contain two phosphate groups, and nucleoside triphosphatescontain three.

Adenine (A) Deoxyadenosine Deoxyadenosine-5'monophosphate (dAMP)diphosphate (dADP)triphosphate (dATP)

Guanine (G) Deoxyguanosine Deoxyguanosine-5'monophosphate (dGMP)diphosphate (dGDP)triphosphate (dGTP)

Thymine (T) Deoxythymidine Deoxythymidine-5'monophosphate (dTMP)diphosphate (dTDP)triphosphate (dTTP)

Cytosine (C) Deoxycytidine Deoxycytidine-5'monophosphate (dCMP)diphosphate (dCDP)triphosphate (dCTP)

Table 2.1 DNA nomenclature

Base Nucleoside Nucleotide

Au: Query fromMH—Why noprimes shownhere (shown inFig 2.5)?

Page 44: Introduction to Molecular Genetics and Genomics

continues throughout the chain, whichtypically consists of millions of nucleotides.Note that the terminal groups of eachpolynucleotide chain are a 5'-phosphate(5'-P) group at one end and a 3'-hydroxyl(3'-OH) group at the other. The asymme-try of the ends of a DNA strand implies thateach strand has a polarity determined bywhich end bears the 5' phosphate andwhich end bears the 3' hydroxyl.

A few years before Watson and Crickproposed their essentially correct three-dimensional structure of DNA as a doublehelix, Erwin Chargaff developed a chemical

technique to measure the amount of eachbase present in DNA. As we describe thistechnique, we will let the molar concentra-tion of any base be represented by the sym-bol for the base in square brackets; forexample, [A] denotes the molar concentra-tion of adenine. Chargaff used his tech-nique to measure the [A], [T], [G], and [C]content of the DNA from a variety ofsources. He found that the base composi-tion of the DNA, defined as the percentG � C, differs among species but is con-stant in all cells of an organism and within aspecies. Data on the base composition of

2.2 The Molecular Structure of DNA 43

5’ end terminateswith phosphate group

3’ end terminateswith hydroxyl (–OH)

Phosphate linkedto 5’ carbon andto 3’ carbon

(A) (B)

5’ end

5’ end

3’ end

3’ end

Phosphodiesterbonds

3’

3’

3’

5’

5’

5’ O

O

CH2

H H

H

H

H H

H

P O–O

–O

O

O

O

CH2

H HH H

H

P O–O

–O

O

O

CH2

H HH H

OH H

P O

O

NH2

O

CN

N

NH2

NH2

H

H

H

H

O

A

GN

NN

N

N

N

N

N

AP

GP

CP

HO

Figure 2.5 Three nucleotides at the 5' end of a single polynucleotide strand. (A) The chemicalstructure of the sugar–phosphate linkages, showing the 5'-to-3' orientation of the strand (the rednumbers are those assigned to the carbon atoms). (B) A common schematic way to depict a polynu-cleotide strand.

Page 45: Introduction to Molecular Genetics and Genomics

DNA from a variety of organisms are givenin Table 2.2.

Chargaff also observed certain regularrelationships among the molar concentra-tions of the different bases. These relation-ships are now called Chargaff’s rules:

• The amount of adenine equals that ofthymine: [A] � [T].

• The amount of guanine equals that ofcytosine: [G] � [C].

• The amount of the purine bases equals that ofthe pyrimidine bases:

[A] � [G] � [T] � [C].

Although the chemical basis of these ob-servations was not known at the time, one ofthe appealing features of the Watson–Crickstructure of paired complementary strandswas that it explained Chargaff’s rules. Be-cause A is always paired with T in double-stranded DNA, it must follow that [A] �[T]. Similarly, because G is paired with C, weknow that [G] � [C]. The third rule follows

by addition of the other two: [A] � [G] �[T] � [C]. In the next section, we examinethe molecular basis of base pairing in moredetail.

Base Pairing and Base StackingIn the three-dimensional structure of theDNA molecule proposed in 1953 by Watsonand Crick, the molecule consists of twopolynucleotide chains twisted around oneanother to form a double-stranded helix inwhich adenine and thymine, and guanineand cytosine, are paired in opposite strands(Figure 2.6). In the standard structure, whichis called the B form of DNA, each chainmakes one complete turn every 34 Å. Thehelix is right-handed, which means that asone looks down the barrel, each chain fol-lows a clockwise path as it progresses. Thebases are spaced at 3.4 Å, so there are tenbases per helical turn in each strand and tenbase pairs per turn of the double helix.

44 Chapter 2 DNA Structure and DNA Manipulation

Bacteriophage T7 26.0 26.0 24.0 24.0 48.0

Bacteria

Clostridium perfringens 36.9 36.3 14.0 12.8 26.8Streptococcus pneumoniae 30.2 29.5 21.6 18.7 40.3Escherichia coli 24.7 23.6 26.0 25.7 51.7Sarcina lutea 13.4 12.4 37.1 37.1 74.2

Fungi

Saccharomyces cerevisiae 31.7 32.6 18.3 17.4 35.7

Neurospora crassa 23.0 22.3 27.1 27.6 54.7

Higher plants

Wheat 27.3 27.2 22.7 22.8* 45.5Maize 26.8 27.2 22.8 23.2* 46.0

Animals

Drosophila melanagaster 30.8 29.4 19.6 20.2 39.8Pig 29.4 29.6 20.5 20.5 41.0Salmon 29.7 29.1 20.8 20.4 41.2Human being 29.8 31.8 20.2 18.2 38.4

*Includes one-fourth 5-methylcytosine, a modified form of cytosine found in most plants more complex than algae and inmany animals

Table 2.2 Base composition of DNA from different organisms

Base (and percentage of total bases) Base composition

Organism Adenine Thymine Guanine Cytosine (percent G � C)

Page 46: Introduction to Molecular Genetics and Genomics

P

(A)

34 Å per complete turn (10 base pairs per turn)

Phosphate

Deoxyribosesugar

Adenine

Thymine

Guanine

Cytosine

Guanine

Oxygen

Hydrogen

Phosphorus

C in sugar–phosphate chain

C and N in bases

Cytosine

Adenine

Thymine

(B)(A)

HN

N

NN N

N

N

O

CH3

CC

C

CC

C CC

C

O

H

H

H

H H

H

N NN N

N

CC

C

CC

C CC

CO

H

H

H

H

H

NH

O

N

N

H

PPP

P P

TAGC

CGCG

GC

TA

AT

CG

Diameter20 Å

Base

P

Minor groove

Major groove

Figure 2.6 Two representations of DNA, illustrating the three-dimensional structure of the doublehelix. (A) In a ribbon diagram, the sugar–phosphate backbones are depicted as bands, with horizon-tal lines used to represent the base pairs. (B) A computer model of the B form of a DNA molecule.The stick figures are the sugar–phosphate chains winding around outside the stacked base pairs,forming a major groove and a minor groove. The color coding for the base pairs is as follows: A, redor pink; T, dark green or light green; G, dark brown or beige; C, dark blue or light blue. The basesdepicted in dark colors are those attached to the blue sugar–phosphate backbone; the bases depictedin light colors are attached to the beige backbone. [B, courtesy of Antony M. Dean.]

2.2 The Molecular Structure of DNA 45

Page 47: Introduction to Molecular Genetics and Genomics

The strands feature base pairing, inwhich each base is paired to a complemen-tary base in the other strand by hydrogenbonds. (A hydrogen bond is a weak bondin which two participating atoms share ahydrogen atom between them.) The hydro-gen bonds provide one type of force holdingthe strands together. In Watson–Crick basepairing, adenine (A) pairs with thymine(T), and guanine (G) pairs with cytosine(C). The hydrogen bonds that form in theadenine–thymine base pair and in theguanine–cytosine pair are illustrated inFigure 2.7. Note that an A�T pair (Figure2.7A and B) has two hydrogen bonds andthat a G�C pair (Figure 2.7C and D) hasthree hydrogen bonds. This means that thehydrogen bonding between G and C isstronger in the sense that it requires moreenergy to break; for example, the amountof heat required to separate the pairedstrands in a DNA duplex increases with the

percent of G � C. Because nothing re-stricts the sequence of bases in a singlestrand, any sequence could be presentalong one strand. This explains Chargaff’sobservation that DNA from different or-ganisms may differ in base composition.However, because the strands in duplexDNA are complementary, Chargaff’s rulesof [A] � [T] and [G] � [C] are true what-ever the base composition.

In the B form of DNA, the paired basesare planar, parallel to one another, and per-pendicular to the long axis of the doublehelix. This feature of double-stranded DNAis known as base stacking. The upper andlower faces of each nitrogenous base arerelatively flat and nonpolar (uncharged).These surfaces are said to be hydrophobicbecause they bind poorly to water mole-cules, which are very polar. (The polarityrefers to the asymmetrical distribution ofcharge across the V-shaped water molecule;

46 Chapter 2 DNA Structure and DNA Manipulation

Adenine

Guanine

Thymine

Cytosine

(C)

(A)

(D)

(B)

Three hydrogen bonds attract G and C.

Two hydrogen bonds attract A and T.

H

Deoxyribose

Deoxyribose

N

N

N

N

N

N

N

O

CH3

T

C

C

CCCC

C

C C

A

O

HH

H

H

H

Deoxyribose

Deoxyribose

N

N

N

N

N

C

C

C

CCCC

C

C C

G

O

HH

HH

H

H

N H

O

NN

H

Figure 2.7 Normal base pairs in DNA. On the left, the hydrogen bonds (dotted lines) with the joinedatoms are shown in red. (A and B) A�T base pairing. (C and D) G�C base pairing. In the space-filling models (B and D), the colors are as follows: C, gray; N, blue; O, red; and H (shown in the basesonly), white. Each hydrogen bond is depicted as a white disk squeezed between the atoms sharingthe hydrogen. The stick figures on the outside represent the backbones winding around the stackedbase pairs. [Space-filling models courtesy of Antony M. Dean.]

Au: Term in KeyTerms list is“hydrophobicinteraction”rather than“hydrophic.”Would you like tochange term inlist? Or changetext here?

Page 48: Introduction to Molecular Genetics and Genomics

the oxygen at the base of the V tends to bequite negative, whereas the hydrogens atthe tips are quite positive). Owing to theirrepulsion of water molecules, the pairednitrogenous bases tend to stack on top ofone another in such a way as to exclude themaximum amount of water from the in-terior of the double helix. Hence a double-stranded DNA molecule has a hydrophobiccore composed of stacked bases, and it is theenergy of base stacking that providesdouble-stranded DNA with much of itschemical stability.

When discussing a DNA molecule, mol-ecular biologists frequently refer to the in-dividual strands as single strands or assingle-stranded DNA; they refer to the dou-ble helix as double-stranded DNA or as aduplex molecule. The two grooves spiralingalong outside of the double helix are notsymmetrical; one groove, called the majorgroove, is larger than the other, which iscalled the minor groove. Proteins that in-teract with double-stranded DNA oftenhave regions that make contact with thebase pairs by fitting into the major groove,into the minor groove, or into both grooves(Figure 2.6B).

Antiparallel StrandsEach backbone in a double helix consists ofdeoxyribose sugars alternating with phos-phate groups that link the 3' carbon atom ofone sugar to the 5' carbon of the next inline (Figure 2.5). The two polynucleotidestrands of the double helix have oppositepolarity in the sense that the 5' end of onestrand is paired with the 3' end of the otherstrand. Strands with such an arrangementare said to be antiparallel. One implica-tion of antiparallel strands in duplex DNA isthat in each pair of bases, one base is at-tached to a sugar that lies above the planeof pairing, and the other base is attached toa sugar that lies below the plane of pairing.Another implication is that each terminusof the double helix possesses one 5'-P group(on one strand) and one 3'-OH group (onthe other strand), as shown in Figure 2.8.

The diagram of the DNA duplex in Fig-ure 2.6 is static and so somewhat mislead-ing. DNA is a dynamic molecule, constantlyin motion. In some regions, the strands canseparate briefly and then come togetheragain in the same conformation or in a dif-

ferent one. The right-handed double helixin Figure 2.6 is the standard B form, but de-pending on conditions, DNA can actuallyform more than 20 slightly different vari-ants of a right-handed helix, and some re-gions can even form helices in which thestrands twist to the left (called the Z formof DNA). If there are complementarystretches of nucleotides in the same strand,then a single strand, separated from itspartner, can fold back upon itself like ahairpin. Even triple helices consisting ofthree strands can form in regions of DNAthat contain suitable base sequences.

2.2 The Molecular Structure of DNA 47

TP

GP

CP

AP

TP

GP

3’ end

(terminates in3’ hydroxyl)

5’ end

(terminates in5’ phosphate)

5’ end

(terminates in5’ phosphate)

3’ end

(terminates in3’ hydroxyl)

HO

C

A

T

G

OH

C

A

P

P

P

P

P

P

Figure 2.8 A segment of a DNA molecule,showing the antiparallel orientation of thecomplementary strands. The overlying bluearrows indicate the 5'-to-3' direction of eachstrand. The phosphates (P) join the 3' carbonatom of one deoxyribose (horizontal line) to the5' carbon atom of the adjacent deoxyribose.

Page 49: Introduction to Molecular Genetics and Genomics

DNA Structure as Related to FunctionIn the structure of the DNA molecule, wecan see how three essential requirements ofa genetic material are met.

1. Any genetic material must be able to bereplicated accurately, so that the informationit contains will be precisely replicated andinherited by daughter cells. The basis forexact duplication of a DNA molecule is the

pairing of A with T and of G with C in thetwo polynucleotide chains. Unwinding andseparation of the chains, with each free chainbeing copied, results in the formation of twoidentical double helices (see Figure 1.6).

2. A genetic material must also have thecapacity to carry all of the informationneeded to direct the organization andmetabolic activities of the cell. As we saw inChapter 1, the product of most genes is aprotein molecule—a polymer composed ofrepeating units of amino acids. The sequence

48 Chapter 2 DNA Structure and DNA Manipulation

The Double Helix

James D. Watson and Francis H. C. Crick 1953Cavendish Laboratory, Cambridge, EnglandA Structure for Deoxyribose Nucleic Acid

This is one of the watershed papers of twen-tieth-century biology. After its publication,nothing in genetics was the same. Every-thing that was known, and everything stillto be discovered, would now need to be in-terpreted in terms of the structure andfunction of DNA. The importance of thepaper was recognized immediately, in nosmall part because of its lucid and concisedescription of the structure. Watson andCrick benefited tremendously in knowingthat their structure was consistent with theunpublished structural studies of MauriceWilkins and Rosalind Franklin. The sameissue of Nature that included the Watsonand Crick paper also included, back toback, a paper from the Wilkins group andone from the Franklin group detailing theirdata and the consistency of their data withthe proposed structure. It has been saidthat Franklin was poised a mere two half-steps from making the discovery herself,alone. In any event, Watson and Crick andWilkins were awarded the 1962 NobelPrize for their discovery of DNA structure.Rosalind Franklin, tragically, died of cancerin 1958 at the age of 38.

We wish to suggest a structure for the saltof deoxyribose nucleic acid (DNA). . . .The structure has two helical chainseach coiled round the same axis. . . .Both chains follow right-handed helices,but the two chains run in opposite di-rections. . . . The bases areon the inside of the helix andthe phosphates on the out-side. . . . There is a residueon each chain every 3.4 Å andthe structure repeats after 10residues. . . . The novel fea-ture of the structure is themanner in which the twochains are held together bythe purine and pyrimidinebases. The planes of thebases are perpendicular tothe fiber axis. They are joinedtogether in pairs, a single base from onechain being hydrogen-bonded to a sin-gle base from the other chain, so thatthe two lie side by side. One of the pairmust be a purine and the other a pyrimi-dine for bonding to occur. . . . Only spe-cific pairs of bases can bond together.These pairs are adenine (purine) withthymine (pyrimidine), and guanine(purine) with cytosine (pyrimidine). Inother words, if an adenine forms onemember of a pair, on either chain, thenon these assumptions the other mem-ber must be thymine; similarly for gua-

nine and cytosine. The sequence ofbases on a single chain does not appearto be restricted in any way. However, ifonly specific pairs of bases can beformed, it follows that if the sequence ofbases on one chain is given, then the se-

quence on the otherchain is automaticallydetermined. . . . It hasnot escaped our noticethat the specific pair-ing we have postulatedimmediately suggestsa plausible copyingmechanism for the ge-netic material. . . . Weare much indebted toDr. Jerry Donohue forconstant advice andcriticism, especially on

interatomic distances. We have alsobeen stimulated by a knowledge of thegeneral nature of the unpublished ex-perimental results and ideas of Dr.Maurice H. F. Wilkins, Dr. RosalindFranklin and their co-workers at King’sCollege, London.

Source: Nature 171: 737–738

If only specific pairsof bases can be

formed, it follows that if the sequence

of bases on onechain is given, thenthe sequence on the

other chain isautomaticallydetermined.

Page 50: Introduction to Molecular Genetics and Genomics

of amino acids in the protein determines itschemical and physical properties. A gene isexpressed when its protein product issynthesized, and one requirement of thegenetic material is that it direct the order inwhich amino acid units are added to the endof a growing protein molecule. In DNA, thisis done by means of a genetic code in whichgroups of three bases specify amino acids.Because the four bases in a DNA moleculecan be arranged in any sequence, andbecause the sequence can vary from one partof the molecule to another and fromorganism to organism, DNA can contain agreat many unique regions, each of whichcan be a distinct gene. A long DNA chain candirect the synthesis of a variety of differentprotein molecules.

3. A genetic material must also be capable ofundergoing occasional mutations in whichthe information it carries is altered.Furthermore, so that mutations will beheritable, the mutant molecules must becapable of being replicated as faithfully asthe parental molecule. This feature isnecessary to account for the evolution ofdiverse organisms through the slowaccumulation of favorable mutations.Watson and Crick suggested that heritablemutations might be possible in DNA by raremispairing of the bases, with the result thatan incorrect nucleotide becomes incorpo-rated into a replicating DNA strand.

2.3 The Separation andIdentification of Genomic DNA Fragments

The following sections show how an un-derstanding of DNA structure and replica-tion has been put to practical use in thedevelopment of procedures for the separa-tion and identification of particular DNAfragments. These methods are used primar-ily either to identify DNA markers or to aidin the isolation of particular DNA frag-ments that are of genetic interest. For ex-ample, consider a pedigree of familialbreast cancer in which a particular DNAfragment serves as a marker for a bit ofchromosome that also includes the mutantgene responsible for the increased risk;then the ability to identify the fragment iscritically important in assessing the relativerisk for each of the women in the pedigree

who might carry the mutant gene. To takeanother example, suppose there is reasonto believe that a mutation causing a geneticdisease is present in a particular DNA frag-ment; then it is important to be able to pin-point this fragment and isolate it fromaffected individuals to verify whether thishypothesis is true and, if so, to identify thenature of the mutation.

Most procedures for the separation andidentification of DNA fragments can begrouped into two general categories:

1. Those that identify a specific DNA fragmentpresent in genomic DNA by making use ofthe fact that complementary single-strandedDNA sequences can, under the properconditions, form a duplex molecule. Theseprocedures rely on nucleic acid hybridization.

2. Those that use prior knowledge of thesequence at the ends of a DNA fragment tospecifically and repeatedly replicate this onefragment from genomic DNA. Theseprocedures rely on selective DNA replication(amplification) by means of the polymerasechain reaction.

The major difference between these ap-proaches is that the first (relying on nucleicacid hybridization) identifies fragments thatare present in the genomic DNA itself,whereas the second (relying on DNA ampli-fication) identifies experimentally manu-factured replicas of fragments whose originaltemplates (but not the replicas) were pre-sent in the genomic DNA. This differencehas practical implications:

• Hybridization methods require a greateramount of genomic DNA for the experimen-tal procedures, but relatively large fragmentscan be identified, and no prior knowledge ofthe DNA sequence is necessary.

• Amplification methods require extremelysmall amounts of genomic DNA for theexperimental procedures, but the amplifica-tion is usually restricted to relatively smallfragments, and some prior knowledge of DNAsequence is necessary.

The following sections discuss both types ofapproaches and give examples of how theyare used. In methods that use nucleic acidhybridization to identify particular frag-ments present in genomic DNA, the firststep is usually cutting the genomic DNAinto fragments of experimentally manage-able size. This procedure is discussed next.

2.3 The Separation and Identification of Genomic DNA Fragments 49

Page 51: Introduction to Molecular Genetics and Genomics

Restriction Enzymes and Site-Specific DNA CleavageProcedures for chemical isolation of DNA,such as those developed by Avery,MacLeod, and McCarty (Chapter 1), usuallylead to random breakage of double-stranded molecules into an average lengthof about 50,000 base pairs. This length is de-noted 50 kb, where kb stands for kilobases(1 kb � 1000 base pairs). A length of 50 kbis close to the length of double-strandedDNA present in the bacteriophage � that in-fects E. coli. The 50-kb fragments can bemade shorter by vigorous shearing forces,such as occur in a kitchen blender, but oneof the problems with breaking large DNAmolecules into smaller fragments by ran-dom shearing is that the fragments con-taining a particular gene, or part of a gene,will be of different sizes. In other words,with random shearing, it is not possible toisolate and identify a particular DNA frag-ment on the basis of its size and sequencecontent, because each randomly shearedmolecule that contains the desired se-quence somewhere within it differs in sizefrom all other molecules that contain thesequence. In this section we describe an im-portant enzymatic technique that can beused for cleaving DNA molecules at specificsites. This method ensures that all DNAfragments that contain a particular se-quence have the same size; furthermore,each fragment that contains the desired se-quence has the sequence located at exactlythe same position within the fragment.

The cleavage method makes use of animportant class of DNA-cleaving enzymesisolated primarily from bacteria. The en-zymes are called restriction endo-nucleases or restriction enzymes, andthey are able to cleave DNA molecules atthe positions at which particular, short se-quences of bases are present. These natu-rally occurring enzymes serve to protect thebacterial cell by disabling the DNA of bacte-riophages that attack it. Their discoveryearned Werner Arber of Switzerland aNobel Prize in 1978. Technically, the en-zymes are known as type II restriction endonu-cleases. The restriction enzyme BamHI is oneexample; it recognizes the double-strandedsequence

5'-GGATCC-3'3'-CCTAGG-5'

and cleaves each strand between the G-bearing nucleotides shown in red. Figure 2.9shows how the regions that make up theactive site of BamHI contact the recognitionsite (blue) just prior to cleavage, and thecleavage reaction is indicated in Figure 2.10.

Table 2.3 lists nine of the several hundredrestriction enzymes that are known. Mostrestriction enzymes are named after thespecies in which they were found. BamHI,for example, was isolated from the bac-terium Bacillus amyloliquefaciens strain H,and it is the first (I) restriction enzyme iso-lated from this organism. Because the firstthree letters in the name of each restrictionenzyme stand for the bacterial species oforigin, these letters are printed in italics; therest of the symbols in the name are not ital-icized. Most restriction enzymes recognizeonly one short base sequence, usually fouror six nucleotide pairs. The enzyme bindswith the DNA at these sites and makes abreak in each strand of the DNA molecule,producing 3'-OH and 5'-P groups at eachposition. The nucleotide sequence recog-nized for cleavage by a restriction enzyme iscalled the restriction site of the enzyme.The examples in Table 2.3 show that somerestriction enzymes cleave their restrictionsite asymmetrically (at different sites in the

50 Chapter 2 DNA Structure and DNA Manipulation

Figure 2.9 Structure of the part of the restrictionenzyme BamHI that comes into contact with itsrecognition site in the DNA (blue). The pink andgreen cylinders represent regions of the enzymein which the amino acid chain is twisted in theform of a right-handed helix. [Courtesy of A. A.Aggarwal. Reprinted with permission from M.Newman, T. Strzelecka, L. F. Dorner, I.Schildkraut, and A. A. Aggarwal, 1995. Science269: 656. Copyright 2000 American Associationfor the Advancement of Science.]

Page 52: Introduction to Molecular Genetics and Genomics

2.3 The Separation and Identification of Genomic DNA Fragments 51

3’ end

BamH1 restriction site, GGATCC

Restriction fragment Restriction fragment

5’ end

5’ end3’ end

5’ 3’

3’ 3’

G G A T C CC C T A G

GC C T A G

G

3’

5’5’

5’

G A T C CG

Cleavage creates a short complementary single-stranded overhang in each cleaved end (“sticky ends”).

Cleavage occurs in each strand at the site of the arrowhead.

New endscreated

Figure 2.10 Mechanism of DNAcleavage by the restriction enzymeBamHI. Wherever the duplexcontains a BamHI restriction site, theenzyme makes a single cut in thebackbone of each DNA strand. Eachcut creates a new 3' end and a new 5'end, separating the duplex into twofragments. In the case of BamHI thecuts are staggered cuts, so the result-ing ends terminate in single-strandedregions, each four base pairs inlength.

EcoRI Hind III AruI(Escherichia coli) (Haemophilus influenzae) (Arthrobacter luteus)

BamHI Pst I RsAI(Bacillus amyloliquefaciens H) (Providencia stuartii) (Rhodopseudomonas sphaeroides)

HaeII Taq I PvuII(Haemophilus aegyptus) (Thermus aquaticus) (Proteus vulgaris)

Note: The vertical dashed line indicates the axis of symmetry in each sequence. Red arrows indicate the sites of cutting. The enzyme TaqI yields cohe-sive ends consisting of two nucleotides, whereas the cohesive ends produced by the other enzymes contain four nucleotides. Pu and Py refer to anypurine and pyrimidine, respectively.

Table 2.3 Some restriction endonucleases, their sources, and their cleavage sites

Enzyme Enzyme Enzyme(Microorganism) (Microorganism) (Microorganism)

G A A TC T T A

T CA G

G G A TC C T A

C CG G

P u G C GP y C G C

C P yG P u

A A G CT T C G

T TA A

C T G CG A C G

A GT C

T C GA G C

AT

A G CT C G

TA

G T AC A T

CG

C A G CG T C G

T GA C

Target sequenceand cleavage site;blunt ends

Target sequenceand cleavage site;sticky ends

Page 53: Introduction to Molecular Genetics and Genomics

A G CT C G

TA

two DNA strands), but other restrictionenzymes cleave symmetrically (at the samesite in both strands). The former leavesticky ends because each end of thecleaved site has a small, single-strandedoverhang that is complementary in base se-quence to the other end (Figure 2.10). Incontrast, enzymes that have symmetricalcleavage sites yield DNA fragments thathave blunt ends. In virtually all cases, therestriction site of a restriction enzyme readsthe same on both strands, provided that theopposite polarity of the strands is taken intoaccount; for example, each strand in the re-striction site of BamHI reads 5'-GGATCC-3'(Figure 2.10). A DNA sequence with thistype of symmetry is called a palindrome.(In ordinary English, a palindrome is aword or phrase that reads the same for-wards and backwards, such as “madam.”)

Restriction enzymes have the followingimportant characteristics:

• Most restriction enzymes recognize a singlerestriction site.

• The restriction site is recognized withoutregard to the source of the DNA.

• Because most restriction enzymes recognize aunique restriction site sequence, the numberof cuts in the DNA from a particular organismis determined by the number of restrictionsites present.

The DNA fragment produced by a pairof adjacent cuts in a DNA molecule is calleda restriction fragment. A large DNA mol-ecule will typically be cut into many restric-tion fragments of different sizes. Forexample, an E. coli DNA molecule, whichcontains 4.6 � 106 base pairs, is cut intoseveral hundred to several thousand frag-ments, and mammalian genomic DNA iscut into more than a million fragments. Al-though these numbers are large, they areactually quite small relative to the numberof sugar–phosphate bonds in the DNA of anorganism.

Gel ElectrophoresisThe DNA fragments produced by a restric-tion enzyme can be separated by size usingthe fact that DNA is negatively charged andmoves in response to an electric field. If theterminals of an electrical power source areconnected to the opposite ends of a hori-zontal tube containing a DNA solution,then the DNA molecules will move towardthe positive end of the tube at a rate thatdepends on the electric field strength andon the shape and size of the molecules. Themovement of charged molecules in an elec-tric field is called electrophoresis.

The type of electrophoresis mostcommonly used in genetics is gel electro-phoresis. An experimental arrangementfor gel electrophoresis of DNA is shown inFigure 2.11. A thin slab of a gel, usuallyagarose or acrylamide, is prepared contain-ing small slots (called wells) into whichsamples are placed. An electric field is ap-plied, and the negatively charged DNA mol-ecules penetrate and move through the geltoward the anode (the positively chargedelectrode). A gel is a complex molecularnetwork that contains narrow, tortuouspassages, so smaller DNA molecules passthrough more easily; hence the rate ofmovement increases as the size of the DNAfragment decreases. Figure 2.12 shows the re-sult of electrophoresis of a set of double-stranded DNA molecules in an agarose gel.Each discrete region containing DNA iscalled a band. The bands can be visualizedunder ultraviolet light after soaking the gelin the dye ethidium bromide, the moleculesof which intercalate between the stackedbases in duplex DNA and render it fluores-cent. In Figure 2.12, each band in the gel

52 Chapter 2 DNA Structure and DNA Manipulation

Electrode

Buffersolution

Gel

Slots forsamples

Bands (visible aftersuitable treatment)

Direction ofmovement

–+

Figure 2.11 Apparatus for gel electrophoresis. Liquid gelis allowed to harden with an appropriately shaped moldin place to form “wells” for the samples (purple). Afterelectrophoresis, the DNA fragments, located at variouspositions in the gel, are made visible by immersing thegel in a solution containing a reagent that binds to orreacts with DNA. The separated fragments in a sampleappear as bands, which may be either visibly colored orfluorescent, depending on the particular reagent used.The region of a gel in which the fragments in one sam-ple can move is called a lane; this gel has seven lanes.

G G A TC C T A

C CG G

AruI(Arthrobacter luteus)

BamHI (Bacillus

amyloliquefaciens H)

Page 54: Introduction to Molecular Genetics and Genomics

2.3 The Separation and Identification of Genomic DNA Fragments 53

results from the fact that all DNA fragmentsof a given size have migrated to the sameposition in the gel. To produce a visibleband, a minimum of about 5 � 10�9

grams of DNA is required, which for a frag-ment of size 3 kb works out to about 109

molecules. The point is that a very largenumber of copies of any particular DNAfragment must be present in order to yield avisible band in an electrophoresis gel.

A linear double-stranded DNA fragmenthas an electrophoretic mobility that de-creases in proportion to the logarithm of itslength in base pairs—the longer the frag-ment, the slower it moves—but the propor-tionality constant depends on the agaroseconcentration, the composition of thebuffering solution, and the electrophoreticconditions. This means that different con-centrations of agarose allow efficient sepa-ration of different size ranges of DNAfragments (see Figure 2.13). Less dense gels,such as 0.6 percent agarose, are used to sep-arate larger fragments; whereas more densegels, such as 2 percent agarose, are used toseparate smaller fragments. The inset inFigure 2.13 shows the dependence of elec-trophoretic mobility on the logarithm offragment size. It also indicates that the lin-ear relationship breaks down for the largestfragments that can be resolved under agiven set of conditions.

Figure 2.12 Gel electrophoresis of DNA.Fragments of different sizes were mixed andplaced in a well. Electrophoresis was in thedownward direction. The DNA has been madevisible by the addition of a dye (ethidiumbromide) that binds only to DNA and thatfluoresces when the gel is illuminated withshort-wavelength ultraviolet light.

Directionof movement

Band from heaviest fragment (moves least)

Band from lightest fragment (moves most)

0

2

4

6

8

12

14

16

18

20

0.6 2.01.51.20.7 0.9

10

Siz

e ra

ng

e (k

b)

for

effi

cien

t se

par

atio

n o

flin

ear

do

ub

le-s

tran

ded

DN

A f

rag

men

ts

log

10 o

f si

ze in

bp

Distance migrated

Agarose concentration (%)

For any one agarose concentration, except for the largest fragments, the distance migrated decreases as a linear function of the logarithm of fragment size.

Figure 2.13 In agarose gels, the concentration ofagarose is an important factor in determiningthe size range of DNA fragments that can beseparated.

Page 55: Introduction to Molecular Genetics and Genomics

genetic engineering, discussed further inChapter 13.

DNA fragments that have been clonedinto organisms such as E. coli are widelyused because the fragments can be isolatedin large amounts and purified relativelyeasily. Among the uses of cloned DNA are:

• DNA sequencing. All current methods of DNAsequencing require cloned DNA fragments.These methods are discussed in Chapter 6.

• Nucleic acid hybridization. As we shall seebelow, an important application of clonedDNA fragments entails incorporating aradioactive or light-emitting “label” intothem, after which the labeled material is usedto “tag” DNA fragments containing similarsequences.

• Storage and distribution. Cloned DNA can bestored for long periods without risk of changeand can easily be distributed to otherresearchers.

Nucleic Acid HybridizationMost genomes are sufficiently large andcomplex that digestion with a restrictionenzyme produces many bands that are thesame or similar in size. Identifying a partic-ular DNA fragment in a background ofmany other fragments of similar size pre-sents a needle-in-a-haystack problem. Sup-pose, for example, that we are interested ina particular 3.0 BamHI fragment from thehuman genome that serves as a marker in-dicating the presence of a genetic risk factortoward breast cancer among the individualsin a particular pedigree. This fragment of3.0 kb is indistinguishable, on the basis ofsize alone, from fragments ranging fromabout 2.9 to 3.1 kb. How many fragmentsin this size range are expected? When hu-man genomic DNA is cleaved with BamHI,the average length of a restriction fragmentis 46 � 4096 base pairs, and the expectedtotal number of BamHI fragments is about730,000; in the size range 2.9–3.1 kb, theexpected number of fragments is about17,000. What this means is that eventhough we know that the fragment we areinterested in is 3 kb in length, it is only oneof 17,000 fragments that are so similar insize that ours cannot be distinguished fromthe others by length alone.

This identification task is actually harderthan finding a needle in a haystack because

54 Chapter 2 DNA Structure and DNA Manipulation

λ DNA

EcoRI

BamHI

(A) EcoRI enzyme

(C)

(B)

BamHI enzyme

1

1 2 3

6

4 5/6

2 53 4

1

22 5 5.5 7.5 6

6

4

5 4 2 3

6 1 5 4

5.5 17.5 5 6.5

3

7.5

2

8λ DNA

Figure 2.14 Restriction maps of � DNA for the restriction enzymes (A) EcoRI and (C) BamHI. The vertical bars indicate the sites of cutting. The numbers within the arrows are the approximate lengths of thefragments in kilobase pairs (kb). (B) An electrophoresis gel of BamHI and EcoRI enzyme digests of � DNA. Numbers indicate fragments in orderfrom largest (1) to smallest (6); the circled numbers on the mapscorrespond to the numbers beside the gel. The DNA has not undergoneelectrophoresis long enough to separate bands 5 and 6 of the BamHI digest.Note: In Problem 2 at the end of this chapter (Guide to Problem Solving),we show how to use the results of a double digest to determine the partic-ular order of fragments for a pair of restriction enzymes.

Because of the sequence specificity ofcleavage, a particular restriction enzyme pro-duces a unique set of fragments for a particularDNA molecule. Another enzyme will pro-duce a different set of fragments from thesame DNA molecule. In Figure 2.14, this prin-ciple is illustrated for the digestion of E. coliphage � DNA by either EcoRI or BamHI (seepart B). The locations of the cleavage sitesfor these enzymes in � DNA are shown inFigures 2.14A and C. A diagram showingsites of cleavage along a DNA molecule iscalled a restriction map. Particular DNAfragments can be isolated by cutting out thesmall region of the gel that contains thefragment and removing the DNA from thegel. One important use of isolated restric-tion fragments employs the enzyme DNAligase to insert them into self-replicatingmolecules such as bacteriophage, plasmids,or even small artificial chromosomes (Fig-ure 2.2). These procedures constitute DNAcloning and are the basis of one form of

Page 56: Introduction to Molecular Genetics and Genomics

haystacks are usually dry. A more accurateanalogy would be looking for a needle in ahaystack that had been pitched into aswimming pool full of water. This analogy ismore relevant because gels, even thoughthey contain a supporting matrix to makethem semisolid, are primarily composed ofwater, and each DNA molecule within a gelis surrounded entirely by water. Clearly, weneed some method by which the moleculesin a gel can be immobilized and our specificfragment identified.

The DNA fragments in a gel are usuallyimmobilized by transferring them onto asheet of special filter paper consisting ofnitrocellulose, to which DNA can be perma-nently (covalently) bound. How this isdone is described in the next section. In thissection we examine how the two strands ina double helix can be “unzipped” to formsingle strands and how, under the properconditions, two single strands that arecomplementary or nearly complementaryin sequence can be “zipped” together toform a different double helix. The “unzip-ping” is called denaturation, the “zipping”renaturation. The practical applications ofdenaturation and renaturation are many:

• A small part of a DNA fragment can be“zipped” with a much larger DNA fragment.This principle is used in identifying specificDNA fragments in a complex mixture, such asthe 3-kb BamHI marker for breast cancer thatwe have been considering. Applications of thistype include the tracking of genetic markersin pedigrees and the isolation of fragmentscontaining a particular mutant gene.

• A DNA fragment from one gene can be“zipped” with similar fragments from othergenes in the same genome; this principle isused to identify different members of familiesof genes that are similar, but not identical, insequence and that have related functions.

• A DNA fragment from one species can be“zipped” with similar sequences from otherspecies. This allows the isolation of genes that have the same or related functions inmultiple species. It is used to study aspects ofmolecular evolution, such as how differencesin sequence are correlated with differences in function, and the patterns and rates ofchange in gene sequences as they evolve.

As we saw in Section 2.2, the double-stranded helical structure of DNA is main-tained by base stacking and by hydrogenbonding between the complementary base

pairs. When solutions containing DNA frag-ments are raised to temperatures in therange 85–100°C, or to the high pH of strongalkaline solutions, the paired strands beginto separate, or “unzip.” Unwinding of thehelix happens in less than a few minutes(the time depends on the length of the mol-ecule). When the helical structure of DNAis disrupted, and the strands have becomecompletely unzipped, the molecule is saidto be denatured. A common way to detectdenaturation is by measuring the capacityof DNA in solution to absorb ultravioletlight of wavelength 260 nm, because theabsorption at 260 nm (A260) of a solution ofsingle-stranded molecules is 37 percenthigher than the absorption of the double-stranded molecules at the same concentra-tion. As shown in Figure 2.15, the progress ofdenaturation can be followed by slowlyheating a solution of double-stranded DNAand recording the value of A260 at varioustemperatures. The temperature requiredfor denaturation increases with G � Ccontent, not only because G�C base pairshave three hydrogen bonds and A�T basepairs two, but because consecutive G�Cbase pairs have stronger base stacking.

2.3 The Separation and Identification of Genomic DNA Fragments 55

Rel

ativ

e lig

ht

abso

rban

ce a

t 26

0 n

mTemperature, °C

30 50

1.40

1.30

1.20

1.10

1.00

70 90Tm 100 110

Double-stranded DNA

Denaturedsingle strands

Furtherdenaturation

Denaturationbegins

Strand separation

Temperature at which half the base pairs are denatured and half remain intact

Figure 2.15 Mechanism of denaturation of DNA by heat. The temperatureat which 50 percent of the base pairs are denatured is the melting tempera-ture, symbolized Tm.

Page 57: Introduction to Molecular Genetics and Genomics

Denatured DNA strands can, under cer-tain conditions, form double-stranded DNAwith other strands, provided that thestrands are sufficiently complementary insequence. This process of renaturation iscalled nucleic acid hybridization be-cause the double-stranded molecules are“hybrid” in that each strand comes from adifferent source. For DNA strands to hy-bridize, two requirements must be met:

1. The salt concentration must be high(� 0.25M) to neutralize the negative chargesof the phosphate groups, which wouldotherwise cause the complementary strandsto repel one another.

2. The temperature must be high enough todisrupt hydrogen bonds that form at randombetween short sequences of bases within thesame strand, but not so high that stable basepairs between the complementary strandsare disrupted.

The initial phase of renaturation is a slowprocess because the rate is limited by therandom chance that a region of two com-plementary strands will come together toform a short sequence of correct base pairs.This initial pairing step is followed by arapid pairing of the remaining complemen-tary bases and rewinding of the helix.Rewinding is accomplished in a matter ofseconds, and its rate is independent of DNAconcentration because the complementarystrands have already found each other.

The example of nucleic acid hybridiza-tion in Figure 2.16 will enable us to under-stand some of the molecular details and alsoto see how hybridization is used to “tag”and identify a particular DNA fragment.

Shown in part A is a solution of denaturedDNA, called the probe, in which each mol-ecule has been labeled with either radioac-tive atoms or light-emitting molecules.Probe DNA is typically obtained from aclone, and the labeled probe usually con-tains denatured forms of both strands pre-sent in the original duplex molecule. (Thishas led to some confusing terminology. Ge-neticists say that probe DNA hybridizeswith DNA fragments containing sequencesthat are similar to the probe, rather thancomplementary. What actually occurs is thatone strand of the probe undergoes hy-bridization with a complementary sequencein the fragment. But because the probe usu-ally contains both strands, hybridizationtakes place with any fragment that containsa similar sequence, each strand in the probeundergoing hybridization with the comple-mentary sequence in the fragment.)

Part B in Figure 2.16 is a diagram of ge-nomic DNA fragments that have been im-mobilized on a nitrocellulose filter. Whenthe probe is mixed with the genomic frag-ments (part C), random collisions bringshort, complementary stretches together. Ifthe region of complementary sequence isshort (part D), then random collision can-not initiate renaturation because the flank-ing sequences cannot pair; in this case theprobe falls off almost immediately. If, how-ever, a collision brings short sequences to-gether in the correct register (part E), thenthis initiates renaturation, because the pair-ing proceeds zipperlike from the initial con-tact. The main point is that DNA fragmentsare able to hybridize only if the length of

56 Chapter 2 DNA Structure and DNA Manipulation

Photocaptionphotocaptionphotocaptionphotocaptionphotocap-tionphotocaptionphotocaption-photocaptionphotocaptionphotocaptionphotocaption

09131_01_1746P

Page 58: Introduction to Molecular Genetics and Genomics

2.3 The Separation and Identification of Genomic DNA Fragments 57

Mix

Heat-sealed bag

Renaturation

The denatured probe usually contains both complementary strands.

Random collisions bring small regions of complementary sequences together to start the renaturation.

Some fragments in the genomic DNA may contain a sequence similar to that in the probe DNA.

Base pairing cannot go farther because flanking sequences are not complementary; probe falls away.

Base pairing proceeds in a zipper-like fashion because flanking sequences are complementary; probe sticks.

(A)–Fragments of denatured and labeled probe DNA

(D)–Initial pairing with incorrect fragment

(C)–

(B)–Fragments of denatured genomic DNA immobilized on filter

G TC A

A T A AT A T T

T G C G AA C G C T

G C CC G G

CCC G GC G G

C C A T A C A C A

CA A T G C G AT T A C G C T

T A TA T A

CGACGT C G

C A

T T A C A T G C TT A T T A C G C T

C A

C GG

G G A

(E)–Initial pairing with correct fragment

Figure 2.16 Nucleic acid hybridization. (A) Duplex molecules of probe DNA (obtained from a clone)are denatured and (B) placed in contact with a filter to which is attached denatured strands ofgenomic DNA. (C) Under the proper conditions of salt concentration and temperature, shortcomplementary stretches come together by random collision. (D) If the sequences flanking thepaired region are not complementary, then the pairing is unstable and the strands come apart again.(E) If the sequences flanking the paired region are complementary, then further base pairingstabilizes the renatured duplex.

the region in which they can pair is suffi-ciently long. Some mismatches in thepaired region can be tolerated. How manymismatches are allowed is determined bythe conditions of the experiment: Thelower the temperature at which the hy-bridization is carried out, and the higherthe salt concentration, the greater the pro-portion of mismatches that are tolerated.

The Southern BlotThe ability to renature DNA in the manneroutlined in Figure 12.16 means that solu-tion containing a small fragment of dena-

tured DNA, if it is suitably labeled (for ex-ample, with radioactive 32P), can be com-bined with a complex mixture of denaturedDNA fragments, and upon renaturation thesmall fragment will “tag” with radioactivityany molecules in the complex mixture withwhich it can hybridize. The radioactive tagallows these molecules to be identified.

The methods of DNA cleavage, elec-trophoresis, transfer to nitrocellulose, andhybridization with a probe are all combinedin the Southern blot, named after its in-ventor Edward Southern. In this procedure,a gel in which DNA molecules have beenseparated by electrophoresis is treated with

Page 59: Introduction to Molecular Genetics and Genomics

alkali to denature the DNA and render itsingle-stranded (Figure 2.17). Then the DNAis transferred to a sheet of nitrocellulose fil-ter in such a way that the relative positionsof the DNA fragments are maintained. Thetransfer is accomplished by overlaying thenitrocellulose onto the gel and stackingmany layers of absorbent paper on top; theabsorbent paper sucks water moleculesfrom the gel and through the nitrocellulose,to which the DNA fragments adhere. (Thisstep is the “blot” component of the South-ern blot, parts A and B.) Then the filter istreated so that the single-stranded DNA be-comes permanently bound. The treated fil-ter is mixed with a solution containingdenatured probe (DNA or RNA) under con-ditions that allow complementary strands tohybridize to form duplex molecules (partC). Radioactive or other label present in theprobe becomes stably bound to the filter,and therefore resistant to removal by wash-ing, only at positions at which base se-quences complementary to the probe arealready present on the filter, so that theprobe can form duplex molecules. The labelis located by placing the paper in contactwith x-ray film. After development of thefilm, blackened regions indicate positions ofbands containing the radioactive or light-emitting label (part D).

The procedure in Figure 2.17 solves thewet-haystack problem by transferring andimmobilizing the genomic DNA fragmentsto a filter and identifying, by hybridization,

the ones that are of interest. Practical appli-cations of Southern blotting center on iden-tifying DNA fragments that containsequences similar to the probe DNA orRNA, where the proportion of mismatchednucleotides allowed is determined by theconditions of hybridization. The advantagesof the Southern blot are convenience andsensitivity. The sensitivity comes from thefact that both hybridization with a labeledprobe and the use of photographic film am-plify the signal; under typical conditions, aband can be observed on the film with only5 � 10�12 grams of DNA—a thousandtimes less DNA than the amount requiredto produce a visible band in the gel itself.

2.4 Selective Replication of Genomic DNA Fragments

Although nucleic acid hybridization allowsa particular DNA fragment to be identifiedwhen present in a complex mixture of frag-ments, it does not enable the fragment to beseparated from the others and purified. Ob-taining the fragment in purified formrequires cloning, which is straightforwardbut time-consuming. (Cloning methods arediscussed in Chapter 13.) However, if thefragment of interest is not too long, and ifthe nucleotide sequence at each end isknown, then it becomes possible to obtainlarge quantities of the fragment merely by

58 Chapter 2 DNA Structure and DNA Manipulation

Au: Proofreaderasks that you con-firm reference toChapter 13 in firstpara of sec. 2.4.

(A)–DNA is cleaved; electrophoresisis used to separate DNA

(B)–DNA fragments are blotted onto nitrocellulose filter

(C)–Filter is exposed to radioactive probe

(D)–Filter is exposed tophotographic film;film is developed

DNA restriction fragments(so many thatindividual bands run together)

DNA fragments transferred onto filter (invisible at this stage)

Probe hybridizes with homologous DNA fragments (still not visible)

Bands on x-ray film created by labeled probe

x-rayfilm

Size markers

Gel

Buffer solution

Absorbentpaper

Gel

Gel

Nitrocellulosefilter

Nitrocellulosefilter

Filter

Heat-sealed bag

Probe insolution

Figure 2.17 Southernblot. (A) DNA re-striction fragments areseparated by electro-phoresis, blotted fromthe gel onto a nitro-cellulose or nylonfilter, and chemicallyattached by the use of ultraviolet light. (B) The strands aredenatured and mixedwith radioactive orlight-sensitive probeDNA, which bindswith complementarysequences present on the filter. Thebound probe remains,whereas unboundprobe washes off. (C) Bound probe isrevealed by darkeningof photographic filmplaced over the filter.The positions of thebands indicate whichrestriction fragmentscontain DNA se-quences homologousto those in the probe.

Page 60: Introduction to Molecular Genetics and Genomics

selective replication. This process is calledamplification. How would one know thesequence of the ends? Let us return to ourexample of the 3.0-kb BamHI fragment thatserves to mark a risk factor for breast cancerin certain pedigrees. Suppose that this frag-ment is cloned and sequenced from one af-fected individual, and it is found that,relative to the normal genomic sequence inthis region, the BamHI fragment is missing aregion of 500 base pairs. At this point the se-quences at the ends of the fragment areknown, and we can also infer that amplifi-cation of genomic DNA from individualswith the risk factor will yield a band of 3.0kb, whereas amplification from genomicDNA of noncarriers will yield a band of 3.5kb. This difference allows every person inthe pedigree to be diagnosed as a carrier ornoncarrier merely by means of DNA ampli-fication. To understand how amplificationworks, it is first necessary to examine a fewkey features of DNA replication.

Constraints on DNA Replication: Primers and 5'-to-3' Strand ElongationAs with most metabolic reactions in livingcells, nucleic acids are synthesized in chem-ical reactions controlled by enzymes. Anenzyme that forms the sugar–phosphatebond (the phosphodiester bond) betweenadjacent nucleotides in a nucleic acid chainis called a DNA polymerase. A variety ofDNA polymerases have been purified, andfor amplification of a DNA fragment, theDNA synthesis is carried out in vitro by com-bining purified cellular components in atest tube under precisely defined condi-tions. (In vitro means “without participationof living cells.”)

In order for DNA polymerase to catalyzesynthesis of a new DNA strand, preexistingsingle-stranded DNA must be present. Eachsingle-stranded DNA molecule present inthe reaction mix can serve as a templateupon which a new partner strand is createdby the DNA polymerase. For DNA replica-tion to take place, the 5'-triphosphates ofthe four deoxynucleosides must also be pre-sent. This requirement is rather obvious, be-cause the nucleoside triphosphates are theprecursors from which new DNA strandsare created. The triphosphates needed arethe compounds denoted in Table 2.1 as

dATP, dGTP, dTTP, and dCTP, which containthe bases adenine, guanine, thymine, andcytosine, respectively. Details of the struc-tures of dCTP and dGTP are shown in Figure2.18, in which the phosphate groups cleavedoff during DNA synthesis are indicated.DNA synthesis requires all four nucleoside5'-triphosphates and does not take place ifany of them is omitted.

A feature found in all DNA polymerasesis that

A DNA polymerase can only elongate a DNAstrand. It is not possible for DNA polymerase toinitiate synthesis of a new strand, even when atemplate molecule is present.

One important implication of this prin-ciple is that DNA synthesis requires a pre-existing segment of nucleic acid that ishydrogen-bonded to the template strand.This segment is called a primer. Becausethe primer molecule can be very short, it isan oligonucleotide, which literally means“few nucleotides.” As we shall see in Chap-ter 6, in living cells the primer is a short seg-ment of RNA, but in DNA amplification invitro, the primer employed is usually DNA.

2.4 Selective Replication of Genomic DNA Fragments 59

Deoxycytidine 5'-triphosphate (dCTP)

Deoxyguanosine 5'-triphosphate (dGTP)

The outer two phosphategroups are cleaved off whennucleotides are added to the growing DNA strand.

P

O

O–

OOP

O

O–

OP

O

O–

– O

OH

CH2 O

OH H

H H

H

N HH

H

P

O

O–

OOP

O

O–

OP

O

O–

OH

CH2 OO

H HH H

H

N

N

N

CCC

C

C

G

H

N H

N H

H

C

C

C

C N

N

H C

– O

Figure 2.18 Two deoxynucleoside triphosphates used in DNA synthesis.The outer two phosphate groups are removed during synthesis.

Page 61: Introduction to Molecular Genetics and Genomics

It is the 3' end of the primer that is essen-tial, because, as emphasized in Chapter 1,

DNA synthesis proceeds only by addition ofsuccessive nucleotides to the 3' end of thegrowing strand. In other words, chain elonga-tion always takes place in the 5'-to-3' direction(5' � 3').

The reason for the 5' � 3' direction ofchain elongation is illustrated in Figure 2.19.It is a consequence of the fact that the reac-tion catalyzed by DNA polymerase is theformation of a phosphodiester bond be-tween the free 3'-OH group of the chainbeing extended and the innermost phos-phorus atom of the nucleoside triphosphatebeing incorporated at the 3' end. Recogni-tion of the appropriate incoming nucleosidetriphosphate in replication depends on basepairing with the opposite nucleotide in thetemplate strand. DNA polymerase will usu-ally catalyze the polymerization reactionthat incorporates the new nucleotide at theprimer terminus only when the correctbase pair is present. The same DNA poly-merase is used to add each of the fourdeoxynucleoside phosphates to the 3'-OHterminus of the growing strand.

The Polymerase Chain ReactionThe requirement for an oligonucleotideprimer, and the constraint that chain elon-gation must always occur in the 5' � 3'direction, make it possible to obtain largequantities of a particular DNA sequence byselective amplification in vitro. The methodfor selective amplification is called the poly-merase chain reaction (PCR). For its in-vention, Californian Kary B. Mullis wasawarded a Nobel Prize in 1993. PCR amplifi-cation uses DNA polymerase and a pair ofshort, synthetic oligonucleotide primers,usually 18–22 nucleotides in length, that arecomplementary in sequence to the ends ofthe DNA sequence to be amplified. Figure2.20 gives an example in which the primeroligonucleotides (green) are 9-mers. Theseare too short for most practical purposes, butthey will serve for illustration. The originalduplex molecule (part A) is shown in blue.This duplex is mixed with a vast excess ofprimer molecules, DNA polymerase, and allfour nucleoside triphosphates. When thetemperature is raised, the strands of theduplex denature and become separated.

60 Chapter 2 DNA Structure and DNA Manipulation

4

Base pairing specifies the next nucleotide to be added at the 3’ end.

Newlysynthesizedstrand

Templatestrand

G

3’end

5’end

5’end

3’end

C

T

O

OH

P

P

3’

3’

C PP

P

P P

4

2

O

3

OH

1

A P–P (pyrophosphate)group is released.

The 3’ hydroxyl group at the3’ end of the growing strand attacks the innermost phosphate group of the incoming trinucleotide.

An O–P bond is formed to attach the new nucleotide.

Figure 2.19 Addition of nucleotides to the 3'-OH terminus of a growing strand. The recognition stepis shown as the formation of hydrogen bonds between the A and the T. The chemical reaction is thatthe 3'-OH group of the 3' end of the growing chain attacks the innermost phosphate group of theincoming trinucleotide.

Page 62: Introduction to Molecular Genetics and Genomics

(A)–Original duplex DNA

(B)–First cycle—primers attached

(C)–Elongation (DNA synthesis)

(D)–Second cycle—after elongation

(E)–Third cycle—after elongation

TA

A CT G

AT

TA

G A C G

TA C A T G A C G

C T G C

DNA

Region to beamplified PrimerPrimer

TA

A CT G

AT

TA

G A C GC T G C

TA

A CT G

AT

TA

G A C GC T G C

TA

A CT G

AT

TA

G A C GC T G C

TA

A CT G

AT

TA

G A C GC T G C

TA

A CT G

AT

TA

G A C GC T G C

C T AG A T

T G C AA C G T

T G

C T A T G C A T G

A C

C T AG A T

T G C AA C G T

T GA C

G A T A C G T A C

C T AG A T

T G C AA C G T

T GA C

C T AG A T

T G C AA C G T

T GA C

C T AG A T

T G C AA C G T

T GA C

G A T A C G T A C

Figure 2.20 Role ofprimer sequences inPCR amplification. (A)Target DNA duplex(blue), showingsequences chosen asthe primer-bindingsites flanking theregion to be amplified.(B) Primer (green)bound to denaturedstrands of target DNA.(C) First round ofamplification. Newlysynthesized DNA isshown in pink. Notethat each primer isextended beyond theother primer site. (D)Second round ofamplification (onlyone strand shown); inthis round, the newlysynthesized strandterminates at theopposite primer site.(E) Third round ofamplification (onlyone strand shown); inthis round, bothstrands are truncatedat the primer sites.Primer sequences arenormally about twiceas long as shown here.

When the temperature is lowered again toallow renaturation, the primers, becausethey are in great excess, become annealed tothe separated template strands (part B).Note that the primer sequences are differentfrom each other but complementary to se-quences present in opposite strands of theoriginal DNA duplex and flanking the re-gion to be amplified. The primers are ori-ented with their 3' ends pointing in thedirection of the region to be amplified, be-

cause each DNA strand elongates only at the3' end. After the primers have annealed,each is elongated by DNA polymerase usingthe original strand as a template, and thenewly synthesized DNA strands (red) growtoward each other as synthesis proceeds(part C). Note that:

A region of duplex DNA present in the original reaction mix can be PCR-amplifiedonly if the region is flanked by the primeroligonucleotides.

2.4 Selective Replication of Genomic DNA Fragments 61

Page 63: Introduction to Molecular Genetics and Genomics

To start a second cycle of PCR amplifica-tion, the temperature is raised again to de-nature the duplex DNA. Upon lowering ofthe temperature, the original parentalstrands anneal with the primers and arereplicated as shown in Figure 2.20B and C.The daughter strands produced in the firstround of amplification also anneal withprimers and are replicated, as shown in partD. In this case, although the daughter du-plex molecules are identical in sequence tothe original parental molecule, they consistentirely of primer oligonucleotides andnonparental DNA that was synthesized ineither the first or the second cycle of PCR.As successive cycles of denaturation, primerannealing, and elongation occur, the origi-nal parental strands are diluted out by theproliferation of new daughter strands untileventually, virtually every molecule pro-duced in the PCR has the structure shownin part D.

The power of PCR amplification is thatthe number of copies of the template strandincreases in exponential progression: 1, 2,4, 8, 16, 32, 64, 128, 256, 512, 1024, and soforth, doubling with each cycle of replica-tion. Starting with a mixture containing aslittle as one molecule of the fragment of in-terest, repeated rounds of DNA replicationincrease the number of amplified moleculesexponentially. For example, starting with asingle molecule, 25 rounds of DNA replica-tion will result in 225 � 3.4 � 107 mole-cules. This number of molecules of theamplified fragment is so much greater thanthat of the other unamplified molecules inthe original mixture that the amplifiedDNA can often be used without further pu-rification. For example, a single fragment of3 kb in E. coli accounts for only 0.06 percentof the DNA in this organism. However, ifthis single fragment were replicatedthrough 25 rounds of replication, then99.995 percent of the resulting mixturewould consist of the amplified sequence. A3-kb fragment of human DNA constitutesonly 0.0001 percent of the total genomesize. Amplification of a 3-kb fragment ofhuman DNA to 99.995 percent puritywould require about 34 cycles of PCR.

An overview of the polymerase chainreaction is shown in Figure 2.21. The DNA se-quence to be amplified is again shown inblue and the oligonucleotide primers ingreen. The oligonucleotides anneal to the

ends of the sequence to be amplified and be-come the substrates for chain elongation byDNA polymerase. In the first cycle of PCRamplification, the DNA is denatured to sep-arate the strands. The denaturation temper-ature is usually around 95°C. Then thetemperature is decreased to allow annealingin the presence of a vast excess of the primeroligonucleotides. The annealing tempera-ture is typically in the range of 50°C–60°C,depending largely on the G � C content ofthe oligonucleotide primers. To completethe cycle, the temperature is raised slightly,to about 70°C, for the elongation of eachprimer. The steps of denaturation, renatura-tion, and replication are repeated from20–30 times, and in each cycle the numberof molecules of the amplified sequence isdoubled.

Implementation of PCR with conven-tional DNA polymerases is not practical,because at the high temperature necessaryfor denaturation, the polymerase is itself ir-reversibly unfolded (denatured) and be-comes inactive. However, DNA polymeraseisolated from certain organisms is heat sta-ble because the organisms normally live inhot springs at temperatures well above90°C, such as are found in YellowstoneNational Park. Such organisms are said tobe thermophiles. The most widely usedheat-stable DNA polymerase is called Taqpolymerase, because it was originally iso-lated from the thermophilic bacteriumThermus aquaticus.

PCR amplification is very useful for gen-erating large quantities of a specific DNA se-quence. The principal limitation of thetechnique is that the DNA sequences at theends of the region to be amplified must beknown so that primer oligonucleotides canbe synthesized. In addition, sequenceslonger than about 5000 base pairs cannot bereplicated efficiently by conventional PCRprocedures, although there are modifica-tions of PCR that allow longer fragments tobe amplified. On the other hand, many ap-plications require amplification of relativelysmall fragments. The major advantage ofPCR amplification is that it requires onlytrace amounts of template DNA. Theoreti-cally only one template molecule is re-quired, but in practice the amplification of asingle molecule may fail because the mole-cule may, by chance, be broken or damaged.But amplification is usually reliable with as

62 Chapter 2 DNA Structure and DNA Manipulation

Page 64: Introduction to Molecular Genetics and Genomics

Primer oligonucleotides

Denaturation, annealing

DNA sequence to be amplified

DNA replication

(A) First cycle

(B) Second cycle

(C) 20–30 cycles Amplified DNAsequences

Figure 2.21 Polymerase chain reaction (PCR) for amplification of particular DNA sequences. Onlythe region to be amplified is shown. Oligonucleotide primers (green) that are complementary to theends of the target sequence (blue) are used in repeated rounds of denaturation, annealing, and DNAreplication. Newly replicated DNA is shown in pink. The number of copies of the target sequencedoubles in each round of replication, eventually overwhelming any other sequences that may bepresent.

2.4 Selective Replication of Genomic DNA Fragments 63

Page 65: Introduction to Molecular Genetics and Genomics

few as 10–100 template molecules, whichmakes PCR amplification 10,000–100,000times more sensitive than detection via nu-cleic acid hybridization.

The exquisite sensitivity of PCR amplifi-cation has led to its use in DNA typing forcriminal cases in which a minusculeamount of biological material has been leftbehind by the perpetrator (skin cells on acigarette butt or hair-root cells on a singlehair can yield enough template DNA foramplification). In research, PCR is widelyused in the study of independent muta-tions in a gene whose sequence is knownin order to identify the molecular basis ofeach mutation, to study DNA sequencevariation among alternative forms of agene that may be present in natural popu-lations, or to examine differences amonggenes with the same function in differentspecies. The PCR procedure has also comeinto widespread use in clinical laboratoriesfor diagnosis. To take just one very impor-tant example, the presence of the humanimmunodeficiency virus (HIV), whichcauses acquired immune deficiency syn-drome (AIDS), can be detected in tracequantities in blood banks via PCR by usingprimers complementary to sequences inthe viral genetic material. These and otherapplications of PCR are facilitated by thefact that the procedure lends itself to au-tomation by the use of mechanical robotsto set up and run the reactions.

2.5 The Terminology of Genetic Analysis

In order to discuss the types of DNA mark-ers that modern geneticists commonly usein genetic analysis, we must first introducesome key terms that provide the essentialvocabulary of genetics. These terms can beunderstood with reference to Figure 2.22. InChapter 1 we defined a gene as an ele-ment of heredity, transmitted from parentsto offspring in reproduction , that influ-ences one or more hereditary traits. Chem-ically, a gene is a sequence of nucleotidesalong a DNA molecule. In a population oforganisms, not all copies of a gene mayhave exactly the same nucleotide se-quence. For example, whereas one form ofa gene has the codon GCA, which specifies

alanine in the polypeptide chain that thegene encodes, another form of the samegene may have, at the same position, thecodon GCG, which also specifies alanine.Hence the two forms of the gene encodethe same sequence of amino acids yet dif-fer in DNA sequence. The alternative formsof a gene are called alleles of the gene.Different alleles may also code for differentamino acid sequences, sometimes withdrastic effects. Recall the example of thePAH gene for phenyalanine hydroxylase inChapter 1, in which a change in codon 408from CGG (arginine) to TGG (tryptophan)results in an inactive enzyme that becomesexpressed as the inborn error of metabo-lism phenylketonuria.

Within a cell, genes are arranged in lin-ear order along microscopic thread-likebodies called chromosomes, which wewill examine in detail in Chapters 4 and 8.Each human reproductive cell contains onecomplete set of 23 chromosomes contain-ing 3 � 109 base pairs of DNA. A typicalchromosome contains several hundred toseveral thousand genes. In humans theaverage is approximately 3500 genes perchromosome. Each chromosome contains asingle molecule of duplex DNA along itslength, complexed with proteins and verytightly coiled. The DNA in the average hu-man chromosome, when fully extended,has relative dimensions comparable tothose of a wet spaghetti noodle 25 mileslong; when the DNA is coiled in the form ofa chromosome, its physical compaction iscomparable to that of the same noodlecoiled and packed into an 18-foot canoe.

The physical position of a gene along achromosome is called the locus of thegene. In most higher organisms, includinghuman beings, each cell other than a spermor egg contains two copies of each type ofchromosome—one from the mother andone inherited from the father. Each mem-ber of such a pair of chromosomes is said tobe homologous to the other. (The chro-mosomes that determine sex are an impor-tant exception, discussed in Chapter 4, thatwe will ignore for now.) At any locus,therefore, each individual carries two al-leles, because one allele is present at a cor-responding position in each of thehomologous maternal and paternal chro-mosomes (Figure 2.22).

64 Chapter 2 DNA Structure and DNA Manipulation

Page 66: Introduction to Molecular Genetics and Genomics

The genetic constitution of an individualis called its genotype. For a particulargene, if the two alleles at the locus in an in-dividual are indistinguishable from eachother, then for this gene the genotype of theindividual is said to be homozygous forthe allele that is present. If the two alleles atthe locus are different from each other,then for this gene the genotype of the indi-vidual is said to be heterozygous for thealleles that are present. Typographically,genes are indicated in italics, and alleles aretypically distinguished by uppercase or low-ercase letters (A versus a), subscripts (A1versus A2), superscripts (a� versus a�), orsometimes just � and �. Using these sym-bols, homozygous genes would be por-trayed by any of these formulas: AA, aa,A1A1, A2A2, a�a�, a�a�, ���, or ���. As inthe last two examples, the slash is some-times used to separate alleles present inhomologous chromosomes to avoid ambi-guity. Heterozygous genes would be por-trayed by any of the formulas Aa, A1A2,a�a�, or ���. In Figure 2.22, the genotypeBb is heterozygous because the B and b al-leles are distinguishable (which is why theyare assigned different symbols), whereasthe genotype CC is homozygous. Thesegenotypes could also be written as B�b andC�C, respectively.

Whereas the alleles that are present inan individual constitute its genotype, thephysical or biochemical expression of thegenotype is called the phenotype. To putit as simply as possible, the distinction isthat the genotype of an individual is whatis on the inside (the alleles in the DNA),whereas the phenotype is what is on theoutside (the observable traits, includingbiochemical traits, behavioral traits, and soforth). The distinction between genotypeand phenotype is critically important be-cause there usually is not a one-to-one cor-respondence between genes and traits.Most complex traits, such as hair color, skincolor, height, weight, behavior, life span,and reproductive fitness, are influenced bymany genes. Most traits are also influencedmore or less strongly by environment. Thismeans that the same genotype can result indifferent phenotypes, depending on the en-vironment. Compare, for example, twopeople with a genetic risk for lung cancer; ifone smokes and the other does not, the

smoker is much more likely to develop thedisease. Environmental effects also implythat the same phenotype can result frommore than one genotype; smoking againprovides an example, because most smok-ers who are not genetically at risk can alsodevelop lung cancer.

2.6 Types of DNA Markers Present in Genomic DNA

Genetic variation, in the form of multiplealleles of many genes, exists in most naturalpopulations of organisms. We have calledsuch genetic differences between individu-als DNA markers; they are also called DNApolymorphisms. (The term polymorphismliterally means “multiple forms.”) Themethods of DNA manipulation examined inSections 2.3 and 2.4 can be used in a varietyof combinations to detect differences amongindividuals. Anyone who reads the litera-ture in modern genetics will encounter abewildering variety of acronyms referring todifferent ways in which genetic polymor-phisms are detected. The different ap-proaches are in use because no single

2.6 Types of DNA Markers Present in Genomic DNA 65

Gene A

b

Gene B

Heterozygousgenotype Bb

C

Gene C

Homozygousgenotype CC

Homologouschromosomes

B C

Locus (physical position) of gene A in each homologous chromosome

One of each pair of chromosomes is maternal in origin, the other paternal.

Genotypes are sometimes written with a slash (for example, B/b and C/C) to distinguish the alleles in homologous chromosomes.

A1A2A3A4A5•••

Many different A alleles can existin an entire population of organisms, but only a single allele can be present at the locus of the A gene in any one chromosome.

Figure 2.22 Key concepts and terms used inmodern genetics. Note that a single gene canhave any number of alleles in the population asa whole, but no more than two alleles can bepresent in any one individual.

Page 67: Introduction to Molecular Genetics and Genomics

method is ideal for all applications, eachmethod has its own advantages and limita-tions, and new methods are continually be-ing developed. In this section we examinesome of the principal methods for detectingDNA polymorphisms among individuals.

Single-Nucleotide Polymorphisms (SNPs)A single-nucleotide polymorphism, orSNP (pronounced “snip”), is present at aparticular nucleotide site if the DNA mole-cules in the population frequently differ inthe identity of the nucleotide pair that oc-cupies the site. For example, some DNAmolecules may have a T�A base pair at aparticular nucleotide site, whereas otherDNA molecules in the same populationmay have a C�G base pair at the same site.This difference constitutes a SNP. The SNPdefines two “alleles” for which there couldbe three genotypes among individuals inthe population: homozygous with T�A atthe corresponding site in both homologouschromosomes, homozygous with C�G atthe corresponding site in both homologouschromosomes, or heterozygous with T�Ain one chromosome and C�G in the ho-mologous chromosome. The word allele isin quotation marks above because the SNPneed not be in a coding sequence, or evenin a gene. In the human genome, any tworandomly chosen DNA molecules are likely

to differ at one SNP site about every1000–3000 bp in protein-coding DNA andat about one SNP site every 500–1000 bp innoncoding DNA. Note, in the definition of aSNP, the stipulation that DNA moleculesmust differ at the nucleotide site “fre-quently.” This provision excludes rare ge-netic variation of the sort found in less than1 percent of the DNA molecules in a popu-lation. The reason for the exclusion is thatgenetic variants that are too rare are notgenerally as useful in genetic analysis as themore common variants. A catalog of SNPs isregarded as the ultimate compendium ofDNA markers, because SNPs are the mostcommon form of genetic differences amongpeople and because they are distributedapproximately uniformly along the chro-mosomes. By the middle of 2001, some300,000 SNPs are expected to have beenidentified in human populations and theirpositions in the chromosomes located.

Restriction Fragment LengthPolymorphisms (RFLPs)Although most SNPs require DNA sequenc-ing to be studied, those that happen to belocated within a restriction site can be ana-lyzed with a Southern blot. An example ofthis situation is shown in Figure 2.23, wherethe SNP consists of a T�A nucleotide pair insome molecules and a C�G pair in others. Inthis example, the polymorphic nucleotide

66 Chapter 2 DNA Structure and DNA Manipulation

Polymorphic site

3'5'5'3'

Treatment of DNAwith EcoRI

G(A) (B)

A A T T CC T T A A G

G A A T T CC T T A A G

G A A T T CC T T A A G

Cleavage

3'5'5'3'

Treatment of DNAwith EcoRI

Polymorphic siteG A A T T C

C T T A A GG A A T T CC T T A A G

G A A C T CC T T G A G

No cleavage

Result: Two fragments Result: One larger fragment

Figure 2.23 A minor difference in the DNA sequence of two molecules can be detected if the differ-ence eliminates a restriction site. (A) This molecule contains three restriction sites for EcoRI, includ-ing one at each end. It is cleaved into two fragments by the enzyme. (B) This molecule has an alteredEcoRI site in the middle, in which 5'-GAATTC-3' becomes 5'-GAACTC-3'. The altered site cannot becleaved by EcoRI, so treatment of this molecule with EcoRI results in one larger fragment.

Page 68: Introduction to Molecular Genetics and Genomics

2.6 Types of DNA Markers Present in Genomic DNA 67

Figure 2.24 In a restric-tion fragment lengthpolymorphism (RFLP),alleles may differ inthe presence orabsence of a cleavagesite in the DNA. In thisexample, the a allelelacks a restriction sitethat is present in theDNA of the A allele.The difference infragment length canbe detected bySouthern blotting.RFLP alleles arecodominant, whichmeans (as shown atthe bottom) that DNAfrom the heterozygousAa genotype yieldseach of the singlebands observed inDNA from homozy-gous AA and aagenotypes.

3’5’5’3’

Positions ofcleavage sites

Duplex DNA

Duplex DNA

“Allele” a

Homozygous AA

Homozygous Aa

Homozygous aa

“Allele” A

Site of hybridizationwith probe DNA

3’5’5’3’

DNA bandfrom allele A

DNA bandfrom allele a

DNA bandfrom genotype AA

DNA bandsfrom genotype Aa

DNA bandfrom genotype aa

Direction of current

Smaller DNA fragments

Larger DNA fragments

Duplex DNA in homologouschromosomes

3’5’5’3’

3’5’5’3’

3’5’5’3’

3’5’5’3’

3’5’5’3’

3’5’5’3’

site is included in a cleavage site for the re-striction enzyme EcoRI (5'-GAATTC-3'). Thetwo nearest flanking EcoRI sites are alsoshown. In this kind of situation, DNA mole-cules with T�A at the SNP will be cleaved atboth flanking sites and also at the middlesite, yielding two EcoRI restriction frag-ments. Alternatively, DNA molecules withC�G at the SNP will be cleaved at bothflanking sites but not at the middle site(because the presence of C�G destroys theEcoRI restriction site) and so will yield onlyone larger restriction fragment. A SNP thateliminates a restriction site is known as arestriction fragment length polymor-

phism, or RFLP (pronounced either as“riflip” or by spelling it out).

Because RFLPs change the number andsize of DNA fragments produced bydigestion with a restriction enzyme, theycan be detected by the Southern blottingprocedure discussed in Section 2.3. An ex-ample appears in Figure 2.24. In this case thelabeled probe DNA hybridizes near the re-striction site at the far left and identifies theposition of this restriction fragment in theelectrophoresis gel. The duplex moleculelabeled “allele A” has a restriction site in themiddle and, when cleaved and subjected toelectrophoresis, yields a small band that

Page 69: Introduction to Molecular Genetics and Genomics

contains sequences homologous to theprobe DNA. The duplex molecule labeled“allele a” lacks the middle restriction siteand yields a larger band. In this situationthere can be three genotypes—AA, Aa, oraa, depending on which alleles are presentin the homologous chromosomes—and allthree genotypes can be distinguished asshown in the Figure 2.24. Homozygous AAyields only a small fragment, homozygousaa yields only a large fragment, and het-erozygous Aa yields both a small and a largefragment. Because the presence of both theA and a alleles can be detected in heterozy-gous Aa genotypes, A and a are said to be

codominant. In Figure 2.24, the bandsfrom AA and aa have been shown as some-what thicker than those from Aa, becauseeach AA genotype has two copies of the Aallele and each aa genotype has two copiesof the a allele, compared with only onecopy of each allele in the heterozygousgenotype Aa.

Random Amplified Polymorphic DNA (RAPD)For studying DNA markers, one limitationof Southern blotting is that it requires mate-rial (probes available in the form of cloned

68 Chapter 2 DNA Structure and DNA Manipulation

Origin of the Human Genetic Linkage Map

David Botstein,1 Raymond L. White,2

Mark Skolnick,3 and Ronald W. Davis4 19801Massachusetts Institute of Technology,Cambridge, Massachusetts2University of Massachusetts MedicalCenter, Worcester, Massachusetts3University of Utah, Salt Lake City, Utah4Stanford University, Stanford,CaliforniaConstruction of a Genetic Linkage Map inMan Using Restriction Fragment LengthPolymorphisms

This historic paper stimulated a major in-ternational effort to establish a geneticlinkage map of the human genome basedon DNA polymorphisms. Pedigree studiesusing these genetic markers soon led to thechromosomal localization and identifica-tion of mutant genes for hundreds of hu-man diseases. A more ambitious goal, stillonly partly achieved, is to understand thegenetic and environmental interactions in-volved in complex traits such as heart dis-ease and cancer. The "small set of largepedigrees" called for in the excerpt wassoon established by the Centre d'Etude duPolymorphisme Humain (CEPH) in Paris,France, and made available to investiga-

tors worldwide for genetic linkage studies.Today the CEPH maintains a database onthe individuals in these pedigrees that com-prises approximately 12,000 polymorphicDNA markers and 2.5 million genotypes.

No method of systematically mappinghuman genes has been de-vised, largely because of thepaucity of highly polymor-phic marker loci. The adventof recombinant DNA tech-nology has suggested a theo-retically possible way todefine an arbitrarily largenumber of arbitrarily poly-morphic marker loci. . . . Asubset of such polymorphisms canreadily be detected as differences in thelength of DNA fragments after digestionwith DNA sequence-specific restrictionendonucleases. These restriction frag-ment length polymorphisms (RFLPs)can be easily assayed in individuals, fa-cilitating large population studies. . . .[Genetic mapping] of many DNAmarker loci should allow the es-tablishment of a set of well-spaced,highly polymorphic genetic markerscovering the entire human genome [andenabling] any trait caused wholly or

partially by a major locus segregating ina pedigree to be mapped. Such a pro-cedure would not require any knowl-edge of the biochemical nature of thetrait or of the nature of the alterations inthe DNA responsible for the trait. . . .The most efficient procedure will be to

study a small set oflarge pedigrees whichhave been genotypedfor all known polymor-phic markers. . . . Theresolution of geneticand environmentalcomponents of dis-ease . . . must involveunraveling the underly-

ing genetic predisposition, understand-ing the environmental contributions,and understanding the variability of ex-pression of the phenotype. In principle,linked marker loci can allow one toestablish, with high certainty, the geno-type of an individual and, consequently,assess much more precisely the con-tribution of modifying factors such assecondary genes, likelihood of expres-sion of the phenotype, and environment.

Source: American Journal of Human Genetics32: 314–331

In principle, linkedmarker loci can

allow one toestablish, with high

certainty, thegenotype of an

individual.

Page 70: Introduction to Molecular Genetics and Genomics

DNA) and one limitation of PCR is that it re-quires sequence information (so primeroligonucleotides can be synthesized). Theseare not severe handicaps for organisms thatare well studied (for example, human be-ings, domesticated animals and cultivatedplants, and model genetic organisms such asyeast, fruit fly, nematode, or mouse), be-cause research materials and sequence in-formation are readily available. But for thevast majority of organisms that biologistsstudy, there are neither research materialsnor sequence information. Genetic analysiscan still be carried out in these organisms byusing an approach called random ampli-fied polymorphic DNA or RAPD (pro-nounced “rapid”), described in this section.

RAPD analysis makes use of a set of PCRprimers of 8–10 nucleotides whose se-quence is essentially random. The randomprimers are tried individually or in pairs inPCR reactions to amplify fragments ofgenomic DNA from the organism of inter-est. Because the primers are so short, they

often anneal to genomic DNA at multiplesites. Some primers anneal in the properorientation and at a suitable distance fromeach other to support amplification of theunknown sequence between them. Amongthe set of amplified fragments are ones thatcan be amplified from some genomic DNAsamples but not from others, which meansthat the presence or absence of the ampli-fied fragment is polymorphic in the popula-tion of organisms.

In most organisms it is usually straight-forward to identify a large number ofRAPDs that can serve as genetic markers formany different kinds of genetic studies. Anexample of RAPD gel analysis is illustratedin Figure 2.25, where three pairs of primers(sets 1–3) are used to amplify genomic DNAfrom four individuals in a population. Thefragments that amplify are then separatedon an electrophoresis gel and visualized af-ter straining with ethidium bromide. Manyamplified bands are typically observed foreach primer set, but only some of these are

2.6 Types of DNA Markers Present in Genomic DNA 69

Individuals

RAPD primer sets

A B C D A B C D A B C D

Set 2 Set 3Set 1

RAPD bands onelectrophoresisgel

Monomorphicbands

Polymorphicbands

Fragments that PCR-amplify with genomic DNA from some individuals but not others, using the same primer set

Figure 2.25 Randomamplified polymorphicDNA (RAPD) isdetected through theuse of relatively shortprimer sequences that,by chance, matchgenomic DNA atmultiple sites that areclose enough togetherto support PCRamplification.Genomic DNA from asingle individualtypically yields manybands, only some ofwhich are polymor-phic in the population.Different sets ofprimers amplify differ-ent fragments ofgenomic DNA.

Page 71: Introduction to Molecular Genetics and Genomics

polymorphic. These are indicated in Figure2.25 by the colored dots. The amplifiedbands that are not polymorphic are said tobe monomorphic in the sample, whichmeans that they are the same from one in-dividual to the next. This example shows 17RAPD polymorphisms. Figure 2.26 shows anactual RAPD gel amplified from genomicDNA obtained from small tissue samplesfrom a population of fish (Campostomaanomalum) in the Great Miami River Basin,Ohio. The fish were collected as part of awater quality assessment program to deter-mine whether fish populations in stressfulwater environments progressively lose theirgenetic variation (that is, become increas-ingly monomorphic). Each pair of samplesis flanked by a lane containing DNA sizestandards producing a “ladder” of fragmentsat 100-bp increments.

Figures 2.24 through 2.26 illustrate animportant point:

In modern genetics, the phenotypes that arestudied are very often bands in a gel ratherthan physical or physiological characteristics.

Figure 2.25 offers a good example. Eachposition at which a band is observed in oneor more samples is a phenotype, whetheror not the band is polymorphic. For exam-ple, primer set 1 yields a total of 19 bands,of which 5 are polymorphic and 14 aremonomorphic in the sample. The pheno-types could be named in any convenientway, such as by indicating the primer setand the fragment length. For example,suppose that the smallest amplified frag-ment for primer set 1 is 125 bp, which isthe polymorphic fragment at the bottomleft in Figure 2.25. We could name thisfragment unambiguously as 1-125 becauseit is a fragment of 125 bp amplified byprimer set 1.

To understand why a DNA band is a phe-notype, rather than a gene or a genotype, itis useful to assign different names to the“alleles” that do or do not support amplifi-cation. (The word allele is in quotationmarks again, because the 1-125 fragmentthat is amplified need not be part of a gene.)We are talking only about the 1-125 frag-ment, so we could call the allele capable ofsupporting amplification the plus (�) alleleand the allele not capable of supporting am-plification the minus (�) allele. Then thereare three possible genotypes with regard tothe amplified fragment: ���, ���, and���. Using genomic DNA from these geno-types, the homozygous ��� and hetero-zygous ��� will both support amplificationof the 1-125 fragment, whereas the ���genotype will not support amplification.Hence the presence of the 1-125 fragmentis the phenotype observed in both ��� and��� genotypes. In other words, withregard to amplification, the � allele isdominant to the � allele, because thephenotype (presence of the 1-125 band) ispresent in both homozygous ��� and het-erozygous ��� genotypes. Therefore, onthe basis of the phenotype for the 1-125band in Figure 2.25, we could say that indi-viduals A and D could have either a ��� or��� genotype but that individuals B and Cmust have genotype ���.

70 Chapter 2 DNA Structure and DNA Manipulation

900 bp

400 bp

Au: Are 400 and900 base pairs cor-rect? Or is it 100and 400 basepairs?

Figure 2.26 RAPD polymorphisms in thestoneroller fish (Campostoma anomalum) trappedin tributaries of the Great Miami River in Ohio.Each pair of samples is flanked by a lanecontaining DNA size standards; in these lanes,the smallest DNA fragment is 100 base pairs(bp), and each successively larger fragmentincreases in size by 100 bp. Fragments whosesizes are multiples of 500 bp are present ingreater concentration and so yield darker bands.[Courtesy of Michael Simonich, Manju Garg,and Ana Braam (Pathology AssociatesInternational, Cincinnati, Ohio).]

Page 72: Introduction to Molecular Genetics and Genomics

Amplified Fragment LengthPolymorphisms (AFLPs)Because RAPD primers are small and maynot match the template DNA perfectly, theamplified DNA bands often differ a greatdeal in how dark or light they appear. Thisvariation creates a potential problem, be-cause some exceptionally dark bands mayactually result from two amplified DNAfragments of the same size, and someexceptionally light bands may be difficult tovisualize consistently. To obtain amplifiedfragments that yield more uniform band in-tensities, double-stranded oligonucleotidesequences that match the primer sequencesperfectly can be attached to genomicrestriction fragments enzymatically prior toamplification. This method, which is out-lined in Figure 2.27, yields a class of DNA

polymorphisms known as amplified frag-ment length polymorphisms, or AFLPs(usually pronounced by spelling it out).The first step (part A) is to digest genomicDNA with a restriction enzyme; this exam-ple uses the enzyme EcoRI, whose restric-tion site is 5'-GAATTC-3'. Digestion yields alarge number of restriction fragmentsflanked by what remains of an EcoRI site oneach side. In the next step (part B), double-stranded oligonucleotides called primeradapters, with single-stranded overhangscomplementary to those on the restrictionfragments, are ligated onto the restrictionfragments using the enzyme DNA ligase.The resulting fragments (C) are ready foramplification by means of PCR. Note thatthe same adapter is ligated onto each end,so a single primer sequence will anneal toboth ends and support amplification.

2.6 Types of DNA Markers Present in Genomic DNA 71

Nucleotide extensions reduce the number of amplifications.

Cleavage

EcoRI site

Primer adapter

Primer sequence

Fractionof fragments

amplified

Primer adapter

Adapter ligation

Restriction fragment

Amplification

5 ’ – G A A T T C3 ’ – C T T A A G

EcoRI site

G A A T T C – 3 ’C T T A A G – 5 ’

5’–AATTNN…NN–3’3’–NN…NN–5’

5’–NN…NN–3’3’–NN…NNTTAA–5’ 3’–G

5’–AATTC

3’–NN…NNTTAAG5’–NN…NNAATTC

3’–CTTAANN…NN–5’

3’–ACTTAANN…NN–5’

3’–CACTTAANN…NN–5’

3’–TCACTTAANN…NN–5’

All

1/16

1/256

1/4096

5’–NN…NNAATTC–3’

5’–NN…NNAATTCA–3’

5’–NN…NNAATTCAC–3’

5’–NN…NNAATTCACT–3’

(A)

(B)

(C)

(D)

G–3’CTTAA–5’

GAATTNN…NN–3’CTTAANN…NN–5’

Figure 2.27 An amplified fragment length polymorphism (AFLP). (A and B) Genomic DNA isdigested with one or more restriction enzymes (in this case, EcoRI). (C) Oligonucleotide adaptors areligated onto the fragments; note that the single-stranded overhang of the adaptors matches those ofthe genomic DNA fragments. (D) The resulting fragments are subjected to PCR using primerscomplementary to the adaptors. The number of amplified fragments can be adjusted by manipulat-ing the number of nucleotides in the adaptors that are also present in the primers.

Page 73: Introduction to Molecular Genetics and Genomics

There are nevertheless a number ofchoices concerning the primer sequence. Aprimer that matches the adapters perfectlywill amplify all fragments, but this often re-sults in so many amplified fragments thatthey are not well separated in the gel. APCR primer must match perfectly at its 3'end to be elongated. Thus, additional nu-cleotides added to the 3' end reduce thenumber of amplified fragments, becausethese primers will amplify only those frag-ments that, by chance, have a complemen-tary nucleotide immediately adjacent to theEcoRI site. For example, the primer se-quence in the second row has a single-nucleotide 3' extension; because only1�4 � 1�4 of the fragments would be ex-pected to have a complementary T immedi-ately adjacent to the EcoRI site on bothsides, this primer is expected to amplify1�16 of all the restriction fragments. Simi-larly, the primer sequence in the third rowhas a two-nucleotide 3' extension, so thisprimer is expected to amplify 1�16 �1�16 � 1�256 of all the fragments. One ap-plication of AFLP analysis is to organismswith large genomes, such as grasshoppersand crickets, for which RAPD analysiswould yield an excessive number of ampli-fied bands. How large is a “large” genome?Compared to the human genome, that ofthe brown mountain grasshopper Podismapedestris is 7 times larger, and those of NorthAmerican salamanders in the genus Amphi-uma are 70 times larger! (Genome size isdiscussed in Chapter 8.)

Simple Tandem RepeatPolymorphisms (STRPs)One more type of DNA polymorphism war-rants consideration because it is useful inDNA typing for individual identificationand for assessing the degree of genetic relat-edness between individuals. This type ofpolymorphism is called a simple tandemrepeat polymorphism (STRP) becausethe genetic differences among DNA mole-cules consist of the number of copies of ashort DNA sequence that may be repeatedmany times in tandem at a particular locusin the genome. STRPs that are present atdifferent loci may differ in the sequenceand length of the repeating unit, as well asin the minimum and maximum number oftandem copies that occur in DNA moleculesin the population. A STRP with a repeatingunit of 2–9 bp is often called a microsatel-lite or a simple sequence length poly-morphism (SSLP), whereas a STRP with arepeating unit of 10–60 bp is often called aminisatellite or a variable number oftandem repeats (VNTR).

Figure 2.28 shows an example of a STRPwith a copy number ranging from 1through 10. Because the number of copiesdetermines the size of any restriction frag-ment that includes the STRP, each DNAmolecule yields a single-size restrictionfragment depending on the number ofcopies it contains. The STRP in Figure 2.28has 10 different “alleles” (again, we usequotation marks because the STRP may not

72 Chapter 2 DNA Structure and DNA Manipulation Au: Proofreader asks that youconfirm refer-ence to Chapter 8 in 1st para, 1st column.

Photocaptionphotocaptionphotocaptionphotocaptionphotocaptionphotocaptionphotocaptionpho-tocaptionphotocaptionphotocaptionphotocaption

09131-01-1747P

Page 74: Introduction to Molecular Genetics and Genomics

be in a gene), which could be distinguishedeither by Southern blotting using a probe toa unique (nonrepeating) sequence withinthe restriction fragment or by PCR amplifi-cation using primers to a unique sequenceon either side of the tandem repeats. In thissituation the locus is said to have multiplealleles in the population. Even with multi-ple alleles, however, any one chromosomecan carry only one of the alleles, and any

individual genotype can carry at most twodifferent alleles. Nevertheless, a largenumber of alleles means an even largernumber of genotypes, which is the featurethat gives STRPs their utility in individualidentification. For example, even with only10 alleles in a population of organisms,there could be 10 different homozygousgenotypes and 45 different heterozygousgenotypes.

2.6 Types of DNA Markers Present in Genomic DNA 73

Duplex DNAmolecules

Positionof band inDNA gel

Positions of cleavage sitesDirection of current

Larger DNAfragments

Smaller DNAfragments

Tandem repeats of a DNA sequence

1

1 2

1

1

1

1

1

1

1

1

2

2

2

2

2

2

2

2

3

3

3

3

3

3

3

3

4

4

4

4

4

4

4

5

5

5

5

5

5

6

6

6

6

6

7

7

7

7

8

8

8

9

9 10

5’3’

5’3’

5’3’

5’3’

5’3’

5’3’

5’3’

5’3’

5’3’

5’3’

5’3’

5’3’

5’3’

5’3’

5’3’

5’3’

5’3’

5’3’

5’3’

5’3’

Figure 2.28 In a simple tandem repeat polymorphism (STRP), the alleles in a population differ in thenumber of copies of a short sequence (typically 2–60 bp) that is repeated in tandem along the DNAmolecule. This example shows alleles in which the repeat number varies from 1 to 10. Cleavage atrestriction sites flanking the STRP yields a unique fragment length for each allele. The alleles can alsobe distinguished by the size of the fragment amplified by PCR using primers that flank the STRP.

Page 75: Introduction to Molecular Genetics and Genomics

More generally, with n alleles there aren homozygous genotypes and n(n � 1)�2heterozygous genotypes, or n(n � 1)�2 dif-ferent genotypes altogether. With STRPs,not only are there a relatively large numberof alleles, but no one allele is exceptionallycommon, so each of the many genotypes inthe population has a relatively low fre-quency. If the genotypes at 6–8 STRP lociare considered simultaneously, then eachpossible multiple-locus genotype is exceed-ingly rare. Because of their high degree ofvariation among people, STRPs are widelyused in DNA typing (sometimes calledDNA fingerprinting) to establish indi-vidual identity for use in criminal investi-gations, parentage determinations, and soforth (Chapter 17).

2.7 Applications of DNA Markers

Why are geneticists interested in DNAmarkers (DNA polymorphisms)? Their in-terest can be justified on any number ofgrounds. In this section we consider thereasons most often cited.

Genetic Markers, Genetic Mapping,and “Disease Genes”Perhaps the key goal in studying DNA poly-morphisms in human genetics is to identifythe chromosomal location of mutant genesassociated with hereditary diseases. In thecontext of disorders caused by the interac-tion of multiple genetic and environmentalfactors, such as heart disease, cancer, dia-betes, depression, and so forth, it is impor-tant to think of a harmful allele as a riskfactor for the disease, which increases theprobability of occurrence of the disease,rather than as a sole causative agent. Thisneeds to be emphasized, especially becausegenetic risk factors are often called diseasegenes. For example, the major “diseasegene” for breast cancer in women is thegene BRCA1. For women who carry a mu-tant allele of BRCA1, the lifetime risk ofbreast cancer is about 36 percent, andhence most women with this genetic riskfactor do not develop breast cancer. On the

other hand, among women who are notcarriers, the lifetime risk of breast cancer isabout 12 percent, and hence many womenwithout the genetic risk factor do developbreast cancer. Indeed, BRCA1 mutations arefound in only 16 percent of affected womenwho have a family history of breast cancer.The importance of a genetic risk factor canbe expressed quantitatively as the relativerisk, which equals the risk of the disease inpersons who carry the risk factor as com-pared to the risk in persons who do not.The relative risk for BRCA1 equals 3.0 (cal-culated as 36 percent�12 percent).

The utility of DNA polymorphisms in lo-cating and identifying disease genes resultsfrom genetic linkage, the tendency forgenes that are sufficiently close together ina chromosome to be inherited together. Ge-netic linkage will be discussed in detail inChapter 5, but the key concepts are sum-marized in Figure 2.29, which shows the lo-cation of many DNA polymorphisms alonga chromosome that also carries a geneticrisk factor denoted D (for disease gene).Each DNA polymorphism serves as a ge-netic marker for its own location in thechromosome. The importance of geneticlinkage is that DNA markers that are suffi-ciently close to the disease gene will tend tobe inherited together with the disease genein pedigrees—and the closer the markers,the stronger this association. Hence, the ini-tial approach to the identification of a dis-ease gene is to find DNA markers that aregenetically linked with the disease gene inorder to identify its chromosomal location,a procedure known as genetic mapping.Once the chromosomal position is known,other methods can be used to pinpoint thedisease gene itself and to study its functions.

If genetic linkage seems a roundaboutway to identify disease genes, consider thealternative. The human genome containsapproximately 80,000 genes. If geneticlinkage did not exist, then we would haveto examine 80,000 DNA polymorphisms,one in each gene, in order to identify a dis-ease gene. But the human genome has only23 pairs of chromosomes, and because ofgenetic linkage and the power of geneticmapping, it actually requires only a fewhundred DNA polymorphisms to identifythe chromosome and approximate locationof a genetic risk factor.

74 Chapter 2 DNA Structure and DNA Manipulation

Page 76: Introduction to Molecular Genetics and Genomics

Other Uses for DNA MarkersDNA polymorphisms are widely used in allaspects of modern genetics because theyprovide a large number of easily accessedgenetic markers for genetic mapping andother purposes. Among the other uses ofDNA polymorphisms are the following.

Individual identification. We have al-ready mentioned that DNA polymorphismshave application as a means of DNA typing(DNA fingerprinting) to identify differentindividuals in a population. DNA typing inother organisms is used to determine indi-vidual animals in endangered species andto identify the degree of genetic relatednessamong individual organisms that live inpacks or herds. For example, DNA typing inwild horses has shown that the wild stallionin charge of a harem of mares actually siresfewer than one-third of the foals.

Epidemiology and food safety science.DNA typing also has important applicationsin tracking the spread of viral and bacterialepidemic diseases, as well as in identifyingthe source of contamination in contami-nated foods.

Human population history. DNA poly-morphisms are widely used in anthropologyto reconstruct the evolutionary origin,global expansion, and diversification of thehuman population.

Improvement of domesticated plantsand animals. Plant and animal breedershave turned to DNA polymorphisms asgenetic markers in pedigree studies to iden-tify, by genetic mapping, genes that are as-sociated with favorable traits in order toincorporate these genes into currently usedvarieties of plants and breeds of animals.

History of domestication. Plant and ani-mal breeders also study genetic polymor-phisms to identify the wild ancestors ofcultivated plants and domesticated animals,as well as to infer the practices of artificialselection that led to genetic changes inthese species during domestication.

DNA polymorphisms as ecological in-dicators. DNA polymorphisms are beingevaluated as biological indicators of geneticdiversity in key indicator species present inbiological communities exposed to chemi-cal, biological, or physical stress. They arealso used to monitor genetic diversity inendangered species and species bred incaptivity.

Evolutionary genetics. DNA polymor-phisms are studied in an effort to describethe patterns in which different types ofgenetic variation occur throughout thegenome, to infer the evolutionary mecha-nisms by which genetic variation is main-tained, and to illuminate the processes bywhich genetic polymorphisms within

2.7 Applications of DNA Markers 75

D

DNA polymorphisms(genetic markers) along

the chromosomes

DNA markers that are close enough to a disease gene tend to be inherited together (genetically linked) with the disease gene.

DNA markers that are too far from the disease gene in the chromosome (or are in a different chromosome) are not linked to the disease gene. They do not tend to be inherited with the disease gene in pedigrees.

The closer a marker is to the disease gene,the closer the linkage and the morelikely it is that they will be inherited together.

Locus of a “disease gene”(a genetic risk factor), D

Figure 2.29 Concepts in genetic localization of genetic risk factors for disease. Polymorphic DNAmarkers (indicated by the vertical lines) that are close to a genetic risk factor (D) in the chromosometend to be inherited together with the disease itself. The genomic location of the risk factor is deter-mined by examining the known genomic locations of the DNA polymorphisms that are linked with it.

Page 77: Introduction to Molecular Genetics and Genomics

76 Chapter 2 DNA Structure and DNA Manipulation

population history, patterns of migration,and so forth.

Evolutionary relationships among spe-cies. Differences in homologous DNA se-quences between species is the basis ofmolecular systematics, in which the se-quences are analyzed to determine theancestral history (phylogeny) of the speciesand to trace the origin of morphological,behavioral, and other types of adaptationsthat have arisen in the course of evolution.

Chapter SummaryThe sequence of bases in the human genome is 99.9%identical from one person to the next. The remaining0.1%—comprising 3 million base pairs—differs among in-dividuals. Included in these differences are many muta-tions that cause or increase the risk of disease, but themajority of the differences are harmless in themselves.Any of these differences between genomes can be used asa genetic marker. Genetic markers are widely employed ingenetics to serve as positional landmarks along a chromo-some or to identify particular cloned DNA fragments. Themanipulation of DNA molecules to identify genetic mark-ers is the basic experimental operation in modern genetics.

A DNA strand is a polymer of deoxyribonucleotides,each composed of a nitrogenous base, a deoxyribosesugar, and a phosphate. Sugars and phosphates alternatein forming a polynucleotide chain with one terminal 3'-OH group and one terminal 5'-P group. In double-stranded (duplex) DNA, the two strands are paired andantiparallel. Each end of the double helix carries a termi-nal 3'-OH group in one strand and a terminal 5'-P groupin the other strand. The four bases found in DNA are thepurines, adenine (A) and guanine (G), and the pyrim-idines, cytosine (C) and thymine (T). Equal numbers ofpurines and pyrimidines are found in double-strandedDNA (Chargaff’s rules), because the bases are paired asA�T pairs and G�C pairs. The hydrogen-bonded basepairs, along with hydrophobic base stacking of the nu-cleotide pairs in the core of the double helix, hold the twopolynucleotide strands together in a double helix.

Duplex DNA can be cleaved into fragments of definedlength by restriction enzymes, each of which cleaves DNAat a specific recognition sequence (restriction site) usuallyfour or six nucleotide pairs in length. These fragmentscan be separated by electrophoresis. The positions of par-ticular restriction fragments in a gel can be visualized bymeans of nucleic acid hybridization, in which strands ofduplex DNA that have been separated (denatured) byheating are mixed and come together (renature) withstrands having complementary nucleotide sequences. Ina Southern blot, denatured and labeled probe DNA ismixed with denatured DNA made up of restriction frag-ments that have been transferred to a filter membrane af-ter electrophoresis. The probe DNA anneals and forms

stable duplexes with whatever fragments contain suffi-ciently complementary base sequences, and the positionsof these duplexes can be determined by exposing the fil-ter to x-ray film on which radioactive emission (or, insome procedures, light emission) produces an image ofthe band. Particular DNA sequences can also be amplifiedwithout cloning by means of the polymerase chain reac-tion (PCR), in which short, synthetic oligonucleotides areused as primers to replicate repeatedly and amplify thesequence between them. The primers must flank, andhave their 3' ends oriented toward, the region to be am-plified, because DNA polymerase can elongate theprimers only by the addition of successive nucleotides tothe 3' end of the growing chain. Each round of PCR am-plification results in a doubling of the number of ampli-fied fragments.

Most genes are present in pairs in the nonreproduc-tive cells of most animals and higher plants. One memberof each gene pair is in the chromosome inherited from thematernal parent, and the other member of the gene pair isat a corresponding location (locus) in the homologouschromosome inherited from the paternal parent. A genecan have different forms that correspond to differences inDNA sequence. The different forms of a gene are called al-leles. The particular combination of alleles present in anorganism constitutes its genotype. The observable charac-teristics of the organism constitute its phenotype. In anorganism, if the two alleles of a gene pair are the same(for example, AA or aa), then the genotype is homozy-gous for the A or a allele; if the alleles are different (Aa),then the genotype is heterozygous. Even though eachgenotype can include at most two alleles, multiple allelesare often encountered among the individuals in naturalpopulations.

DNA polymorphisms (DNA markers) are common innatural populations of most organisms. Among the mostwidely used DNA polymorphisms are single-nucleotidepolymorphisms (SNPs), restriction fragment length poly-morphisms (detected by Southern blots), and such PCR-based polymorphisms as random amplified polymorphicDNA (RAPD), amplified fragment length polymorphisms(AFLPs), and simple tandem repeat polymorphisms(STRPs). DNA polymorphisms are used in genetic map-

species become transformed into geneticdifferences between species.

Population studies. Population ecolo-gists employ DNA polymorphisms to assessthe level of genetic variation in diversepopulations of organisms that differ in ge-netic organization (prokaryotes, eukary-otes, organelles), population size, breedingstructure, or life-history characters, andthey use genetic polymorphisms withinsubpopulations of a species as indicators of

Page 78: Introduction to Molecular Genetics and Genomics

Review the Basics 77

ping studies to identify DNA markers that are geneticallylinked to disease genes (genetic risk factors) in the chro-mosome in order to pinpoint their location. They are alsoused in DNA typing for identifying individuals, trackingthe course of virus and bacterial epidemics, studying hu-

man population history, and improving cultivated plantsand domesticated animals, as well as for the geneticmonitoring of endangered species and for many otherpurposes.

Key Termsalleleamplificationamplified fragment length

polymorphism (AFLP)antiparallelbandbase compositionbase pairingbase stackingB form of DNAblunt endChargaff’s ruleschromosomecodominantdenaturationdenatured DNAdisease geneDNA cloningDNA fingerprintingDNA markerDNA polymeraseDNA polymorphismDNA typingdominant5'-P (phosphate) groupgel electrophoresisgenegenetic linkagegenetic mappinggenetic marker

genomic DNAgenotypeheterozygoushomologous chromosomeshomozygoushydrogen bondhydrophobic interactionkilobase (kb)locusmajor groovemicrosatelliteminisatelliteminor groovemonomorphicmultiple allelesnucleic acid hybridizationnucleosidenucleotideoligonucleotidepalindromepercent G � Cphenotypephosphodiester bondpolaritypolymerase chain reaction

(PCR)polynucleotide chainprimerprimer adapter

probepurinepyrimidinerandom amplified polymorphic

DNA (RAPD)relative riskrenaturationrestriction endonucleaserestriction enzymerestriction fragmentrestriction fragment length polymor-

phism (RFLP)restriction maprestriction siterisk factorsimple sequence length polymor-

phism (SSLP)simple tandem repeat polymorphism

(STRP)single-nucleotide polymorphism

(SNP)Southern blotsticky endthermophile3'-OH (hydroxyl) groupvariable number of tandem repeats

(VNTR).Z form of DNA

Review the Basics • What four bases are commonly found in the nu-

cleotides in DNA? Which form base pairs?

• Which chemical groups are present at the extreme 3’and 5’ ends of a single polynucleotide strand?

• What does it mean to say that a single strand of DNAstrand has a polarity? What does it mean to say thatthe DNA strands in a duplex molecule are anti-parallel?

• What are restriction enzymes and why are they im-portant in the study of particular DNA fragments?What does it mean to say that most restriction sitesare palindromes?

• Describe how a Southern blot is carried out. Explainwhat it used for. What is the role of the probe?

• How does the polymerase chain reaction work? Whatis it used for? What information about the target se-quence must be known in advance? What is the roleof the oligonucleotide primers?

• What is a DNA marker? Explain how harmless DNAmarkers can serve as aids in identifying disease genesthrough genetic mapping.

• Define and given an example of each of the followingkey genetic terms: locus, allele, genotype, hetero-zygous, homozygous, phenotype.

Page 79: Introduction to Molecular Genetics and Genomics

78 Chapter 2 DNA Structure and DNA Manipulation

Guide to Problem SolvingProblem 1 Distinguish between base pairing and basestacking in double-stranded DNA.

Answer Base pairing is the hydrogen bonding betweencorresponding bases in opposite strands of duplex DNA; A(adenine) is paired with T (thymine), and G (guanine) ispaired with C (cytosine). Base stacking refers to thehydrophobic (water-hating) interaction between consec-utive base pairs along a DNA duplex, which promotes theformation of a “stack” of base pairs with the sugar–phos-phate backbones of the strands running along outside.

Problem 2 The restriction enzyme EcoRI cleaves double-stranded DNA at the sequence 5'-GAATTC-3', and the re-striction enzyme HindIII cleaves at 5'-AAGCTT-3'. A20-kilobase (kb) circular plasmid is digested with eachenzyme individually and then in combination, and theresulting fragment sizes are determined by means of elec-trophoresis. The results are as follows:

EcoRI alone fragments of 6 kb and 14 kbHindIII alone fragments of 7 kb and 13 kbEcoRI and HindIII fragments of 2 kb, 4 kb, 5 kb and 9 kb

How many possible restriction maps are compatible withthese data? For each possible restriction map, make a dia-gram of the circular molecule and indicate the relative po-sitions of the EcoRI and HindIII restriction sites.

Answer Because the single-enzyme digests give twobands each, there must be two restriction sites for eachenzyme in the molecule. Furthermore, because digestionwith HindIII makes both the 6-kb and the 14-kb restric-tion fragments disappear, each of these fragments mustcontain one HindIII site. Considering the sizes of the frag-ments in the double digest, the 6-kb EcoRI fragment mustbe cleaved into 2-kb and 4-kb fragments, and the 14-kbEcoRI fragment must be cleaved into 5-kb and 9-kb frag-ments. Two restriction maps are compatible with thedata, depending on which end of the 6-kb EcoRI frag-ment the HindIII site is nearest. The position of the re-maining HindIII site is determined by the fact that the2-kb and 5-kb fragments in the double digest must be ad-jacent in the intact molecule in order for a 13-kb frag-ment to be produced by HindIII digestion alone. Theaccompanying figure shows the relative positions of the

14

6 2

49

5

(A) (B)EcoRI EcoRIHindIII

HindIII

EcoRI EcoRI

2

49

5

(C) EcoRI

HindIII

HindIIIEcoRI

contributing to the stability of double-stranded DNA is thestacking of the base pairs on top of one another as a result ofhydrophobic interactions. For further discussion of this fea-ture of DNA, and much else of interest regarding the discov-ery and analysis of this critical biological macromolecule,consult the keyword site.

• The concept of the polymerase chain reaction (PCR) oc-curred to Kary Mullis one night while cruising on Route 128from San Francisco to Mendocino. He immediately realizedthat this approach would be unique in its ability to amplify, atan exponential rate, a specific nucleotide sequence presentin a vanishingly small quantity amid a much larger back-ground of total nucleic acid. Once its feasibility was demon-strated, PCR was quickly recognized as a major technicaladvance in molecular biology. The new technique earnedMullis the 1993 Nobel Prize in chemistry, and today it is thebasis of a large number of experimental and diagnostic pro-cedures. At this keyword site you can learn more about the

GeNETics on the Web will introduce you to some of the mostimportant sites for finding genetic information on the Inter-net. To explore these sites, visit the Jones and Bartlett homepage at

http://www.jbpub.com/genetics

For the book Genetics: Analysis of Genes and Genomes, choosethe link that says Enter GeNETics on the Web. You will be pre-sented with a chapter-by -chapter list of highlighted keywords.Select any highlighted keyword and you will be linked to a Website containing genetic information related to the keyword.

• DNA is like Coca-Cola? According to this keyword site, itis. It contains sugar, which is highly soluble in water; phos-phate groups, which are of moderate solubility; and bases,which have extremely low solubility. (The base in the soft drinkis caffeine, which is chemically similar to adenine and cansometimes be incorporated into DNA, causing a mutation.)As this keyword site emphasizes, the most important property

Page 80: Introduction to Molecular Genetics and Genomics

Guide to Problem Solving 79

EcoRI sites (part A). Parts B and C are the two possible re-striction maps, which differ according to whether theEcoRI site at the top generates the 2-kb or the 4-kb frag-ment in the double digest.

Problem 3 The accompanying diagram shows the positionsof restriction sites (tick marks) for a particular restrictionenzyme that can be present in the DNA at a locus in a hu-man chromosome. The DNA present in any particularchromosome may be that shown at the top or that shownat the bottom. A probe DNA binds to the fragments at theposition shown by the rectangle. With respect to an RFLPbased on these fragments, three genotypes are possible.What are they? Use the symbol A1 to refer to the allelethat yields the upper DNA fragment, and use A2 to referto the allele that yields the lower DNA fragment. In the

12 kb9 kb

6 kb

3 kb

1 kb

A1

3 kb 9 kb

A2

12 kb

accompanying gel diagram, indicate the genotypes acrossthe top and the phenotype (band position or positions)expected for each genotype. The scale on the right showsthe expected positions of fragments ranging in size from 1to 12 kb.

Answer After cleavage with the restriction enzyme, theA1-type DNA yields a 3-kb fragment that binds with theprobe (yielding a 3-kb band) and a 9-kb fragment thatdoes not bind with the probe (yielding no visible band),whereas the A2-type DNA yields a 12-kb fragment thatbinds with the probe (yielding a 12-kb band). A particu-lar chromosome may carry allele A1 or allele A2. Becauseindividuals have two copies of each chromoosme (ex-cept for the sex chromosomes), any individual maycarry A1A1, A1A2, or A2A2. DNA from homozygous A1A1

genotypes yields a 3-kb band, that from heterozygousA1A2 genotypes yields both a 3-kb and a 12-kb band,and that from homozygous A2A2 genotypes yields a 12-kb band. The expected phenotypes are illustrated here.

12 kb9 kb

6 kb

3 kb

1 kb

A1A1 A1A2 A2A2

development of PCR from Mullis’s original conception, in-cluding two major innovations that were necessary to perfectthe process.

• Human beings rely on plants for food, shelter, and medi-cines. Although at least 5000 species are cultivated, modernagricultural research emphasizes a few widely cultivated cropswhile largely ignoring plants such as Bambara groundnut (Vi-gna subterranea), breadfruit (Artocarpus altilis), carob (Cerato-nia siliqua), coriander (Coriandrum sativum), emmer wheat(Triticum dicoccum), oca (Oxalis tuberosa), and ulluco (Ullucustuberosus). To learn more about these minor crops and the useof molecular markers for characterizing and preserving theirgenetic diversity, consult this keyword site.

• The Mutable Site changes frequently. Each new update in-cludes a different site that highlights genetics resources avail-able on the World Wide Web. Select the Mutable Site forChapter 2 and you will be linked automatically.

• The Pic Site showcases some of the most visually appeal-ing genetics sites on the World Wide Web. To visit the geneticsWeb site pictured below, select the PIC Site for Chapter 2.

Page 81: Introduction to Molecular Genetics and Genomics

80 Chapter 2 DNA Structure and DNA Manipulation

Analysis and Applications2.1 Many restriction enzymes produce restriction frag-ments that have “sticky ends.” What does this mean?

2.2 Which of the following sequences are palindromesand which are not? Explain your answer. Symbols suchas (A�T) mean that the site may be occupied by (in thiscase) either A or T, and N stands for any nucleotide.(a) 5'-AATT-3'(b) 5'-AAAA-3'(c) 5'-AANTT-3'(d) 5'-AA(A�T)AA-3'(e) 5'-AA(G�C)TT-3'

2.3 The following list gives half of each of a set of palin-dromic restriction sites. What is the complete sequence ofeach restriction site? (N stands for any nucleotide.)(a) 5'-AA??-3'(b) 5'-ATG???-3'(c) 5’'-GGN??-3'(d) 5'-ATNN??-3'

2.4 Apart from the base sequence, what is different aboutthe ends of restriction fragments produced by the follow-ing restriction enzymes? (The downward arrow repre-sents the site of cleavage in each strand.)(a) HaeIII (5'-GG j CC-3')(b) MaeI (5'-C j TAG-3')(c) CfoI (5'-GCG j C-3')

2.5 A solution contains double-stranded DNA fragments of size 3 kb,6 kb, 9 kb, and 12 kb. They are sepa-rated in an electrophoresis gel. Inthe diagram of the gel at the right,match the fragment sizes with thecorrect bands.

(a)

(b)

(c)

(d)

2.6 The linear DNA fragment shown here has cleavagesites for BamHI (B) and EcoRI (E). In the accompanyingdiagram of an electrophoresis gel, indicate the positions atwhich bands would be found after digestion with:(a) BamHI alone(b) EcoRI alone(c) BamHI and EcoRI togetherThe dashed lines on the right indicate the positions towhich bands of 1–12 kb would migrate.

2.7 The circular DNA molecule shown at the top of page81 has cleavage sites for BamHI and EcoRI. In the accom-panying diagram of an electrophoresis gel, indicate thepositions at which bands would be found after digestionwith:(a) BamHI alone(b) EcoRI alone(c) BamHI and EcoRI togetherThe dashed lines on the right indicate the positions towhich bands of 1–12 kb would migrate.

0 2 4 6

B E EB

8 10 12 kb

12 kb9 kb

6 kb

3 kb

BamHI

BamHI+

EcoRIEcoRI

Problem 4 A geneticist plans to use the polymerase chainreaction (PCR) to amplify part of the DNA sequenceshown below, using oligonucleotide primers that arehexamers matching the regions shown in red. (In prac-tice, hexamers are too short for most purposes.) State thesequence of the primer oligonucleotides that should beused, including the polarity, and give the sequence of theDNA molecule that results from amplification.

Answer The primers must be able to base-pair with thechosen primer sites and must be oriented with their 3'ends facing one another. Thus the “forward primer” (the

one that is elongated in a left-to-right direction) shouldhave the sequence 5'-GCAATG-3' and the “reverseprimer” (the one that is elongated in a right-to-left direc-tion) should have the sequence 3'-TTCGGC-5'. The re-sulting amplified sequence is shown below.

5'-GCAATGGTAATTTTTCAGGAACCAGGGCCCTTAAGCCG-3'3'-CGTTACCATTAAAAACTCCTTGGTCCCGGGAATTCGGC-5'

5'-GTACGGGCAATGGTAATTTTTCAGGAACCAGGGCCCTTAAGCCGTC-3'3'-CATGCCCGTTACCATTAAAAACTCCTTGGTCCCGGGAATTCGGCAG-5'

Problem 4

AU/ED: Space in art forprob 2.6 reduced slightlyto make page.OK?

Page 82: Introduction to Molecular Genetics and Genomics

Analysis and Applications 81

2.8 Consider the accompanying diagram of a region ofduplex DNA, in which the B’s represent bases in Watson–Crick pairs. Specify as precisely as possible the identity of:(a) B5, assuming that B1 � A(b) B6, assuming that B2 � C(c) B7, assuming that B3 � purine(d) B8, assuming that B4 � A or T

2.9 Refer to the DNA molecule diagrammed in Problem2.8. In the precursor nucleotides of this molecule, whichbase was each of the phosphate groups 1–4 associatedwith?

2.10 In a random sequence consisting of equal propor-tions of all four nucleotides, what is the probability that aparticular short sequence of nucleotides matches a re-striction site for:

B1P

B2P

B3P

B4P B8

B7

OH

B6

B5

P

P

P

P1

2

3

4

12 kb9

8 2

37

46

9 1

5 kb

0 kb

kb

6 kb

3 kb

1 kb

BamHI

BamHI

BamHI

BamHI+

EcoRIEcoRI

EcoRI

EcoRI

Problems 2.15, 2.16

5'-ATGCCCTGCAGTACCATGACGCGTTACGCAGCTGATCGAAACGCGTATATATGCC-3'3'-TACGGGACGTCATGGTACTGCGCAATGCGTCGACTAGCTTTGCGCATATATACGG-5'

(a) A restriction enzyme with a 4-base cleavage site?(b) A restriction enzyme with a 6-base cleavage site?(c) A restriction enzyme with an 8-base cleavage site?

2.11 In a random sequence consisting of equal proportionsof all four nucleotides, what is the average distance be-tween restriction sites for:(a) A restriction enzyme with a 4-base cleavage site?(b) A restriction enzyme with a 6-base cleavage site?(c) A restriction enzyme with an 8-base cleavage site?

2.12 If human DNA were essentially a random sequenceof 3 � 109 bp with equal proportions of all four nu-cleotides (this is an oversimplification), approximatelyhow many restriction fragments would be expected fromcleavage with(a) A “4-cutter” restriction enzyme?(b) A “6-cutter” restriction enzyme?(c) An “8-cutter” restriction enzyme?

2.13 Consider the restriction enzymes BamHI (cleavagesite 5'-G j GATCC-3') and Sau3A (cleavage site 5'-j GATC-3'), where the downward arrow denotes thesite of cleavage in each strand. Is every BamHI site aSau3A site? Is every Sau3A site a BamHI site? Explainyour answer.

2.14 A DNA duplex with the sequence shown here iscleaved with BamHI (cleavage site 5'-G j GATCC-3'),where the arrow denotes the site of cleavage in eachstrand. If the resulting fragments were brought togetherin the right order, and the breaks in the backbones re-paired, what possible DNA duplexes would be expected?

5'-ATTGGATCCAAACCCCAAAGGATCCTTA-3'3'-TAACCTAGGTTTGGGGTTTCCTAGGAAT-5'

2.15 The restriction enzymes PstI, PvuII, and MluI havethe following restriction sites, where the arrow indicatesthe site of cleavage in each strand.

PstI 5'-CTGCA j G-3'PvuII 5'-CAG j CTG-3'MluI 5'-A j CGCGT-3'

A DNA duplex with the sequence above is digested. Whatfragments would result from cleavage with:(a) PstI?(b) PvuII?(c) MluI?

2.16 With regard to the restriction enzymes and the DNAduplex in Problem 2.15, what fragments would resultfrom digestion with(a) PstI and MluI?(b) PvuII and MluI?

AU/ED: Art for prob 2.8 scaledto 85% to make page. OK?

AU/ED: Space in art forprob 2.7 reduced slightlyto make page.OK?

Page 83: Introduction to Molecular Genetics and Genomics

82 Chapter 2 DNA Structure and DNA Manipulation

2.17 Consider the sequence:

5'-CTGCAGGTG-3'3'-GACGTCCAC-5'

If this sequence were cleaved with PstI (5'-CTGCA j G-3'), couldit still be cleaved with PvuII (5'-CAG j CTG-3')? If it werecleaved with PvuII, could it still be cleaved with PstI? Ex-plain your answer.

2.18 A circular DNA molecule is cleaved with BamHI,EcoRI, or the two restriction enzymes together. The ac-companying diagram shows the resulting electrophoresisgel, with the band sizes indicated. Draw a diagram of thecircular DNA, showing the relative positions of the BamHIand EcoRI sites.

2.19 In the diagrams of DNA fragments shown here, thetick marks indicate the positions of restriction sites for aparticular restriction enzyme. A mixture of the two typesof molecules is digested and analyzed with a Southernblot using either probe A or probe B, which hybridizes tothe fragments where shown by the rectangles. In the ac-companying gel diagram, indicate the bands that wouldresult from the use of each of these probes. (The scale onthe right shows the expected positions of fragments from1 to 12 kb.)

2.20 In the accompanying diagram, the tick marks indi-cate the positions of restriction sites in two alternativeDNA fragments that can be present at the A locus in a hu-man chromosome. An RFLP analysis is carried out, usingprobe DNA that binds to the fragments at the positionshown by the rectangle. With respect to this RFLP, howmany genotypes are possible? (Use the symbol A1 to referto the allele that yields the upper DNA fragment, and useA2 to refer to the allele that yields the lower DNA frag-

12 kb9 kb

6 kb

3 kb

1 kb

5 kb 7 kb

12 kb

Probe A

Probe A Probe B

Probe B

10 kb7 kb

3 kb

BamHI

BamHI+

EcoRIEcoRI

ment.) In the accompanying gel diagram, indicate thegenotypes across the top and the phenotype (band posi-tion or positions) expected for each genotype. (The scaleon the right shows the expected positions of fragmentsfrom 1 to 12 kb.)

2.21 The accompanying diagram shows the DNA frag-ments associated with an RFLP revealed by a probe thathybridizes where shown by the rectangle. The tick marksare cleavage sites for the restriction enzyme used in theRFLP analysis. How many alleles does this RFLP have?How many genotypes are possible? In the accompanyinggel diagram, indicate the phenotype (pattern of bands)expected of each genotype. (The scale on the right showsthe expected positions of fragments from 1 to 12 kb.)

2.22 The thick horizontal lines shown below represent al-ternative DNA molecules at a particular locus in a humanchromosome. The tick marks indicate the positions of re-striction sites for a particular restriction enzyme. Ge-nomic DNA from a sample of people is digested andanalyzed by a Southern blot using a probe DNA that hy-bridizes at the position shown by the rectangle. Howmany possible RFLP alleles would be observed in the sam-ple? How many genotypes?

2 kb3 kb3 kb 4 kb

5 kb3 kb 4 kb

2 kb6 kb 4 kb

12 kb9 kb

6 kb

3 kb

1 kb

4 kb 8 kb

12 kb

A1

A2

4 kb 2 kb

6 kb

12 kb9 kb

6 kb

3 kb

1 kb

Page 84: Introduction to Molecular Genetics and Genomics

Analysis and Applications 83

2.23 The RFLPs described in Problem 2.22 are analyzedwith the same restriction enzyme but a different probe,which hybridizes at the site indicated here by the rectan-gle. How many RFLP alleles would be found? How manygenotypes? (Use the symbols A1, A2, . . . to indicate the al-leles.) In the accompanying gel diagram, indicate thegenotypes across the top and the phenotype (band posi-tion or positions) expected for each genotype. (The scaleon the right shows the expected positions of fragmentsfrom 1 to 12 kb.)

2.24 If hexamers were long enough oligonucleotides toserve as specific primers for PCR (for most purposes theyare too short), what DNA fragment would be amplifiedusing the “forward” primer pair 5'-AATGCC-3' and the“reverse” primer 3'-GCATGT-5' on the double-strandedDNA molecule shown below?

2.25 Would the primer pairs 3'-AATGCC-5' and 5'-GCATGT-3' amplify the same fragment described in theprevious problem? Explain your answer.

2.26 A human DNA fragment of 3 kb is to be amplified byPCR. The total genome size is 3 � 109 bp.

(a) Prior to amplification, what fraction of the total DNAdoes the target sequence constitute?

(b) What fraction does it constitute after 10 cycles ofPCR?

(c) After 20 cycles of PCR?

(d) After 30 cycles of PCR?

2.27 RAPD analysis is carried out using genomic DNAfrom four individuals (A–D) sampled from a natural pop-ulation of Hawaiian crickets. The gel shown at the top ofthe page resulted from PCR with one of the primer pairstested. Which bands are the RAPD polymorphisms?

2 kb3 kb3 kb 4 kb

5 kb3 kb 4 kb

2 kb6 kb 4 kb

12 kb9 kb

6 kb

3 kb

1 kb

2.28 A cigarette butt found at the scene of a robbery isfound to have a sufficient number of epithelial cells stuckto the paper for the DNA to be extracted and typed.Shown below are the results of typing for three probes(locus 1, locus 2, and locus 3) of the evidence (X) and 7suspects (A through G). Which of the suspects can be ex-cluded? Which cannot be excluded? Can you identify therobber? Explain your reasoning.

A B C D E F G X

A B C D E F G X

A B C D E F G X

Locus 1

Locus 2

Locus 3

12

10

A B C D

Bandnumber

8

6

4

2

5'-GATTACCGGTAAATGCCGGATTAACCCGGGTTATCAGGCCACGTACAACTGGAGTCC-3'3'-CTAATGGCCATTTACGGCCTAATTGGGCCCAATAGTCCGGTGCATGTTGACCTCAGG-5'

Problem 2.24

Page 85: Introduction to Molecular Genetics and Genomics

84 Chapter 2 DNA Structure and DNA Manipulation

2.29 A woman is uncertain which of two men is the fa-ther of her child. DNA typing is carried out on blood fromthe child (C), the mother (M), and each of the two males(A and B), using probes for a highly polymorphic DNAmarker on two different chromosomes (“locus 1” and“locus 2”). The result is shown in the accompanying dia-gram. Can either male be excluded as the possible father?Explain your reasoning.

A B M C A B M C

Locus 1 Locus 2

Challenge Problems2.31 The genome of Drosophila melanogaster is 180 �106 bp, and a fragment of size 1.8 kb is to be amplified byPCR. How many cycles of PCR are necessary for the am-plified target sequence to constitute at least 99 percent ofthe total DNA?

2.32 A murder victim is found in an advanced state ofdecomposition and cannot be identified. Police suspectthat the victim is one of five persons reported by theirparents as missing. DNA typing is carried out on tissuesfrom the victim (X) and on the five sets of parents (Athrough E), using probes for a highly polymorphic DNAmarker on two different chromosomes (“locus 1” and“locus 2”). The result is shown in the diagram at theright. How do you interpret the fact that genomic DNAfrom each individual yields two bands? Can you identifythe parents of the victim? Explain your reasoning.

2.33 The snake venom phosphodiesterase enzyme de-scribed in Problem 2.30 was originally used in a proce-dure called “nearest neighbor” analysis. In thisprocedure, a DNA strand is synthesized in the presence ofall four trinucleotides, one of which carries a radioactivephosphate in the α (innermost) position. Then the DNA isdigested to completion with snake venom phos-phodiesterase, and the resulting mononucleotides areseparated and assayed for radioactivity. Examine the dia-gram in Problem 2.30, and then answer the followingquestions.

A B C D EX X�� �� �� �� ��

A B C D E�� �� �� �� ��

Locus 1 Locus 2

(a) How does this procedure reveal the “nearest neigh-bors” of the radioactive nucleotide?

(b) Is the “nearest neighbor” on the 5' or the 3' side ofthe labeled nucleotide?

Problem 2.32

2.30 Snake venom phosphodiesterase cleaves the chemi-cal bonds shown in red in the accompanying diagram,leaving mononucleotides that are phosphorylated in the3' position. If the phosphates numbered 2 and 4 are ra-dioactive, which mononucleotides will be radioactive af-ter cleavage with snake venom phosphodiesterase?

B1P

B2

B3

B4 B8

B7

HO

B6

B51

2

3

4

P

P

P

P

P

P

P

AU/ED: Art for prob 2.8 scaledto 85% for consistency. OK?

Page 86: Introduction to Molecular Genetics and Genomics

Further Reading 85

Further ReadingBotstein, D., R. L. White, M. Skolnick, and R. W. Davis.

1980. Construction of a genetic linkage map in manusing restriction fragment length polymorphisms.American Journal of Human Genetics 32: 314.

Calladine, C. R., and H. Drew. 1997. Understanding DNA:The Molecule and How it Works. 2d ed. San Diego:Academic Press.

Cruzan, M. B. 1998. Genetic markers in plant evolution-ary ecology. Ecology 79: 400.

Danna, K., and D. Nathans. 1971. Specific cleavage ofSimian Virus 40 DNA by restriction endonuclease ofHemophilus influenzae. Proceedings of the National Academyof Sciences, USA 68: 2913.

DePamphilis, M. L., ed. 1996. DNA Replication inEukaryotic Cells. Cold Spring Harbor, NY: Cold SpringHarbor Press.

Eeles, R. A. and A. C. Stamps. 1993. Polymerase ChainReaction (PCR): The Technique and Its Application. AustinTX: R. G. Landes.

Frank-Kamenetskii, M. D. 1997. Unraveling DNA: TheMost Important Molecule of Life. Tr. by L. Liapin. ReadingMA: Addison-Wesley.

Hartl, D. L. 2000. A Primer of Populaton Genetics. 3d ed.Sunderland, MA: Sinauer.

Jorde, L. B., M. Bamshad, and A. R. Rogers. 1998. Usingmitochondrial and nuclear DNA markers toreconstruct human evolution. Bioessays 20: 126.

Kumar, L. S. 1999. DNA markers in plant improvement:An overview. Biotechnology Advances 17: 143.

Loxdale, H. D., and G. Lushai. 1998. Molecular markersin entomology. Bulletin of Entomological Research 88:577–600.

Mitton, J. B. 1994. Molecular approaches to populationbiology. Annual Review of Ecology & Systematics25: 45.

Mullis, K. B. 1990. The unusual origin of the polymerasechain reaction. Scientific American, April.

Olby, R. C. 1994. The Path to the Double Helix: The Discoveryof DNA. New York: Dover.

Pena, S. D. J., V. F. Prado, and J. T. Epplen. 1995. DNAdiagnosis of human genetic individuality. Journal ofMolecular Medicine 73: 555.

Sayre, A. 1975. Rosalind Franklin and DNA. New York:Norton.

Schafer, A. J., and J. R. Hawkins. 1998. DNA variationand the future of human genetics. Nature Biotechnology16: 33.

Smithies, O. 1995. Early days of electrophoresis. Genetics139: 1.

Thomson, G., and M. S. Esposito. 1999. The genetics ofcomplex diseases. Trends in Genetics 15: M17.

Wang, D. G., J.-B. Fan, C.-J. Siao, A. Berno, P. Young, R.Sapolsky et al. 1998. Large-scale identification,mapping, and genotyping of single-nucleotidepolymorphisms in the human genome. Science 280:1077.

Watson, J. D. 1968. The Double Helix. New York:Atheneum.

Page 87: Introduction to Molecular Genetics and Genomics

86

C H A P T E R

Transmission Genetics: The Principle of Segregation

3

Page 88: Introduction to Molecular Genetics and Genomics

87

• Inherited traits are determined by the genes present in thereproductive cells united in fertilization.

• Genes are usually inherited in pairs—one from the motherand one from the father.

• The genes in a pair may differ in DNA sequence and in theireffect on the expression of a particular inherited trait.

• The maternally and paternally inherited genes are not changedby being together in the same organism.

• In the formation of reproductive cells, the paired genesseparate again into different cells.

• Random combinations of reproductive cells containingdifferent genes result in Mendel’s ratios of traits appearingamong the progeny.

• The ratios actually observed for any trait are determined by thetypes of dominance and gene interaction.

• In genetic analysis, the complementation test is used todetermine whether two recessive mutations that cause asimilar phenotype are alleles of the same gene. The mutantparents are crossed, and the phenotype of the progeny isexamined. If the progeny phenotype is nonmutant(complementation occurs), the mutations are in differentgenes; if the progeny phenotype is mutant (complementationdoes not occur), the mutations are in the same gene.

What Did Gregor Mendel Think He Discovered?Gregor Mendel 1866Experiments on Plant Hybrids

TroubadourThe Huntington’s Disease Collaborative Research Group 1993A Novel Gene Containing a Trinucleotide Repeat That Is Expanded andUnstable on Huntington’s Disease Chromosomes

P R I N C I P L E S

C O N N E C T I O N S

C H A P T E R O U T L I N E3.1 Morphological and Molecular

Phenotypes

3.2 Segregation of a Single GenePhenotypic Ratios in the F2

GenerationThe Principle of SegregationVerification of SegregationThe Testcross and the Backcross

3.3 Segregation of Two or More GenesThe Principle of Independent

AssortmentThe Testcross with Unlinked

GenesThe Big Experiment

3.4 Human Pedigree AnalysisCharacteristics of Dominant and

Recessive InheritanceMolecular Markers in Human

Pedigrees

3.5 Pedigrees and ProbabilityMutually Exclusive PossibilitiesIndependent Possibilities

3.6 Incomplete Dominance and EpistasisMultiple AllelesHuman ABO Blood GroupsEpistasis

3.7 Genetic Analysis: Mutant Screens andthe Complementation TestThe Complementation Test in Gene

IdentificationWhy Does the Complementation

Test Work?

Page 89: Introduction to Molecular Genetics and Genomics

3.1 Morphological and Molecular Phenotypes

Geneticists study traits in which one or-ganism differs from another in order to dis-cover the genetic basis of the difference.Until the advent of molecular genetics,geneticists dealt mainly with morphologi-cal traits, in which the differences betweenorganisms can be expressed in terms ofcolor, shape, or size. Mendel studied sevenmorphological traits contrasting in seedshape, seed color, flower color, pod shape,and so forth (Figure 3.1). Perhaps the mostwidely known example of a contrastingMendelian trait is round versus wrinkledseeds. As pea seeds dry, they lose waterand shrink. Round seeds are round be-cause they shrink uniformly; wrinkledseeds are wrinkled because they shrink ir-regularly. The wrinkled phenotype is dueto the absence of a branched-chain form ofstarch known as amylopectin, which is notsynthesized in wrinkled seeds owing to adefect in the enzyme starch-branching en-zyme I (SBEI).

The nonmutant, or wildtype, allele ofthe gene for SBEI is designated W and themutant allele w. Seeds that are heterozy-gous Ww have only half as much SBEI en-zyme as wildtype homozygous WW seeds,but this half the normal amount of enzymeproduces enough amylopectin for the het-erozygous Ww seeds to shrink uniformlyand remain phenotypically round. Hence,with respect to seed shape, the wildtype Wallele is dominant over the mutant w allele.Because the w allele is recessive, only thehomozygous ww seeds become wrinkled.

The molecular basis of the wrinkled mu-tation is that the SBEI gene has become in-terrupted by the insertion, into the gene, ofa DNA sequence called a transposableelement. Such DNA sequences are capableof moving (transposition) from one locationto another within a chromosome or be-tween chromosomes. The molecular mech-anism of transposition is discussed inChapter 7, but for our present purposes it isnecessary to know only that transposableelements are present in most genomes, es-pecially the large genomes of eukaryotes,and that many spontaneous mutations re-sult from the insertion of transposable ele-ments into a gene.

88 Chapter 3 Transmission Genetics: The Principle of Segregation

In Chapter 2 we emphasized that in mod-ern genetics, a typical phenotype consistsof a band present at a particular position

in a DNA electrophoresis gel. In a method ofgenetic analysis such as RAPD (random am-plified polymorphic DNA), genomic DNAfrom a single individual can yield 30 ormore bands (see Figure 2.25). Each of thesebands represents a phenotype and resultsfrom the PCR (polymerase chain reaction)amplification of a single region of DNA inthe genome of the individual.

In this chapter we consider how genesare transmitted from parents to offspringand how this determines the distribution ofgenotypes and phenotypes among relatedindividuals. The study of the inheritance oftraits constitutes transmission genetics.This subject is also called Mendeliangenetics because the underlying principleswere first deduced from experiments ingarden peas (Pisum sativum) carried out inthe years 1856–1863 by Gregor Mendel, amonk at the monastery of St. Thomas in thetown of Brno (Brünn), in the Czech Repub-lic. He reported his experiments to a localnatural history society, published the re-sults and his interpretation in its scientificjournal in 1866, and began exchanging let-ters with one of the leading botanists of thetime. His experiments were careful and ex-ceptionally well documented, and his papercontains the first clear exposition of the sta-tistical rules governing the transmission ofgenes from generation to generation. Nev-ertheless, Mendel’s paper was ignored for34 years until its significance was finally ap-preciated.

AU: Reference infirst paragraphchanged to Fig.2.25. Correct? orrestore to Fig.2.24? AU: In first para-graph, phrase“phenotype ischaracterized by”changed to “phe-notype consists.”OK?

In this small gardenplot adjacent to the monastery of St. Thomas, GregorMendel grew morethan 33,500 peaplants in the years1856–1863, includingmore than 6400 plantsin one year alone. Hereceived some helpfrom two fellowmonks who assisted inthe experiments.

Page 90: Introduction to Molecular Genetics and Genomics

Standard

Flower and pod position

Stem length

Pod color

Pod shape

Flower color

Seed color

Seed shape

Parental strain 1: Dominant Parental strain 2: RecessivePhenotype of progeny of

monohybrid cross

Inflated

Dwarf

Axial (along stem) Terminal (at top of stem)

Constricted

Purple White

Yellow Green

Green Yellow

Round Wrinkled

Standard

Inflated

Axial

Purple

Yellow

Green

Round

Figure 3.1 The seven different traits studied in peas by Mendel. The phenotype shown at the farright is the dominant trait, which appears in the hybrid produced by crossing.

3.1 Morphological and Molecular Phenotypes 89

Page 91: Introduction to Molecular Genetics and Genomics

Figure 3.2 includes a diagram of the DNAstructure of the wildtype W and mutant walleles and shows the DNA insertion that in-terrupts the w allele. Highlighted are twoEcoRI restriction sites, present in both al-leles, that flank the site of the insertion. Thediagram in part C indicates what pattern ofbands would be expected if one were tocarry out a Southern blot (Section 2.3) inwhich genomic DNA was digested to com-pletion with EcoRI and then the resultingfragments were separated by electrophore-sis and hybridized with a labeled probe com-plementary to a region shared between theW and w alleles. The EcoRI fragment fromthe W allele would be smaller than that ofthe w allele, because of the inserted DNA inthe w allele, and thus it would migrate fasterthan the corresponding fragment from thew allele and move to a position closer to thebottom of the gel. Genomic DNA from ho-mozygous WW would yield a single, faster-migrating band; that from homozygous wwa single, slower-migrating band; and thatfrom heterozygous Ww two bands with thesame electrophoretic mobility as those ob-served in the homozygous genotypes. InFigure 3.2C, the band in the homozygousgenotypes is shown as somewhat thickerthan those from the heterozygous geno-type, because the single band in each ho-

mozygous genotype comes from the twocopies of whichever allele is homozygous,and thus it contains more DNA than the cor-responding DNA in the heterozygous geno-type, in which only one copy of each allele ispresent.

Hence, as illustrated in Figure 3.2C, theRFLP analysis clearly distinguishes betweenthe genotypes WW, Ww, and ww, becausethe heterozygous Ww genotype exhibitsboth of the bands observed in the homo-zygous genotypes. This situation is de-scribed by saying that the W and w allelesare codominant with respect to the mole-cular phenotype. However, as indicated bythe seed shapes in Figure 3.2C, W is domi-nant over w with respect to morphologicalphenotype. In the discussion that follows,we use the RFLP analysis of the W and w al-leles to emphasize the importance of mole-cular phenotypes in modern genetics and todemonstrate experimentally the principlesof genetic transmission.

3.2 Segregation of a Single GeneMendel selected peas for his experimentsfor two primary reasons. First, he had accessto varieties that differed in contrastingtraits, such as round versus wrinkled seeds

90 Chapter 3 Transmission Genetics: The Principle of Segregation

Figure 3.2 (A) W (round) is an allele of a gene that specifies the amino acid sequence of starchbranching enzyme I (SBEI). (B) w (wrinkled) is an allele that encodes an inactive form of theenzyme because its DNA sequence is interrupted by the insertion of a transposable element. (C) Atthe level of the morphological phenotype, W is dominant to w: Genotype WW and Ww have roundseeds, whereas genotype ww has wrinkled seeds. The molecular difference between the alleles canbe detected as a restriction fragment length polymorphism (RFLP) using the enzyme EcoRI and aprobe that hybridizes at the site shown. At the molecular level, the alleles are codominant: DNAfrom each genotype yields a different molecular phenotype—a single band differing in size forhomozygous WW and ww, and both bands for heterozygous Ww.

(A)–SBEI gene (wildtype round, W ) Genotype

(C)

(B)–SBEI gene with insertion (mutant wrinkled, w )

Site of hybridizationwith probe DNA

Seedphenotype

Molecularphenotype

Transposable elementinsertion into SBEI gene

WW Ww ww

EcoRI EcoRIDNADuplex

EcoRI EcoRI

Page 92: Introduction to Molecular Genetics and Genomics

and yellow versus green seeds. Second, hisearlier studies had indicated that peas usu-ally reproduce by self-pollination, in whichpollen produced in a flower is used to fertil-ize the eggs in the same flower. To producehybrids by cross-pollination (outcrossing),all he needed to do was open the keel petal(enclosing the reproductive structures), re-move the immature anthers (the pollen-producing structures) before they shedpollen, and dust the stigma (the femalestructure) with pollen taken from a floweron another plant (Figure 3.3). The relativelysmall space needed to grow each plant, andthe relatively large number of progeny that

could be obtained, gave him the opportu-nity, as he says in his paper, to “determinethe number of different forms in whichhybrid progeny appear” and to “ascertaintheir numerical interrelationships.”

The fact that garden peas are normallyself-fertilizing means that in the absence ofdeliberate outcrossing, plants with contrast-ing traits are usually homozygous for alter-native alleles of a gene affecting the trait;for example, plants with round seeds havegenotype WW and those with wrinkledseeds genotype ww. The homozygous geno-types are indicated experimentally by theobservation that hereditary traits in each

3.2 Segregation of a Single Gene 91

Flower of plant grown from a round seed

Open flower and discard anthers

Open flower and collect pollen

All seeds formed from flower are round

Flower of plant grown from a wrinkled seed

Keel petal enclosing reproductive structures

Anther(male part)

Stigma(female part)

Ovule (forms pea pod)

Brush pollen onto stigma

Figure 3.3 Crossing peaplants requires someminor surgery inwhich the anthers of aflower are removedbefore they producepollen. The stigma, orfemale part of theflower, is notremoved. It is fertil-ized by brushing withmature pollen grainstaken from anotherplant.

Page 93: Introduction to Molecular Genetics and Genomics

variety are true-breeding, which meansthat plants produce only progeny likethemselves when allowed to self-pollinatenormally. Outcrossing between plants thatdiffer in one or more traits creates ahybrid. If the parents differ in one, two, orthree traits, the hybrid is a monohybrid, dihy-brid, or trihybrid. In keeping track of parentsand their hybrid progeny, we say that theparents constitute the P1 generation andtheir hybrid progeny the F1 generation.

Because garden peas are sexual organ-isms, each cross can be performed in twoways, depending on which phenotype ispresent in the female parent and which inthe male parent. With round versus wrin-kled, for example, the female parent can beround and the male wrinkled, or the re-verse. These are called reciprocal crosses.Mendel was the first to demonstrate the fol-lowing important principle:

The outcome of a genetic cross does notdepend on which trait is present in the maleand which is present in the female; reciprocalcrosses yield the same result.

This principle is illustrated for round andwrinkled seeds in Figure 3.4. The gel iconsshow the RFLP bands that genomic DNAfrom each type of seed in these crosseswould yield. The genotypes of the crossesand their progeny are as follows:

Cross A: WW / � ww ? � WwCross B: ww / � WW ? � Ww

In both reciprocal crosses, the progeny havethe morphological phenotype of roundseeds, but as shown by the RFLP diagramson the right, all progeny genotypes are ac-tually heterozygous Ww and therefore dif-ferent from either parent. The geneticequivalence of reciprocal crosses illustratedin Figure 3.4 is a principle that is quite gen-eral in its applicability, but there is an im-portant exception, having to do with sexchromosomes, that will be discussed inChapter 4.

In the following section we examine afew of Mendel’s original experiments in thecontext of RFLP analysis in order to relatethe morphological phenotypes and their

92 Chapter 3 Transmission Genetics: The Principle of Segregation

Flower on plant grown from wrinkled seed

Flower on plant grown from round seed

Cross A

Flower on plant grown from wrinkled seed

Flower on plant grown from round seed

Ovule from round variety. Pollen from wrinkled variety.Result: All seeds round.

Ovule from wrinkled variety. Pollen from round variety.Result: All seeds round.

Cross B

Figure 3.4 Morpho-logical and molecularphenotypes showingthe equivalence ofreciprocal crosses. In this example, thehybrid seeds are roundand yield an RFLPpattern with twobands, irrespective ofthe direction of thecross.

Page 94: Introduction to Molecular Genetics and Genomics

ratios to the molecular phenotypes thatwould be expected.

Phenotypic Ratios in the F2 GenerationAlthough the progeny of the crosses in Fig-ure 3.4 have the dominant phenotype ofround seeds, the RFLP analysis shows thatthey are actually heterozygous. The w alleleis hidden with respect to the morphologicalphenotype because w is recessive to W.Nevertheless, the wrinkled phenotypereappears in the next generation when thehybrid progeny are allowed to undergo self-fertilization. For example, if the round F1seeds from Cross A in Figure 3.4 weregrown into sexually mature plants and al-lowed to undergo self-fertilization, some ofthe resulting seeds would be round andothers wrinkled. The progeny seeds pro-duced by self-fertilization of the F1 genera-tion constitute the F2 generation. WhenMendel carried out this experiment, in theF2 generation he observed the resultsshown in the following diagram:

Note that the ratio 5474 : 1850 is approxi-mately 3 : 1.

A 3 : 1 ratio of dominant : recessiveforms in the F2 progeny is characteristic ofsimple Mendelian inheritance. Mendel’sdata demonstrating this point are shown inTable 3.1. Note that the first two traits in thetable (round versus wrinkled seeds and yel-low versus green seeds) have many moreobservations than any of the others. Thereason is that these traits can be classifieddirectly in the seeds, whereas the otherscan be classified only in the mature plants,and Mendel could analyze many moreseeds than he could mature plants. The

Parental lines:

F1 generation:

F2 generation:

All round seeds

: 1 wrinkled(1850)

3 round(5474)

Round Wrinkled

principal observations from the data inTable 3.1 are

• The F1 hybrids express only the dominanttrait (because the F1 progeny are heterozy-gous—for example, Ww)

• In the F2 generation, plants with either thedominant or the recessive trait are present(which means that some F2 progeny are ho-mozygous—for example, ww).

• In the F2 generation, there are approximatelythree times as many plants with the domi-nant phenotype as plants with the recessivephenotype.

The 3 : 1 ratio observed in the F2 generationis the key to understanding the mechanismof genetic transmission. In the next sectionwe use RFLP analysis of the W and w allelesto explain why this ratio is produced.

3.2 Segregation of a Single Gene 93

Round � wrinkled (seeds) Round 5474 round, 1850 wrinkled 2.96 : 1

Yellow � green (seeds) Yellow 6022 yellow, 2001 green 3.01 : 1

Purple � white (flowers) Purple 705 purple, 224 white 3.15 : 1

Inflated � constricted (pods) Inflated 882 inflated, 299 constricted 2.95 : 1

Green � yellow (unripe pods) Green 428 green, 152 yellow 2.82 : 1

Axial � terminal (flower position) Axial 651 axial, 207 terminal 3.14 : 1

Long � short (stems) Long 787 long, 277 short 2.84 : 1

Table 3.1 Results of Mendel’s monohybrid experiments

Parental traits F1 trait Number of F2 progeny F2 ratio

captioncaptioncap-tioncaptioncaption-captioncaptioncaptioncaptioncaptioncap-tioncaptioncaption-captioncaption

09131_01_1715P

Page 95: Introduction to Molecular Genetics and Genomics

The Principle of SegregationThe 3 : 1 ratio can be explained with refer-ence to Figure 3.5. This is the heart ofMendelian genetics. You should master itand be able to use it to deduce the progenytypes produced in crosses. The diagram il-lustrates these key features of single-geneinheritance:

1. Genes come in pairs, which means that acell or individual has two copies (alleles) ofeach gene.

2. For each pair of genes, the alleles may beidentical (homozygous WW or homozygousww), or they may be different (heterozy-gous Ww).

3. Each reproductive cell (gamete) producedby an individual contains only one allele ofeach gene (that is, either W or w).

4. In the formation of gametes, any particulargamete is equally likely to include either al-lele (hence, from a heterozygous Ww geno-type, half the gametes contain W and theother half contain w).

5. The union of male and female reproductivecells is a random process that reunites thealleles in pairs.

The essential feature of transmission ge-netics is the separation, technically calledsegregation, in unaltered form, of the twoalleles in an individual during the forma-tion of its reproductive cells. Segregation

94 Chapter 3 Transmission Genetics: The Principle of Segregation

: 1/2Ww : 1/4 ww1/4WW

WW ww

1/4 Ww

1/4 Ww 1/4 ww

WW parent cancontribute onlyW gametes

Both W and wpresent, but seedsare round

Ww parentproduces equalnumbers ofW and w gametes

1/3 of roundseeds are WW;2/3 of roundseeds are Ww

ww parent cancontribute onlyw gametes

W

W

W

w

w

w

Parents:

Reproductive

cells (gametes):

F1 progeny:

Fertilization

Ww

F2 progeny:

Female gametes

Malegametes

1/21/2

1/2

1/2

1/4 WW

Meiosis

Figure 3.5 A diagrammaticexplanation of the 3 : 1 ratioof dominant : recessivemorphological phenotypesobserved in the F2 genera-tion of a monohybrid cross.The 3 : 1 ratio is observedbecause of dominance. Note that the ratio ofWW : Ww : ww genotypes inthe F2 generation is 1 : 2 : 1,as can be seen from therestriction fragmentphenotypes.

Page 96: Introduction to Molecular Genetics and Genomics

3.2 Segregation of a Single Gene 95

corresponds to points 3 and 4 in the fore-going list. The principle of segregation issometimes called Mendel’s first law.

The Principle of Segregation: In the forma-tion of gametes, the paired hereditary determi-nants separate (segregate) in such a way thateach gamete is equally likely to contain eithermember of the pair.

Another key feature of transmission genet-ics is that the hereditary determinants arepresent as pairs in both the parental organ-isms and the progeny organisms but as sin-gle copies in the reproductive cells. Thisfeature corresponds to points 1 and 5 in theforegoing list.

Figure 3.5 illustrates the biologicalmechanism underlying the importantMendelian ratios in the F2 generation of 3 : 1for phenotypes and 1 : 2 : 1 for genotypes.To understand these ratios, consider first theparental generation in which the originalcross is WW � ww. The sex of the parents isnot stated because reciprocal crosses yieldidentical results. (There is, however, a con-vention in genetics that unless otherwisespecified, crosses are given with the femaleparent listed first.) In the original cross, theWW parent produces only W-containinggametes, whereas the ww parent producesonly w-containing gametes. Segregation stilltakes place in the homozygous genotypes aswell as in the heterozygous genotype, eventhough all of the gametes carry the sametype of allele (W from homozygous WW andw from homozygous ww). When the W-bearing and w-bearing gametes cometogether in fertilization, the hybrid geno-type is heterozygous Ww, which is shown bythe bands in the gel icon next to the F1 prog-eny. With regard to seed shape, the hybridWw seeds are round because W is dominantover w.

When the heterozygous F1 progenyform gametes, segregation implies that halfthe gametes will contain the W allele andthe other half will contain the w allele.These gametes come together at randomwhen an F1 individual is self-fertilized orwhen two F1 individuals are crossed. Theresult of random fertilization can be de-duced from the sort of cross-multiplicationsquare shown at the bottom of the figure,in which the female gametes and their fre-

quencies are arrayed across the top marginand the male gametes and their frequenciesalong the left-hand margin. This calculatingdevice is widely used in genetics and iscalled a Punnett square after its inventorReginald C. Punnett (1875–1967). ThePunnett square in Figure 3.5 shows thatrandom combinations of the F1 gametes re-sult in an F2 generation with the genotypiccomposition 1�4 WW, 1�2 Ww, and 1�4 ww.This can be confirmed by the RFLP bandingpatterns in the gel icons because of thecodominance of W and w with respect tothe molecular phenotype. But because W isdominant over w with respect to the mor-phological phenotoype, the WW and Wwgenotypes have round seeds and the wwgenotypes have wrinkled seeds, yieldingthe phenotypic ratio of round : wrinkledseeds of 3 : 1. Hence it is a combination ofsegregation, random union of gametes, anddominance that results in the 3 : 1 ratio.

The ratio of F2 genotypes is as importantas the ratio of F2 phenotypes. The Punnettsquare in Figure 3.5 also shows that the ra-tio of WW : Ww : ww genotypes is 1 : 2 : 1,which can be confirmed directly by theRFLP analysis.

Verification of SegregationThe round seeds in Figure 3.5 conceal agenotypic ratio of 1 WW : 2 Ww. To say thesame thing in another way, among the F2seeds that are round (or, more generally,among organisms that show the dominantmorphological phenotype), 1�3 are homo-zygous (in this example, WW) and 2�3 areheterozygous (in this example, Ww). Thisconclusion is obvious from the RFLP pat-terns in Figure 3.5, but it is not at all obvi-ous from the morphological phenotypes.Unless you knew something about geneticsalready, it would be a very bold hypothe-sis, because it implies that two organismswith the same morphological phenotype(in this case round seeds) might neverthe-less differ in molecular phenotype and ingenotype.

Yet this is exactly what Mendel pro-posed. But how could this hypothesis betested experimentally? He realized that itcould be tested via self-fertilization of the F2plants. With self-fertilization, plants grown

Page 97: Introduction to Molecular Genetics and Genomics

96 Chapter 3 Transmission Genetics: The Principle of Segregation

from the homozygous WW genotypesshould be true-breeding for round seeds,whereas those from the heterozygous Wwgenotypes should yield round and wrinkledseeds in the ratio of 3 : 1. On the otherhand, the plants grown from wrinkled seedsshould be true-breeding for wrinkled be-cause these plants are homozygous ww. Theresults Mendel obtained are summarized inFigure 3.6. As predicted from the genetic hy-pothesis, plants grown from F2 wrinkledseeds were true-breeding for wrinkledseeds, yielding only wrinkled seeds in the F3generation. But some of the plants grownfrom round seeds showed evidence of seg-regation. Among 565 plants grown fromround F2 seeds, 372 plants produced bothround and wrinkled seeds in a proportionvery close to 3 : 1, whereas the remaining193 plants produced only round seeds in

the F3 generation. The ratio 193 : 372 equals1 : 1.93, which is very close to the ratio 1 : 2of WW : Ww genotypes predicted from thegenetic hypothesis.

An important feature of the homo-zygous round and homozygous wrinkledseeds produced in the F2 and F3 generationsis that the phenotypes are exactly the sameas those observed in the original parents inthe P1 generation. This makes sense interms of DNA, because the DNA of each al-lele remains unaltered unless a new muta-tion happens to occur. Mendel describedthis result in a letter by saying that in theprogeny of crosses, “the two parental traitsappear, separated and unchanged, andthere is nothing to indicate that one of themhas either inherited or taken over anythingfrom the other.” From this finding, he con-cluded that the hereditary determinants for

What Did Gregor Mendel Think He Discovered?

Gregor Mendel 1866Monastery of St. Thomas, Brno [then Brünn], Czech RepublicExperiments on Plant Hybrids (original in German)

Mendel’s paper is remarkable for its preci-sion and clarity. It is worth reading in itsentirety for this reason alone. Although themost important discovery attributed toMendel is segregation, he never uses thisterm. His description of segregation isfound in the first passage in italics in theexcerpt. (All of the italics are reproducedfrom the original.) In his description of theprocess, he takes us carefully through theseparation of A and a in gametes and theircoming together again at random in fertil-ization. One flaw in the description isMendel’s occasional confusion betweengenotype and phenotype, which is illus-trated by his writing A instead of AA and ainstead of aa in the display toward the endof the passage. Most early geneticists

made no consistent distinction betweengenotype and phenotype until 1909, whenthe terms themselves were coined.

Artificial fertilization undertaken on or-namental plants to obtainnew color variants initiatedthe experiments reportedhere. The striking regularitywith which the same hybridforms always reappearedwhenever fertilization be-tween like species took placesuggested further experi-ments whose task it was tofollow that development ofhybrids in their progeny. . . .This paper discusses the at-tempt at such a detailed experiment. . . .Whether the plan by which the individualexperiments were set up and carried outwas adequate to the assigned taskshould be decided by a benevolent judg-ment. . . . [Here the experimental re-sults are described in detail.] Thus

experimentation also justifies the as-sumption that pea hybrids form germinaland pollen cells that in their compositioncorrespond in equal numbers to all theconstant forms resulting from the com-

bination of traits unitedthrough fertilization.The difference of formsamong the progeny ofhybrids, as well as theratios in which they areobserved, find an ade-quate explanation inthe principle [of segre-gation] just deduced.The simplest case isgiven by the series forone pair of differing

traits. It is shown that this series is de-scribed by the expression: A � 2Aa � a,in which A and a signify the forms withconstant differing traits, and Aa the formhybrid for both. The series contains fourindividuals in three different terms. Intheir production, pollen and germinal

Whether the plan bywhich the individual

experiments wereset up and carried

out was adequate tothe assigned taskshould be decided

by a benevolentjudgment.

Page 98: Introduction to Molecular Genetics and Genomics

3.2 Segregation of a Single Gene 97

Of 7324 F2 seeds

2/3 (372) gave plants with pods containingboth round and wrinkledF3 seeds in a 3 : 1 ratioof round : wrinkled

= 1/2 of all F2 seeds

= 1/4 of all F2 seeds

= 1/4 of all F2 seeds

1/4 (1850) were wrinkledall gave plants producingonly wrinkled F3 seeds

3/4 (5474) were round;565 were planted, and...

1/3 (193) gave plantswith pods containing onlyround F3 seeds

The result of fertilization can be vi-sualized by writing the designations forassociated germinal and pollen cells inthe form of fractions, pollen cells abovethe line, germinal cells below. In thecase under discussion one obtains

� � �

In the first and fourth terms germi-nal and pollen cells are alike; thereforethe products of their association mustbe constant, namely A and a; in the sec-ond and third, however, a union of thetwo differing parental traits takes placeagain, therefore the forms arising fromsuch fertilizations are absolutely identi-cal with the hybrid from which they de-rive. Thus, repeated hybridization takesplace. The striking phenomenon, thathybrids are able to produce, in additionto the two parental types, progeny thatresemble themselves is thus explained:Aa and aA both give the same associa-tion, Aa, since, as mentioned earlier, it

a�a

a�A

A�a

A�A

makes no difference to the consequenceof fertilization which of the two traits be-longs to the pollen and which to the ger-minal cell. Therefore

� � � � A � 2Aa � a

This represents the average courseof self-fertilization of hybrids when twodiffering traits are associated in them.In individual flowers and individualplants, however, the ratio in which themembers of the series are formed maybe subject to not insignificant devia-tions. . . . Thus it was proven experi-mentally that, in Pisum, hybrids formdifferent kinds of germinal and pollencells and that this is the reason for thevariability of their offspring.

Source: Verhandlungen des naturforschendenden Vereines in Brünn 4: 3–47.

a�a

a�A

A�a

A�A

Pollen cells A A a a

j jGerminal cells A A a a

jj

cells of form A and a participate, on theaverage, equally in fertilization; there-fore each form manifests itself twice,since four individuals are produced. Par-ticipating in fertilization are thus:

Pollen cells A � A � a � aGerminal cells A � A � a � a

It is entirely a matter of chancewhich of the two kinds of pollen com-bines with each single germinal cell.However, according to the laws of prob-ability, in an average of many cases itwill always happen that every pollenform A and a will unite equally oftenwith every germinal-cell form A and a;therefore, in fertilization, one of the twopollen cells A will meet a germinal cellA, the other a germinal cell a, andequally, one pollen cell a will becomeassociated with a germinal cell A, andthe other a.

the traits in the parental lines were trans-mitted as two different elements that retaintheir purity in the hybrids. In other words,the hereditary determinants do not “mix”

or “contaminate” each other. In modernterminology, this means that, with rare butimportant exceptions, genes are transmittedunchanged from generation to generation.

Figure 3.6 Summary ofF2 phenotypes and theprogeny produced byself-fertilization.

Page 99: Introduction to Molecular Genetics and Genomics

98 Chapter 3 Transmission Genetics: The Principle of Segregation

gametes produced by the heterozygous parent,because the recessive parent contributes onlyrecessive alleles.

Mendel carried out a series of testcrosseswith various traits. The results are shown inTable 3.2. In all cases, the ratio of phenotypesamong the testcross progeny is very close tothe 1 : 1 ratio expected from segregation ofthe alleles in the heterozygous parent.

Another valuable type of cross is abackcross, in which hybrid organisms arecrossed with one of the parental genotypes.Backcrosses are commonly used by geneti-cists and by plant and animal breeders, aswe will see in later chapters. Note that thetestcrosses in Table 3.2 are also backcrosses,because in each case, the F1 heterozygousparent came from a cross between thehomozygous dominant and the homozy-gous recessive.

3.3 Segregation of Two or More Genes

The results of many genetic crosses dependon the segregation of the alleles of two ofmore genes. The genes may be in differentchromosomes or in the same chromosome.Although in this section we consider thecase of genes that are in two different chro-mosomes, the same principles apply togenes that are in the same chromosome butare so far distant from each other that theysegregate independently. The case of linkageof genes in the same chromosome is exam-ined in Chapter 5.

To illustrate the principles, we consideragain a cross between homozygous geno-types, but in this case homozygous for thealleles of two genes. A specific example is atrue-breeding variety of garden peas withseeds that are wrinkled and green (geno-type ww gg) versus a variety with seeds thatare round and yellow (genotype WW GG).As suggested by the use of uppercase andlowercase symbols for the alleles, the domi-nant alleles are W and G, the recessive al-leles w and g. Crossing these strains yieldsF1 seeds with the genotype Ww Gg, whichare phenotypically round and yellow be-cause of the dominance relations. Whenthe F1 seeds are grown into mature plantsand self-fertilized, the F2 progeny show theresult of simultaneous segregation of the W,w allele pair and the G, g allele pair. When

Homozygousrecessiveparent

Heterozygous Ww parent

1/21/2

all

1/2 Ww 1/2 ww

Segregation yields W and w gametesin a ratio of 1 : 1.

The progeny of a testcross includes dominant and recessive phenotypes in a ratio of 1 : 1.

W w

w

Figure 3.7 In a testcross of an Ww heterozygous parent with a wwhomozygous recessive, the progeny are Ww and ww in the ratio of1 : 1. A testcross shows the result of segregation.

The Testcross and the BackcrossAnother straightforward way of testing thegenetic hypothesis in Figure 3.5 is by meansof a testcross, a cross between an organismthat is heterozygous for one or more genes(for example, Ww) and an organism that ishomozygous for the recessive alleles (forexample, ww). The result of such a testcrossis shown in Figure 3.7. Because the het-erozygous parent is expected to produce Wand w gametes in equal numbers, whereasthe homozygous recessive produces only wgametes, the expected progeny are 1�2 withthe genotype Ww and 1�2 with the geno-type ww. The former have the dominantphenotype because W is dominant over w,and the latter have the recessive pheno-type. A testcross is often extremely usefulin genetic analysis:

In a testcross, the phenotypes of the progenyreveal the relative frequencies of the different

Round � wrinkled seeds 193 round, 192 wrinkled 1.01 : 1

Yellow � green seeds 196 yellow, 189 green 1.04 : 1

Purple � white flowers 85 purple, 81 white 1.05 : 1

Long � short stems 87 long, 79 short 1.10 : 1

Table 3.2 Mendel’s testcross results

Testcross (F1 heterozygote Progeny � homozygous recessive) from testcross Ratio

Page 100: Introduction to Molecular Genetics and Genomics

Yellow

Green

Seed color phenotypes

Seed shape

phenotypes

Round

Wrinkled

9/16 Round, yellow

3/16 Round, green

3/16 Wrinkled, yellow

1/16 Wrinkled, green

Ratio of phenotypes in the F2 progeny of a dihybrid cross is 9 : 3 : 3 : 1.

3/4 1/4

3/4

1/4

Figure 3.8 The 3 : 1 ratio of round : wrinkled, when combined at random with the 3 : 1 ratio of yellow : green, yields the9 : 3 : 3 : 1 ratio observed in the F2 progeny of the dihybrid cross.

Ww Gg

1/2 W (and G or g)

1/2 w (and G or g)

1/4 WG

1/4 Wg

1/4 wG

1/4 wg

Segregation of W and w alleles

Independent segregation of G and g alleles

Result : An equal frequency of all four possible types of gametes

Figure 3.9 Independent segregation of the W, w and G, g allelepairs means that among each of the W and w gametic classes, theratio of G : g is 1 : 1. Likewise, among each of the G and ggametic classes, the ratio of W : w is 1 : 1.

Mendel perfomed this cross, he obtainedthe following numbers of F2 seeds:

Round, yellow 315Round, green 108Wrinkled, yellow 101Wrinkled, green 32

Total 5�5�6�

In these data, the first thing to be noted isthe expected 3 : 1 ratio for each trait consid-ered separately. The ratio of round : wrin-kled (pooling across yellow and green) is

(315 � 108) : (101 � 32)� 423 : 133� 3.18 : 1

And the ratio of yellow : green (poolingacross round and wrinkled) is

(315 � 101) : (108 � 32)� 416 : 140� 2.97 : 1

Both of these ratios are in satisfactoryagreement with 3 : 1. (Testing for goodnessof fit to a predicted ratio is described inChapter 4.) Furthermore, in the F2 progenyof the dihybrid cross, the separate 3 : 1 ra-tios for the two traits were combined atrandom. With random combinations, asshown in Figure 3.8, among the 3�4 of theprogeny that are round, 3�4 will be yellowand 1�4 green; similarly, among the 1�4 ofthe progeny that are wrinkled, 3�4 will beyellow and 1�4 green. The overall propor-tions of round yellow to round green towrinkled yellow to wrinkled green aretherefore expected to be

3�4 � 3�4 : 3�4 � 1�4 : 1�4 � 3�4 : 1�4 � 1�4� 9�16 : 3�16 : 3�16 : 1�16

The observed ratio of 315 : 108 : 101 : 32equals 9.84 : 3.38 : 3.16 : 1, which is a sat-isfactory fit to the 9 : 3 : 3 : 1 ratio expectedfrom the Punnett square in Figure 3.8.

The Principle of Independent AssortmentThe independent segregation of the W, wand G, g allele pairs is illustrated in Figure 3.9.What independence means is that if a ga-mete contains W, it is equally likely to con-tain G or g; and if a gamete contains w, it isequally likely to contain G or g. The impli-cation is that the four gametes are formedin equal frequencies:

1�4 W G 1�4 W g 1�4 w G 1�4 w g

3.3 Segregation of Two or More Genes 99

Page 101: Introduction to Molecular Genetics and Genomics

100 Chapter 3 Transmission Genetics: The Principle of Segregation

The result of independent assortmentwhen the four types of gametes combine atrandom to form the zygotes of the nextgeneration is shown in Figure 3.10. Note thatthe expected ratio of phenotypes amongthe F2 progeny is

9 : 3 : 3 : 1

However, as the Punnett square also shows,the ratio of genotypes in the F2 generationis more complex; it is

1 : 2 : 1 : 2 : 4 : 2 : 1 : 2 : 1

The reason for this ratio is shown inFigure 3.11. Among seeds that have the WWgenotype, the ratio of

GG : Gg : gg equals 1 : 2 : 1

Among seeds that have the Ww genotype,the ratio is

2 : 4 : 2

(which is a 1 : 2 : 1 ratio multiplied by 2because there are twice as many Ww geno-types as either WW or ww). And amongseeds that have the ww genotype, the ratio of

GG : Gg : gg equals 1 : 2 : 1

The phenotypes of the seeds are shown be-neath the genotypes, and the combinedphenotypic ratio is

9 : 3 : 3 : 1

Figure 3.11 also shows that among seedsthat are GG, the ratio of

WW : Ww : ww equals 1 : 2 : 1

among seeds that are Gg, it is

2 : 4 : 2

and among seeds that are gg, it is

1 : 2 : 1

Therefore, the independent segregationmeans that among each of the possiblegenotypes formed by one pair of alleles, theratio of homozygous dominant to heterozy-gous to homozygous recessive is 1 : 2 : 1 forthe other independent pair of alleles.

The principle of independent segrega-tion of two pairs of alleles in different chro-mosomes (or located sufficiently far apartin the same chromosome) has come to beknown as the principle of independent as-

Female gametes

Malegametes

+

ww gg

WG

WG

Wg

wG

wg

Wg wG wg

wgWG

Ww Gg

Double heterozygote

Genotypes

Phenotypes

1/41/4 1/41/4

1/4

1/4

1/4

1/4

Parents:

Gametes:

F1 progeny:

F2 progeny:

WW GG WW Gg

WW Gg WW gg

Ww GG Ww Gg

Ww Gg Ww gg

Ww GG Ww Gg

Ww Gg Ww gg

ww GG ww Gg

ww Gg ww gg

Round, yellow

Wrinkled, green

Round, yellow

WW GG

2/16 Ww GG 4/16 Ww Gg

1/16 WW GG 2/16 WW Gg +

1/16 ww GG 2/16 ww Gg

1/16 WW gg 2/16 Ww gg

+

+

+

1/16 ww gg

= 9/16 round, yellow

= 3/16 wrinkled, yellow

= 3/16 round, green

= 1/16 wrinkled, green

PhenotypesGenotypes

Figure 3.10 Independent segregation is the biological basis for the 9 : 3 : 3 : 1ratio of F2 phenotypes resulting from a dihybrid cross.

Au: Is phrase“independent segregation”correct in Fig. 3.10 caption, or should it be changed to“independentassortment”?

Page 102: Introduction to Molecular Genetics and Genomics

3.3 Segregation of Two or More Genes 101

WW GG WW Gg WW gg Ww GG Ww Gg Ww gg ww GG ww Gg ww gg

Segregation of

Gg within WWSegregation of

Gg within WwSegregation of

Gg within ww

1 2: : :1 2 1: : :2 14: : 2

Round, yellow

Round, greenAll genotypescombined

Wrinkled, yellow

Wrinkled, green

9

3

3

1

Figure 3.11 Genotypes and phenotypes of the F2 progeny of the dihybrid cross for seed shape andseed color.

Ww Gg

Gametes

Gametes

Ww Gg ww gg

Ww gg

ww Gg

ww gg

1/4 round, yellow= 1/4

1/4 round, green= 1/4

1/4 wrinkled, yellow= 1/4

1/4 wrinkled, green= 1/4

Parents:

WG

wg

Wg

wG

wg

All gametes fromhomozygous recessiveparent are wg.

Gametes fromheterozygous parentshow independentassortment.

Figure 3.12 Genotypes and phenotypes resulting from a testcross of theWw Gg double heterozygote.

sortment. It is also sometimes referred to asMendel’s second law.

The Principle of Independent Assortment:Segregation of the members of any pair of al-leles is independent of the segregation of otherpairs in the formation of reproductive cells.

Although the principle of independent as-sortment is fundamental in Mendelian ge-netics, the phenomenon of linkage, causedby proximity of genes in the same chromo-some, is an important exception.

The Testcross with Unlinked GenesGenes that show independent assortmentare said to be unlinked. The hypothesis ofindependent assortment can be tested di-rectly in a testcross with the double ho-mozygous recessive:

Ww Gg � ww gg

The result of the testcross is shown in Figure3.12. Because plants with doubly heterozy-gous genotypes produce four types of ga-metes—W G, W g, w G, and w g—in equalfrequencies, whereas the plants with ww gggenotypes produce only w g gametes, thepossible progeny genotypes are Ww Gg, Wwgg, ww Gg, and ww gg, and these are ex-pected in equal frequencies. Because of thedominance relations—W over w and G overg—the progeny phenotypes are expected tobe round yellow, round green, wrinkledyellow, and wrinkled green in a ratio of

1 : 1 : 1 : 1

As with the one-gene testcross, in a two-gene testcross the ratio of progeny pheno-types is a direct demonstration of the ratioof gametes produced by the doublyheterozygous parent. In the actual cross,Mendel obtained 55 round yellow, 51round green, 49 wrinkled yellow, and 53wrinkled green, which is in good agree-ment with the predicted 1 : 1 : 1 : 1 ratio.

Au: Proof asks ifit is OK that termin Key Terms listis “independentassortment.” Orshould term inlist be changed to“principle of in-dependent assort-ment”? Or shouldterm be madebold elsewhere intext?

Page 103: Introduction to Molecular Genetics and Genomics

102 Chapter 3 Transmission Genetics: The Principle of Segregation

Mendel tested this hypothesis, too, butby this time he was complaining of theamount of work the experiment entailed,noting that “of all the experiments, it re-quired the most time and effort.” The resultof the experiment is shown in Figure 3.14.From top to bottom, the three Punnettsquares show the segregation of W and wfrom G and g in the genotypes PP, Pp, andpp. In each box, the number in red is theexpected number of plants of each geno-type, assuming independent assortment,and the number in black is the observednumber of each genotype of plant. The ex-cellent agreement confirmed what Mendelregarded as the main conclusion of all of hisexperiments:

“Pea hybrids form germinal and pollen cellsthat in their composition correspond in equalnumbers to all the constant forms resultingfrom the combination of traits united throughfertilization.”

In this admittedly somewhat turgid sen-tence, Mendel incorporated both segrega-tion and independent assortment. Inmodern terms, what he means is that withindependent segregation, the gametes pro-duced by any hybrid plant consist of equalnumbers of all possible combinations of thealleles that are heterozygous. For example,

The Big ExperimentBy analogy with the independent segrega-tion of two genes illustrated in Figure 3.8,one might expect that independent segre-gation of three genes would produce F2progeny in which combinations of the phe-notypes are given by successive terms in themultiplication of [(3�4) � (1�4) ]3, which,when multiplied out, yields 27 : 9 : 9 : 9 : 3: 3 : 3 : 1. A specific example is shown inFigure 3.13. The allele pairs are W, w for roundversus wrinkled seeds; G, g for yellow ver-sus green seeds; and P, p for purple versuswhite flowers. (The alleles indicated withthe uppercase letters are dominant.) Withindependent segregation, among the F2progeny the most frequent phenotype(27�64) has the dominant form of all threetraits, the next most frequent (9�64) has thedominant form of two of the traits, the nextmost frequent (3�64) has the dominantform of only one trait, and the least fre-quent (1�64) is the triple recessive. Observethat if you consider any one of the traitsand ignore the other two, then the ratio ofphenotypes is 3 : 1; and if you consider anytwo of the traits, then the ratio of pheno-types is 9 : 3 : 3 : 1. This means that all ofthe possible one- and two-gene indepen-dent segregations are present in the overallthree-gene independent segregation.

W�

W�

W�

W�

ww

ww

ww

ww

G�

G�

gg

gg

G�

G�

gg

gg

P�

pp

P�

pp

P�

pp

P�

pp

269

98

86

27

88

34

30

7

270

90

90

30

90

30

30

10

Round, yellow, purple

48 : 16 = 3 : 1

For any one gene,the ratio of phenotypes is

36 : 12 : 12 : 4 = 9 : 3 : 3 : 1

For any pair of genes,the ratio of phenotypes is

Round, yellow, white

Round, green, purple

Round, green, white

Wrinkled, yellow, purple

Wrinkled, yellow, white

Wrinkled, green, purple

Wrinkled, green, white

Observednumber

Expectednumber

27/64

9/64

9/64

9/64

3/64

3/64

3/64

1/64

(3/4 W� ww)+ X1/4 (3/4 G� gg)+ 1/4 X (3/4 P� pp)+ 1/4

Figure 3.13 With independent assortment,the expected ratio of phenotypes in atrihybrid cross is obtained by multiplyingthe three independent 3 : 1 ratios of thedominant and recessive phenotypes. A dashused in a genotype symbol indicates thateither the dominant or the recessive allele ispresent; for example, W� refers collectivelyto the genotypes WW and Ww. (Theexpected numbers total 640 rather than 639because of round-off error.)

Page 104: Introduction to Molecular Genetics and Genomics

3.3 Segregation of Two or More Genes 103

GG

WWWw

wwGg

gg

GG

PP

Pp

pp

Gggg

WW

PP

Pp

pp

pp

pp

Pp

Pp

PP

PP

Ww

ww

GG

Gg

gg

WW

Ww

ww

GG

Gg

gg

WW

Ww

ww

10

20 14

10 8

20 15

40 49

20 19

10 9

20 20

10 10

8

20

40 38

20 25

40 45

80 78

40 36

20 17

40 40

20 20

22

10

20 18

10 10

20 18

40 48

20 24

10 11

20 16

10 7

14

Expected number Observed number

Figure 3.14 Progeny observed in the F2 genera-tion of a trihybrid cross with the allelic pairs W,w and G, g and P, p. In each box, the red entry isthe expected number and the black entry is the

the cross WW gg � ww GG produces F1progeny of genotype Ww Gg, which yieldsthe gametes W G, W g, w G, and w g in equalnumbers. Segregation is illustrated by the 1

: 1 ratio of W : w and G : g gametes, and in-dependent assortment is illustrated by theequal numbers of W G,W g, w G, and w ggametes.

observed. Note that each gene, by itself, yields a1 : 2 : 1 ratio of genotypes and that each pair ofgenes yields a 1 : 2 : 1 : 2 : 4 : 2 : 1 : 2 : 1 ratio ofgenotypes.

Page 105: Introduction to Molecular Genetics and Genomics

3.4 Human Pedigree AnalysisLarge deviations from expected genetic ra-tios are often found in individual humanfamilies and in domesticated large animalsbecause of the relatively small number ofprogeny. The effects of segregation are nev-ertheless evident upon examination of thephenotypes among several generations ofrelated individuals. A diagram of a familytree showing the phenotype of each indi-vidual among a group of relatives is apedigree. In this section we introduce ba-sic concepts in pedigree analysis.

Characteristics of Dominant and Recessive InheritanceFigure 3.15 defines the standard symbols usedin depicting human pedigrees. Females arerepresented by circles and males bysquares. (A diamond is used if the sex is un-known—as, for example, in a miscarriage.)Persons with the phenotype of interest areindicated by colored or shaded symbols. Forrecessive alleles, heterozygous carriers aredepicted with half-filled symbols. A matingbetween a female and a male is indicated byjoining their symbols with a horizontal line,which is connected vertically to a secondhorizontal line, below, that connects the

symbols for their offspring. The offspringwithin a sibship, called siblings or sibs, arerepresented from left to right in order oftheir birth.

A pedigree for the trait Huntington dis-ease, caused by a dominant mutation, isshown in Figure 3.16. The numbers in thepedigree are for convenience in referring toparticular persons. The successive genera-tions are designated by Roman numerals.Within any generation, all of the personsare numbered consecutively from left toright. The pedigree starts with the womanI-1 and the man I-2. The man has Hunting-ton disease, which is a progressive nervedegeneration that usually begins aboutmiddle age. It results in severe physical andmental disability and then death. The dom-inant allele, HD, that causes Huntingtondisease is rare. All affected persons in thepedigree have the heterozygous genotypeHD hd, whereas nonaffected persons havethe homozygous normal genotype hd hd.The disease has complete penetrance. Thepenetrance of a genetic disorder is theproportion of individuals with the at-riskgenotype who actually express the trait;complete penetrance means the trait is ex-pressed in 100 percent of persons with thatgenotype. The pedigree demonstrates thefollowing characteristic features of inheri-

104 Chapter 3 Transmission Genetics: The Principle of Segregation

Normal female Mating

Normal maleMating betweenrelatives

Parents and offspring(offspring depicted inorder of birth)

Two-egg(dizygotic)twins

One-egg(monozygotic)twins

Siblings

Sex unknown, normal

I

II

Female with phenotype of interest

Male with phenotype of interest

Sex unknown, with phenotype of interest

or Deceased+

Female heterozygous for recessive allele

Male heterozygous for recessive allele

Stillbirth or spontaneous abortion

Romannumeralsrepresentgeneration

Firstborn

Lastborn

Figure 3.15 Conventional symbols used in depicting human pedigrees.

Page 106: Introduction to Molecular Genetics and Genomics

tance due to a rare dominant allele withcomplete penetrance.

1. Females and males are equally likely to beaffected.

2. Affected offspring have one affected parent(except for rare new mutations), and the af-fected parent is equally likely to be themother or the father.

3. On average, half of the individuals in sib-ships with an affected parent are affected.

A pedigree for a trait due to a homo-zygous recessive allele is shown in Figure3.17. The trait is albinism, absence of pigmentin the skin, hair, and iris of the eyes. Thispedigree illustrates characteristics of inheri-tance due to a rare recessive allele withcomplete penetrance:

1. Females and males are equally likely to beaffected.

2. Affected individuals, if they reproduce, usu-ally have unaffected progeny.

3. Most affected individuals have unaffectedparents.

4. The parents of affected individuals are oftenrelatives.

5. Among siblings of affected individuals, theproportion affected is approximately 25percent.

With rare recessive inheritance, the matesof homozygous affected persons are usuallyhomozygous for the normal allele, so all ofthe offspring will be heterozygous and notaffected. Heterozygous carriers of the mu-tant allele are considerably more commonthan homozygous affected individuals,because it is more likely that a person willinherit only one copy of a rare mutant al-lele than two copies. Most homozygousrecessive genotypes therefore result from

3.4 Human Pedigree Analysis 105

I

II

III

IV

4 5

3 4 5

2 31

1 2

2 3 4 5 6 7 8

Homozygousrecessive

Heterozygous

Mating betweenfirst cousins

One of thesepersons is heterozygous

1 2

1

Figure 3.17 Pedigree of albinism. With recessive inheritance, affectedpersons (filled symbols) often have unaffected parents. The doublehorizontal line indicates a mating between relatives—in this case, firstcousins.

I

1 2

1 2 3 13 14 154 5 6 7 8 9 10 11 12

3 4 6 75II

III

Nonaffected personshave genotype hd hdbecause hd is recessive

Affected personshave genotype HD hdbecause the HD alleleis very rare

1 2

Figure 3.16 Pedigree of a humanfamily showing the inheritanceof the dominant gene forHuntington disease. Femalesand males are represented bycircles and squares, respec-tively. Red symbols indicatepersons affected with thedisease.

Captioncaptioncaptioncaptioncaptioncaptoncaptioncaptioncaptioncaptioncaptioncap-tioncaptioncaptioncaptioncaptioncaption-captioncaptioncaption

09131_01_1686_P

Page 107: Introduction to Molecular Genetics and Genomics

matings between carriers (heterozygous �heterozygous), in which each offspring hasa 1�4 chance of being affected. Another im-portant feature of rare recessive inheritanceis that the parents of affected individuals areoften related (consanguineous). A matingbetween relatives is indicated with a doubleline connecting the partners, as for the first-cousin mating in Figure 3.17. Mating be-tween relatives is important for recessivealleles to become homozygous, becausewhen a recessive allele is rare, it is morelikely to become homozygous through in-heritance from a common ancestor thanfrom parents who are completely unre-lated. The reason is that the carrier of a rareallele may have many descendants who arealso carriers. If two of these carriers shouldmate with each other (for example, in afirst-cousin mating), then the hidden reces-sive allele can become homozygous with aprobability of 1�4. Mating between relativesconstitutes inbreeding, and the conse-quences of inbreeding are discussed furtherin Chapter 17. Because an affected individ-ual indicates that the parents are hetero-zygous carriers, the expected proportion ofaffected individuals among the siblings isapproximately 25 percent, but the exactvalue depends on the details of how af-fected individuals are identified and in-cluded in the database.

Molecular Markers in Human PedigreesBefore the advent of molecular methods,there were many practical obstacles to thestudy of human genetics.

• Most genes that cause genetic diseases arerare, so they are observed in only a smallnumber of families.

• Many genes of interest in human genetics arerecessive, so they are not detected in het-erozygous genotypes.

• The number of offspring per human family isrelatively small, so segregation cannot usu-ally be detected in single sibships.

• The human geneticist cannot perform test-crosses or backcrosses, because human mat-ings are not manipulated by anexperimenter.

Because techniques for manipulating DNAallow direct access to the DNA, modern ge-netic studies of human pedigrees are car-ried out primarily using genetic markerspresent in the DNA itself, rather thanthrough the phenotypes produced by mu-tant genes. The human genome includesone single-nucleotide polymorphism per500–3000 bp, depending on the region be-ing studied. This means that two randomlychosen human genomes differ at 1–6 mil-lion nucleotide positions.

Various types of DNA polymorphismswere discussed in Chapter 2, along with themethods by which they are detected andstudied. An example of a DNA polymor-phism segregating in a three-generationhuman pedigree is shown in Figure 3.18. Thetype of polymorphism is a simple tandemrepeat polymorphism (STRP), in which eachallele differs in size according to the num-ber of copies it contains of a short DNAsequence repeated in tandem. The differ-ences in size are detected by electrophoresisafter amplification of the region by PCR.STRP markers usually have as many as 20codominant alleles, and the majority of in-

106 Chapter 3 Transmission Genetics: The Principle of Segregation

A4A5

I

II

IIIA4A6

A4A6

A1A4 A1A6 A3A4 A1A6 A3A6 A3A4 A1A4

A1A3

A3A4A1A2

1234

Position of DNAfragment in gel

56

123456

DNA bands areobserved in gel.

Figure 3.18 Humanpedigree showingsegregation of VNTRalleles. Six alleles(A1–A6) are present in the pedigree, butany one person canhave only one allele(if homozygous) ortwo alleles (if hetero-zygous).

Page 108: Introduction to Molecular Genetics and Genomics

dividuals are heterozygous for two differentalleles. In the example in Figure 3.18, eachof the parents is heterozygous, as are all ofthe children. More than 5000 geneticmarkers of this type have been identified inthe human genome, each heterozygous inan average of 70 percent of individualgenotypes.

Six alleles are depicted in Figure 3.18,denoted by A1 through A6. In the gel, thenumbers of the bands correspond to thesubscripts of the alleles. The mating in gen-eration II is between two heterozygousgenotypes: A4A6 � A1A3. Because of segre-gation in each parent, four genotypes arepossible among the offspring (A4A1, A4A3,A6A1, and A6A3); these would convention-ally be written with the smaller subscriptfirst, as A1A4, A3A4, A1A6, and A3A6. Withrandom fertilization the offspring geno-types are equally likely, as may be verifiedfrom a Punnett square for the mating.Figure 3.18 illustrates some of the principaladvantages of multiple, codominant allelesfor human pedigree analysis: (1) Hetero-zygous genotypes can be distinguishedfrom homozygous genotypes. (2) Many in-dividuals in the population are heterozy-gous, and so many matings are informativein regard to segregation. (3) Each segregat-ing genetic marker yields up to four distin-guishable offspring genotypes.

3.5 Pedigrees and ProbabilityGenetic analysis benefits from large num-bers of progeny because these numbers de-termine the degree to which observed ratiosof genotypes and phenotypes fit their ex-pected values. Greater statistical variationoccurs in smaller numbers—and thereforepotentially greater deviations from the ex-pected values. For example, among theseeds of individual pea plants in which a 3 :1 ratio was expected, Mendel observed ra-tios ranging from 1.85 : 1 to 4.85 : 1. Thelarge variation results from the relativelysmall number of seeds per plant, which av-eraged about 34 in these experiments.Taken together, the seeds yielded a pheno-typic ratio of 3.08 : 1, which not only fits 3 :1 but is actually a better fit than that ob-served in any of the individual plants. Be-cause of statistical variation in small

numbers, a working knowledge of probabil-ity is basic to understanding genetic trans-mission. In the first place, each event offertilization represents a chance combina-tion of alleles present in the parental ga-metes. In the second place, the proportionsof the different types of offspring obtainedfrom a cross are the cumulative result of nu-merous independent events of fertilization.

In the analysis of genetic crosses, theprobability of a particular outcome of a fer-tilization event may be considered as equiv-alent to the proportion of times that thisoutcome is expected to be realized in nu-merous repeated trials. The reverse is alsotrue: The proportion of times that an out-come is expected to be realized in numer-ous repeated trials is equivalent to theprobability that it is realized in a single trial.To take a specific example using the pedi-gree in Figure 3.18, the mating at the upperleft is A4A5 � A4A6. Among a large numberof offspring from such a mating, the ex-pected proportion of A4A6 genotypes is 1�4.Equivalently, we could say that any off-spring whose genotype is unknown has aprobability of genotype A4A6 equal to 1�4.However, the genotype of the female off-spring shown is already given as A4A6, sorelative to this individual, the situation hasbecome a certainty: The probability that hergenotype is A4A6 equals 1, and the probabil-ity that her genotype is anything elseequals 0.

To evaluate the probability of a geneticevent usually requires an understanding ofthe mechanism of inheritance and knowl-edge of the particular cross. For example, inasserting that the mating I-1 � I-2 in Fig-ure 3.18 has a probability of yielding anA4A6 offspring equal to 1�4 , we needed totake the parental genotypes into account. Ifthe mating were A4A6 � A4A6, then theprobability of an A4A6 offspring would be1�2 rather than 1�4; if the mating wereA4A4 � A6A6, then the probability of anA4A6 offspring would be 1.

In many genetic crosses, the possibleoutcomes of fertilization are equally likely.Suppose that there are n possible outcomes,each as likely as any other, and that in m ofthese, a particular outcome of interest isrealized; then the probability of the out-come of interest is m�n. In the language ofprobability, a possible outcome of interest istypically called an event. As an example,

3.5 Pedigrees and Probability 107

Page 109: Introduction to Molecular Genetics and Genomics

consider again the mating A4A5 � A4A6. Ifwe were concerned with the exact specifi-cation of each genotype, then we woulddistinguish among all of the four possiblegenotypes of progeny (A4A4, A4A6, A4A5,and A5A6) and assert that each is equallylikely because it has a probability of 1�4. Onthe other hand, if we were concerned onlywith homozygosity or heterozygosity, thenthere would be only two possible outcomes,

homozygous (A4A4) and heterozygous(A4A6, A4A5, and A5A6), which have proba-bilities 1�4 and 3�4, respectively. Or again, ifwe were interested only in whether or notan offspring carries the A5 allele, then theprobabilities would be 1�2 for “yes” and 1�2for “no.” These examples illustrate thepoint that offspring of a genetic cross can beclassified or grouped together in differentways, depending on the particular question

108 Chapter 3 Transmission Genetics: The Principle of Segregation

Troubadour

The Huntington’s DiseaseCollaborative Research Group 1993Comprising 58 authors among 9 institutionsA Novel Gene Containing a TrinucleotideRepeat That Is Expanded and Unstableon Huntington’s Disease Chromosomes

Modern genetic research is sometimes car-ried out by large collaborative groups in anumber of research institutions scatteredacross several countries. This approach isexemplified by the search for the gene re-sponsible for Huntington disease. Thesearch was highly publicized because ofthe severity of the disease, the late age ofonset, and the dominant inheritance.Famed folk singer Woody Guthrie, whowrote “This Land Is Your Land” and other

well-known tunes, died of the disease in1967. When the gene was identified, itturned out to encode a protein (now calledhuntington) of unknown function that isexpressed in many cell types throughoutthe body and not, as expected, exclusivelyin nervous tissue. Within the coding se-quence of this gene is a trinucleotide re-peat (5'-CAG-3') that is repeated intandem a number of times according tothe general formula (5'-CAG-3')n. Amongnormal alleles, the number n of repeatsranges from 11 to 34 with an average of 18;among mutant alleles, the number of re-peats ranges from 40 to 86. This tandemrepeat is genetically unstable in that itcan, by some unknown mechanism, in-crease in copy number (“expand”). In twocases in which a new mutant allele wasanalyzed, one had increased in repeatnumber from 36 to 44 and the other from

33 to 49. This is a mutational mechanismthat is quite common in some human ge-netic diseases. The excerpt cites severalother examples. The authors also empha-size that their discovery raises importantethical issues, including genetic testing,confidentiality, and informed consent.

Huntington disease (HD) is a progres-sive neurodegenerative disorder charac-terized by motor disturbance, cognitiveloss, and psychiatric manifestations. Itis inherited in an autosomal dominantfashion and affects approximately 1 in10,000 individuals in most populationsof European origin. The hallmark of HDis a distinctive choreic [jerky] movementdisorder that typically has a subtle, insid-ious onset in the fourth to fifth decade oflife and gradually worsens over a courseof 10 to 20 years until death. . . . The

Au: In abstract, the word in the phrase “now calledhuntingtin” changed to “huntington”. OK?

Captioncaptioncaptioncaptioncaptioncap-tioncaptioncaption-captioncaptioncaptioncaptioncaptioncap-tioncaptioncaption-captioncaptioncaptioncaptioncaptioncap-tioncaptioncaption 09131_01_1687P 09131_01_1688P

Page 110: Introduction to Molecular Genetics and Genomics

of interest, and different types of groupingsmay have different probabilities associatedwith them.

Mutually Exclusive PossibilitiesSometimes an outcome of interest includestwo or more different possibilities. The off-spring of the mating A4A5 � A4A6 againprovides a convenient example. As we haveseen, if the offspring are classified as eitherhomozygous or heterozygous, then theoutcome “homozygous” consists of just onepossibility (A4A4) and the outcome “het-erozygous” consists of three possibilities(A4A6, A4A5, and A5A6). Furthermore, if anindividual genotype is any one of the threeheterozygous genotypes, it cannot at thesame time be any of the other heterozygousgenotypes. In other words, only one geno-type can be realized in any one organism,so the realization of one possibility pre-cludes the realization of another in thesame organism. Events that exclude eachother in this manner are said to be mutuallyexclusive. When events are mutually exclu-

sive, their probabilities are combined ac-cording to the addition rule.

Addition Rule: The probability of therealization of one or the other of two mutuallyexclusive events, A or B, is the sum of theirseparate probabilities.

In symbols, if we use Prob to mean probabil-ity, then the addition rule for two possibleoutcomes is written

Prob {A or B} � Prob {A} � Prob {B}

Application of the addition rule is straight-forward in the above examples of homo-zygous versus heterozygous genotypes, butthere are three possible outcomes instead oftwo. Because the offspring genotypes areequally likely, the probability of a hetero-zygous offspring equals Prob {A4A6, A4A5, orA5A6} � Prob {A4A6} � Prob {A4A5} �Prob {A5A6} � 1�4 � 1�4 � 1�4 � 3�4.Because 3�4 is the probability of an indi-vidual offspring being heterozygous, it isalso the expected proportion of heterozy-gous genotypes among a large number ofprogeny.

genetic defect causing HD was assignedto chromosome 4 in one of the first suc-cessful linkage analyses usingDNA markers in humans. Sincethat time, we have pursued anapproach to isolating and char-acterizing the HD gene basedon progressively refining its lo-calization. . . . [We have foundthat a] 500-kb segment is themost likely site of the geneticdefect. [The abbreviation kb stands forkilobase pairs; 1 kb equals 1000 basepairs.] Within this region, we have identi-fied a large gene, spanning approxi-mately 210 kb, that encodes a previouslyundescribed protein. The reading framecontains a polymorphic (CAG)n trinu-cleotide repeat with at least 17 alleles inthe normal population, varying from 11to 34 CAG copies. On HD chromo-

somes, the length of the trinucleotide re-peat is substantially increased. . . . Elon-

gation of a trinucleotide repeat se-quence has been implicated previouslyas the cause of three quite different hu-man disorders, the fragile-X syndrome,myotonic dystrophy, and spino-bulbarmuscular atrophy. . . . It can be expectedthat the capacity to monitor directly thesize of the trinucleotide repeat in indi-viduals “at risk” for HD will revolution-ize testing for the disorder. . . . We con-

sider it of the utmost importance thatthe current internationally accepted

guidelines and counseling pro-tocols for testing people at riskcontinue to be observed, andthat samples from unaffectedrelatives should not be testedinadvertently or without fullconsent. . . . With the mysteryof the genetic basis of HD ap-parently solved, [it opens] the

next challenges in the effort to under-stand and to treat this devastating disor-der.

Source: Cell 72: 971–983

We consider it of the utmost importance that thecurrent internationally accepted guidelines andcounseling protocols for testing people at risk

continue to be observed, and that samples fromunaffected relatives should not be tested

inadvertently or without full consent.

3.5 Pedigrees and Probability 109

Page 111: Introduction to Molecular Genetics and Genomics

Independent PossibilitiesEvents that are not mutually exclusive maybe independent, which means that the real-ization of one outcome has no influence onthe possible realization of any others. Wehave already seen an example in the inde-pendent segregation of round versus wrin-kled as against yellow versus green peas(Figure 3.9). The principle is that when thepossible outcomes of an experiment or ob-servation are independent, the probabilitythat they are realized together is obtainedby multiplication.

Successive offspring from a cross arealso independent events, which means thatthe genotypes of early progeny have no in-fluence on the probabilities in later progeny(Figure 3.19). The independence of successiveoffspring contradicts the widespread beliefthat in each human family, the ratio of girlsto boys must “even out” at approximately 1: 1. According to this reasoning, a familywith four girls would be more likely to have

a boy the next time around. But this beliefis supported neither by theory nor by actualdata on the sex ratio in human sibships.The data indicate that human families areequally likely to have a girl or a boy on anybirth, irrespective of the sex distribution inprevious births. Although statistics guaran-tees that the sex ratio will balance outwhen averaged over a very large number ofsibships, this does not imply that it willequalize in any individual sibship. To beconcrete, among families in which there arefive children, sibships consisting of fiveboys balance those consisting of five girls,yielding an overall sex ratio of 1 : 1; never-theless, both of these sibships have an un-usual sex ratio.

When events are independent (such asindependent traits or successive offspringfrom a cross), the probabilities are com-bined by means of the multiplication rule.

Multiplication Rule: The probability of twoindependent events, A and B, being realizedsimultaneously is given by the product of theirseparate probabilities.

In symbols, the multiplication rule is written

Prob {A and B} � Prob {A} � Prob {B}

The multiplication rule can be used to an-swer questions such as this: For two off-spring from the mating A4A5 � A4A6, whatis the probability of one homozygous geno-type and one heterozygous genotype? Thebirth order homozygous–heterozygous hasprobability 1�4 � 3�4, and the birth orderheterozygous–homozygous has probability3�4 � 1�4. These possibilities are mutuallyexclusive, so the overall probability is

1�4 � 3�4 � 3�4 � 1�4 �

2(1�4)(3�4) � 3�8

Note that the addition rule was used twicein solving this problem, once to calculatethe probability of a heterozygous offspringand again to calculate the probability ofboth birth orders. The multiplication rulecan also be used to calculate the probabilityof a specific genotype among the progenyof a complex cross. For example, if aquadruple heterozygous genotype Aa Bb CcDd is self-fertilized, what is the probabilityof a quadruple heterozygous offspring, AaBb Cc Dd? Assuming independent assort-ment, the answer is (1�2)(1�2)(1�2)(1�2) �(1�2)4 � 1�16, because each of the inde-

110 Chapter 3 Transmission Genetics: The Principle of Segregation

Segregation of A1A2 is independentof segregation ofB1B2; the probabilitiesmultiply, and so thegametes are:

A1A2

B1B2

A11/2

A21/2

B11/2

B21/2

A1B11/4

A1B21/4

A2B11/4

A2B21/4

(A)

(B)

Each offspring resultsfrom an independentevent of fertilization.

Successive offspring in a human sibship (or peas in a pod) are independent, and so the probabilities of genotypes or phenotypes can be multiplied.

Figure 3.19 In genetics, two important types of indepen-dence are (A) independent segregation of alleles thatshow independent assortment and (B) independentfertilizations resulting in successive offspring. In thesecases, the probabilities of the individual outcomes ofsegregation or fertilization are multiplied to obtain theoverall probability.

Page 112: Introduction to Molecular Genetics and Genomics

3.6 Incomplete Dominance and Epistasis 111

Red Ivory

Pink

1/2 Pink 1/4 Ivory1/4 Red

All red All ivory

1/2 Pink 1/4 Ivory1/4 Red

Parents:

F1:

F2:

F3:

II

Ii

II

II

II Ii ii

ii

Ii ii

ii

Incomplete dominance;heterozygous genotypeis intermediate in color

Self-fertilization

Figure 3.20 Red versus white flower color in snapdragons shows nodominance.

pendent genes yields a heterozygous off-spring with probability 1�2.

3.6 Incomplete Dominance and Epistasis

Dominance and codominance are not theonly possibilities for pairs of alleles. Thereare situations of incomplete dominance,in which the phenotype of the heterozy-gous genotype is intermediate between thephenotypes of the homozygous genotypes.A classic example of incomplete dominanceconcerns flower color in the snapdragonAntirrhinum majus (Figure 3.20). In wildtypeflowers, a red type of anthocyanin pigmentis formed by a sequence of enzymatic reac-tions. A wildtype enzyme, encoded by the Iallele, is limiting to the rate of the overallreaction, so the amount of red pigment isdetermined by the amount of enzyme thatthe I allele produces. The alternative i allelecodes for an inactive enzyme, and ii flowersare ivory in color. Because the amount ofthe critical enzyme is reduced in Ii het-erozygotes, the amount of red pigment inthe flowers is reduced also, and the effect ofthe dilution is to make the flowers pink. Across between plants differing in flowercolor therefore gives direct phenotypic evi-dence of segregation (Figure 3.20). Thecross II (red) � ii (ivory) yields F1 plantswith genotype Ii and pink flowers. In the F2progeny obtained by self-pollination of theF1 hybrids, one experiment resulted in 22plants with red flowers, 52 with pink flow-ers, and 23 with ivory flowers, which fitsthe expected ratio of 1 : 2 : 1.

Multiple AllelesThe occurrence of multiple alleles isexemplified by the alleles A1–A6 of the STRPmarker in the human pedigree in Figure3.18. Multiple alleles are relatively com-mon in natural populations and, as in thisexample, can be detected most easily bymolecular methods. In the DNA of a gene,each nucleotide can be A, T, G, or C, so agene of n nucleotides can theoretically mu-tate at any of the positions to any of thethree other nucleotides. The number ofpossible single-nucleotide differences in a

gene of length n is therefore 3 � n. If n �5000, for example, there are potentially15,000 alleles (not counting any of the pos-sibilities with more than one nucleotidesubstitution). Most of the potential allelesdo not actually exist at any one time. Someare absent because they did not occur, oth-ers did occur but were eliminated by chanceor because they were harmful, and still oth-ers are present but at such a low frequencythat they remain undetected. Nevertheless,at the level of DNA sequence, most genes in

Page 113: Introduction to Molecular Genetics and Genomics

most natural populations have multiple al-leles, all of which can be considered “wild-type.” Multiple wildtype alleles are usefulin such applications as DNA typing becausetwo unrelated people are unlikely to havethe same genotype, especially if several dif-ferent loci, each with multiple alleles, areexamined. Many harmful mutations alsoexist in multiple forms. Recall from Chapter1 that more than 400 mutant forms of thephenylalanine hydroxylase gene have beenidentified in patients with phenylketonuria.The alleles A1–A6 in Figure 3.18 also illus-trate that although a population of organ-isms may contain any number of alleles,any particular organism or cell may carryno more than two, and any gamete maycarry no more than one.

In some cases, the multiple alleles of agene exist merely by chance and reflect thehistory of mutations that have taken placein the population and the dissemination ofthese mutations among population sub-groups by migration and interbreeding. Inother cases, there are biological mecha-nisms that favor the maintenance of a largenumber of alleles. For example, genes thatcontrol self-sterility in certain floweringplants can have large numbers of allelictypes. This type of self-sterility is found inspecies of red clover that grow wild in manypastures. The self-sterility genes preventself-fertilization because a pollen grain canundergo pollen tube growth and fertiliza-tion only if it contains a self-sterility alleledifferent from either of the alleles presentin the flower on which it lands. In otherwords, a pollen grain containing an allelealready present in a flower will not func-tion on that flower. Because all pollengrains produced by a plant must containone of the self-sterility alleles present in theplant, pollen cannot function on the sameplant that produced it, and self-fertilizationcannot take place. Under these conditions,any plant with a new allele has a selectiveadvantage, because pollen that contains thenew allele can fertilize all flowers exceptthose on the same plant. Through evolu-tion, populations of red clover have accu-mulated hundreds of alleles of theself-sterility gene, many of which havebeen isolated and their DNA sequences de-termined. Many of the alleles differ at mul-tiple nucleotide sites, which implies thatthe alleles in the population are very old.

Human ABO Blood GroupsIn a multiple allelic series, there may be dif-ferent dominance relationships betweendifferent pairs of alleles. An example isfound in the human ABO blood groups,which are determined by three alleles de-noted IA, IB, and IO. (Actually, there are twoslightly different variants of the IA allele.)The blood group of any person may be A, B,AB, or O, depending on the type of polysac-charide (polymer of sugars) present on thesurface of red blood cells. One of two differ-ent polysaccharides, A or B, can be formedfrom a precursor molecule that is modifiedby the enzyme product of the IA or the IB al-lele. The gene product is a glycosyl trans-ferase enzyme that attaches a sugar unit tothe precursor (Figure 3.21). The IA or the IB

alleles encode different forms of the en-zyme with replacements at four amino acidsites; these alter the substrate specificity sothat each enzyme attaches a different sugar.People of genotype IAIA produce red bloodcells having only the A polysaccharide andare said to have blood type A. Those ofgenotype IBIB have red blood cells withonly the B polysaccharide and have bloodtype B. Heterozygous IAIB people have redcells with both the A and the B polysaccha-rides and have blood type AB. The third al-lele, IO, encodes an enzymatically inactiveprotein that leaves the precursor un-changed; neither the A nor the B type ofpolysaccharide is produced. HomozygousIOIO persons therefore lack both the A andthe B polysaccharides and are said to haveblood type O.

In this multiple allelic series, the IA andIB alleles are codominant: The heterozygousgenotype has the characteristics of both ho-mozygous genotypes—the presence of boththe A and the B carbohydrate on the redblood cells. On the other hand, the IO alleleis recessive to both IA and IB. Hence, het-erozygous IAIO genotypes produce the Apolysaccharide and have blood type A, andheterozygous IBIO genotypes produce the Bpolysaccharide and have blood type B. Thegenotypes and phenotypes of the ABOblood group system are summarized in thefirst three columns of Table 3.3.

ABO blood groups are important inmedicine because of the need for bloodtransfusions. A crucial feature of the ABOsystem is that most human blood contains

112 Chapter 3 Transmission Genetics: The Principle of Segregation

Page 114: Introduction to Molecular Genetics and Genomics

3.6 Incomplete Dominance and Epistasis 113

Precursor carbohydrate

N-acetylgalactosamine

CH2OH

NHCOCH3

HOH H

H

HO

H

H

OH

O

Fucose

CH3

H

H

OH

HHO

H

H

OH

HO

O

N-acetylglucosamine

Key:CH2OH

NHCOCH3

HOH H

H

HO

H H

OH

O

Galactose

CH2OH

HOH

OH

H

H

HO

H

H

OH

O

OO

O

HO

O

OO

O

HO

O

OO

OO

O

OO

OO

O

O

O

O

O

IB-encodedtransferase

IA-encodedtransferase

IO-encodedtransferase

This is the “H”antigen; it reactswith neither anti-Anor anti-B antibody.

This is the A antigen;it reacts withanti-A antibody.

This is the B antigen;it reacts withanti-B antibody.

Galactose addedto precursor

N-acetylgalactosamineadded to precursor

OO

O

O

Figure 3.21 The ABOantigens on thesurface of human redblood cells arecarbohydrates. Theyare formed from aprecursor carbohy-drate by the action oftransferase enzymesencoded by alleles ofthe I gene. Allele IO

codes for an inactiveenzyme and leavesthe precursor (calledthe H substance)unmodified. The IA

allele encodes anenzyme that adds N-acetylgalactosamine(purple) to theprecursor. The IB

allele encodes anenzyme that addsgalactose (green) tothe precursor. Theother colored sugarunits are N-acetylglu-cosamine (orange)and fucose (yellow).

IAIA A Type A Anti-B A & O A & AB

IAIO A Type A Anti-B A & O A & AB

IBIB B Type B Anti-A B & O B & AB

IBIO B Type B Anti-A B & O B & AB

I AIB A & B Type AB Neither anti-A A, B, AB & O AB onlynor anti-B

IOIO Neither A nor B Type O Anti-A & anti-B O only A, B, AB & O

Table 3.3 Genetic control of the human ABO blood groups

Blood types that Blood types thatAntigens present ABO blood goup Antibodies present can be tolerated can accept blood

Genotype on red blood cells phenotype in blood fluid in transfusion for transfusion

antibodies to either the A or the B polysac-charide. An antibody is a protein made bythe immune system in response to a stimu-lating molecule called an antigen and is ca-pable of binding to the antigen. Anantibody is usually specific in that it recog-

nizes only one antigen. Some antibodiescombine with antigen and form large mole-cular aggregates that may precipitate.

Antibodies act to defend against invad-ing viruses and bacteria. Although antibod-ies do not normally form without prior

Page 115: Introduction to Molecular Genetics and Genomics

stimulation by the antigen, people capableof producing anti-A and anti-B antibodiesdo produce them. Production of these anti-bodies may be stimulated by antigens thatare similar to polysaccharides A and B andthat are present on the surfaces of manycommon bacteria. However, a mechanismcalled tolerance prevents an organism fromproducing antibodies against its own anti-gens. This mechanism ensures that A anti-gen or B antigen elicits antibody productiononly in people whose own red blood cellsdo not contain A or B, respectively. The endresult:

People of blood type O make both anti-A andanti-B antibodies, those of blood type A makeanti-B antibodies, those of blood type B makeanti-A antibodies, and those of blood type ABmake neither type of antibody.

The antibodies found in the blood fluidof people with each of the ABO blood typesare shown in the fourth column in Table3.3. The clinical significance of the ABOblood groups is that transfusion of bloodcontaining A or B red-cell antigens intopersons who make antibodies against themresults in an agglutination reaction inwhich the donor red blood cells areclumped. In this reaction, the anti-A anti-body agglutinates red blood cells of eitherblood type A or blood type AB, becauseboth carry the A antigen (Figure 3.22). Simi-larly, anti-B antibody agglutinates red bloodcells of either blood type B or blood typeAB. When the blood cells agglutinate,many blood vessels are blocked, and the re-cipient of the transfusion goes into shockand may die. Incompatibility in the other

114 Chapter 3 Transmission Genetics: The Principle of Segregation

Red blood cellsin type-A person

A antigen

B antigen

+

+

Red blood cellsin type-B person

Red blood cellsin type-AB person

Anti-Aantibody

Anti-Aantibody

Anti-Aantibody

Antibody causesagglutination of type-A red bloodcells

Antibody causesagglutination of type-AB red bloodcells

No agglutination oftype-B red blood cells

+

Figure 3.22 Antibodyagainst type-Aantigen agglutinatesred blood cells thatcarry the type-Aantigen, whether ornot they also carrythe type-B antigen.Blood fluid contain-ing anti-A antibodyagglutinates redblood cells of type Aand type AB, but notred blood cells of typeB or type O.

Page 116: Introduction to Molecular Genetics and Genomics

direction, in which the donor blood con-tains antibodies against the recipient’s redblood cells, is usually acceptable becausethe donor’s antibodies are diluted so rapidlythat clumping is avoided. The types of com-patible blood transfusions are shown in thelast two columns of Table 3.3. Note that aperson of blood type AB can receive bloodfrom a person of any other ABO type; typeAB is called a universal recipient. Conversely,a person of blood type O can donate bloodto a person of any ABO type; type O iscalled a universal donor.

EpistasisIn Chapter 1 we saw how the products ofseveral genes may be necessary to carry outall the steps in a biochemical pathway. Ingenetic crosses in which two mutations thataffect different steps in a single pathway areboth segregating, the typical F2 ratio of 9 : 3: 3 : 1 is not observed. Gene interaction thatperturbs the normal Mendelian ratios isknown as epistasis. One type of epistasis isillustrated by the interaction of the C, c andP, p allele pairs affecting flower coloration inpeas. These genes encode enzymes in thebiochemical pathway for the synthesis ofanthocyanin pigment, and the productionof anthocyanin requires the presence of atleast one wildtype dominant allele of eachgene. The proper way to represent this situ-ation genetically is to write the requiredgenotype as

C� P�

where each dash is a “blank” that may befilled with either allele of the gene. HenceC� comprises the genotypes CC and Cc, andlikewise P� comprises the genotypes PPand Pp. All four genotypes included in thesymbol C� P�, and only these genotypes,have purple flowers.

Figure 3.23 shows a cross between the ho-mozygous recessive genotypes pp and cc.The phenotype of the flowers in the F1 gen-eration is purple because the genotype is CcPp. Self-fertilization of the F1 plants (indi-cated by the encircled cross sign) results inthe F2 progeny genotypes shown in thePunnett square. Because only the C� P�progeny have purple flowers, the ratio ofpurple flowers to white flowers in the F2generation is 9 : 7. The epistasis does not

3.6 Incomplete Dominance and Epistasis 115

Female

gametes

CP

Cp

cP

Male gametes

CP Cp cP cp

cp

CC PP CC Pp Cc PP Cc Pp

CC Pp CC pp Cc Pp Cc pp

Cc PP Cc Pp cc PP cc Pp

Cc Pp Cc pp cc Pp cc pp

F2 generation:

: 7 whiteF2 ratio: 9 purple

Double heterozygote Cc Pp(complementation is observed)

Encircled X means self-fertilization

Homozygousmutant pp

Homozygousmutant cc

Parents:

F1 generation:

CC pp cc PP

Figure 3.23 Epistasis in the determination offlower color in peas. Formation of the purplepigment requires the dominant allele of boththe C and P genes. With this type of epistasis,the dihybrid F2 ratio is modified to 9 purple :7 white.

change the result of independent segrega-tion, it merely conceals the fact that the un-derlying ratio of the genotypes C� P� : C�pp : cc P� : cc pp is 9 : 3 : 3 : 1.

For a trait determined by the interactionof two genes, each with a dominant allele,

Page 117: Introduction to Molecular Genetics and Genomics

there are only a limited number of ways inwhich the 9 : 3 : 3 : 1 dihybrid ratio can bemodified. The possibilities are illustrated inFigure 3.24. In part A are the genotypes pro-duced in the F2 generation by independentsegregation. In the absence of epistasis, theF2 ratio of phenotypes is 9 : 3 : 3 : 1. Thepossible modified ratios are shown in partB. In each row, the color coding indicatesphenotypes that are indistinguishable be-cause of epistasis, and the resulting modi-fied ratio is given. For example, in themodified ratio at the bottom, the pheno-types of the “3 : 3 : 1” classes are indistin-guishable, resulting in a 9 : 7 ratio. This isthe ratio observed in the segregation of theC, c and P, p alleles in Figure 3.23, with its 9 :7 ratio of purple to white flowers. Taking allthe possible modified ratios in Figure 3.24B

together, there are nine possible dihybridratios when both genes show completedominance. Examples of each of the modi-fied ratios are known. The most frequentlyencountered modified ratios are 9 : 7, 12 : 3: 1, 13 : 3 , 9 : 4 : 3, and 9 : 6 : 1. The typesof epistasis that result in these modified ra-tios are illustrated in the following exam-ples, which are taken from a variety oforganisms. Other examples can be found inthe problems at the end of the chapter.

9 : 7 is observed when a homozygous re-cessive mutation in either or both of twodifferent genes results in the same mutantphenotype, as in Figure 3.23.

12 : 3 : 1 results when the presence of adominant allele at one locus masks the

116 Chapter 3 Transmission Genetics: The Principle of Segregation

Unmodified ratio

9 : 3 : 3 : 1

12 : 3 : 1

10 : 3 : 3

9 : 6 : 1

9 : 4 : 3

15 : 1

13 : 3

12 : 4

10 : 6

9 : 7

Modified F2 ratio

9 3

3

3

3

3

3

3

3

3

3

3

3

3

3

3

3

3

3

3

3 1

1

1

1

1

1

1

1

1

1

1 2 2 4 1 2 1 2 1

9

9

9

9

9

9

9

9

9

AA BB AA Bb Aa BB Aa Bb AA bb Aa bb aa BB aa bbaa Bb

Color below shows phenotypic expression

(B)

(A)

Figure 3.24 Modified F2 dihybrid ratios. (A) The F2 genotypes of two independently assorting geneswith complete dominance result in a 9 : 3 : 3 : 1 ratio of phenotypes, provided there is no interac-tion (epistasis) between the genes. (B) If there is epistasis that renders two or more of thephenotypes indistinguishable (indicated by the colors), then the F2 ratio is modified.

Page 118: Introduction to Molecular Genetics and Genomics

genotype at a different locus, such as theA� genotype rendering the B� and bbgenotypes indistinguishable. For example,in genetic study of the color of the hull inoat seeds, a variety with white hulls wascrossed with a variety with black hulls. TheF1 hybrid seeds had black hulls. Among 560progeny in the F2 generation, the hull phe-notypes observed were 418 black, 106 gray,and 36 white. The ratio of phenotypes is11.6 : 3.9 : 1, or very nearly 12 : 3 : 1. A ge-netic hypothesis to explain these results isthat the black-hull phenotype is due to thepresence of a dominant allele A and thegray-hull phenotype is due to anotherdominant allele B whose effect is apparentonly in aa genotypes. On the basis of thishypothesis, the original varieties had geno-types aa bb (white) and AA BB (black). TheF1 has genotype Aa Bb (black). If the A, a al-lele pair and the B, b allele pair undergo in-dependent assortment, then the F2generation is expected to have the geno-typic and phenotypic composition 9�16 A�B� (black hull), 3�16 A� bb (black hull),3�16 aa B� (gray hull), 1�16 aa bb (whitehull), or 12 : 3 : 1.

13 : 3 is illustrated by the difference be-tween White Leghorn chickens (genotypeCC II) and White Wyandotte chickens(genotype cc ii). Both breeds have whitefeathers because the C allele is necessary forcolored feathers, but the I allele in WhiteLeghorns is a dominant inhibitor of feathercoloration. The F1 generation of a dihybridcross between these breeds has the geno-type Cc Ii, which is expressed as whitefeathers because of the inhibitory effects ofthe I allele. In the F2 generation, only theC� ii genotype has colored feathers, sothere is a 13 : 3 ratio of white : colored.

9 : 4 : 3 is observed when homozygosityfor a recessive allele with respect to onegene masks the expression of the genotypeof a different gene. For example, if the aagenotype has the same phenotype regard-less of whether the genotype is B� or bb,then the 9 : 4 : 3 ratio results. As an exam-ple, in mice the grayish “agouti” coat colorresults from a horizontal band of yellowpigment just beneath the tip of each hair.The agouti pattern is due to a dominant al-lele A, and in aa animals the coat color is

black. A second dominant allele, C, is neces-sary for the formation of hair pigments ofany kind, and cc animals are albino (white).In a cross of AA CC (agouti) � aa cc (al-bino), the F1 progeny are Aa Cc and pheno-typically agouti. Crosses between F1 malesand females produce F2 progeny in the pro-portions 9�16 A� C� (agouti), 3�16 A� cc(albino), 3�16 aa C� (black), 1�16 aa cc (al-bino), or 9 agouti : 4 albino : 3 black.

9 : 6 : 1 implies that homozygosity for ei-ther of two recessive alleles yields the samephenotype but that the phenotypeof the double homozygote is dif-ferent. In Duroc–Jersey pigs, redcoat color requires the presence oftwo dominant alleles R and S. Pigsof genotype R� ss and rr S� havesandy-colored coats, and rr ss pigsare white. The F2 ratio is therefore9�16 R� S� (red), 3�16 R� ss(sandy), 3�16 rr S� (sandy), 1�16 rr ss(white), or 9 red : 6 sandy : 1 white.

3.7 Genetic Analysis: Mutant Screens and theComplementation Test

When a geneticist makes a statement suchas “Flower color in peas is determined by asingle gene with dominant allele P and re-cessive allele p,” this does not mean thatonly one gene is needed for flower color.Implicit in the statement is the existence oftwo strains or varieties, one with coloredflowers and one with white flowers, inwhich the difference is caused by one strainhaving genotype PP and the other pp.Many genes beside P are also necessary forpurple flower coloration. Among them areother genes in the biochemical pathway forthe synthesis of anthocyanin, as well as dif-ferent genes needed for expression of thebiochemical pathway in developing flow-ers. A third variety of peas that has whiteflowers may not have the genotype pp. Inthis variety, the white flowers could becaused by a mutation in one of the othergenes needed for flower color. We have al-ready seen an example in Figure 3.23, inwhich the white flower at the upper right

3.7 Genetic Analysis: Mutant Screens and the Complementation Test 117

09131_01_1749P

Page 119: Introduction to Molecular Genetics and Genomics

results from genotype cc. Furthermore, ageneticist interested in understanding thegenetic basis of flower color would rarelybe satisfied with having identified the Pgene alone or both the P gene and the Cgene. The ultimate goal of a genetic analy-sis of flower color would be, by isolatingnew mutations, to identify every gene nec-essary for purple coloration and then,through further study of the mutant phe-notypes, to determine the normal functionof each of the genes that affect the trait.

The Complementation Test in Gene IdentificationIn a genetic analysis of flower color, a ge-neticist would begin by isolating many newmutants with white flowers. Although mu-tations are usually very rare, their fre-quency can be increased by treatment withradiation or certain chemicals. The isolationof a set of mutants, all of which show thesame type of phenotype, is called a mutantscreen. Among the mutants that are iso-lated, some will contain mutations in genesalready identified. For example, a geneticanalysis of flower color in peas might yieldone or more new mutations that changedthe wildtype P allele into a recessive allele

that blocks the formation of the purple pig-ment. Each of these mutations might differin DNA sequence, forming a set of multiplealleles that are all mutant forms of the wild-type P allele. On the other hand, a mutantscreen should also yield mutations in genesnot previously identified. Each of the newgenes might be also represented by multiplemutant alleles.

In a mutant screen for flower color, mostof the new mutations will be recessive. Thisis because most of the new mutant alleleswill encode an inactive protein of somekind, and most protein-coding genes are ex-pressed at such a level that one copy of thewildtype allele is sufficient to yield a normalmorphological phenotype. (The molecularphenotype is often intermediate; the het-erozygotes have half as much protein as thewildtype homozygotes.) New recessive alle-les are therefore identified in homozygousgenotypes, because homozygous recessiveplants have white flowers. Among all thenew mutant strains that are isolated, somewill have a recessive mutation in one gene,others in a sceond gene, still others in athird gene, and so on. Complicating the is-sue are multiple alleles, because two ormore independently isolated mutationsmay be alleles of the same wildtype gene.How can the strains be sorted out? How canthe geneticist determine which mutationsare alleles of the same gene and which aremutations in different genes?

Because the new mutations are reces-sive, all one has to do is cross the homozy-gous genotypes. The phenotype of the F1progeny yields the answer. This principle isillustrated in Figure 3.25. In part A we sup-pose that the mutations are recessive allelesof the same gene. Let us call them a1 and a2.Then the parental mutant strains havegenotypes a1a1 and a2a2, and the F1 progenyhave the genotype a1a2. Because there is nowildtype allele at the locus, the phenotypeof the F1 is white (mutant). This result iscalled noncomplementation, and itmeans that the parental strains are homozy-gous for recessive alleles of the same gene.

In part B we suppose the parental mu-tant strains are homozygous for recessivealleles of different genes, say a1a1 and b1b1.With respect to the B locus, the genotype ofthe a1a1 strain is BB, and with respect to theA locus, the genotype of the b1b1 strain is

118 Chapter 3 Transmission Genetics: The Principle of Segregation

Photocaption photocaptionphotocaptionphoto-captionphotocaptionphotocaptionphotocaption-photocaptionphotocaptionphotocaptionphotocaptionphotocaptionphotocaptionphotocaption-photocaptionphotocaptionphotocaption

09131_01_1718P

Page 120: Introduction to Molecular Genetics and Genomics

AA; hence the genotypes of the mutantstrains could be written more completely asa1a1 BB and AA b1b1. In this case, the F1progeny have the genotype Aa1 Bb1. The Aallele masks the mutation a1 and the B al-lele masks the mutation b1; hence the phe-notype is purple (wildtype). This result iscalled complementation, and it meansthat the parental strains are homozygousfor recessive alleles of different genes.

The kind of cross illustrated in Figure3.25 is a complementation test. As wehave seen, it is used to determine whetherrecessive mutations in each of two differentstrains are alleles of the same gene. Be-cause the result indicates the presence orabsence of allelism, the complementationtest is one of the key experimental opera-tions in genetics. To illustrate the applica-tion of the test in practice, suppose amutant screen were carried out to isolatemore mutations for white flowers. Startingwith a true-breeding strain with purpleflowers, we treat pollen with x rays and usethe irradiated pollen to fertilize ovules toobtain seeds. The F1 seeds are grown and

the resulting plants allowed to self-fertilize,after which the F2 plants are grown. A fewof the F1 seeds may contain a new recessivemutation, but in this generation the geno-type is heterozygous. Self-fertilization ofsuch plants results in F2 progeny in a ratioof 3 purple : 1 white. Because mutationsresulting in a particular phenotype arequite rare, even when induced by radia-tion, only a few among many thousands ofself-fertilized plants will be found to have anew white-flower mutation. Let us sup-pose that we were lucky enough to obtainsix new mutations.

How are we going to name these muta-tions? We can make no assumptions aboutthe number of genes represented. As far aswe know at this stage, they could all be al-leles of the same gene, or they could all bealleles of different genes. This is what weneed to establish using the complementa-tion test. For the moment, then, let us as-sign the mutations arbitrary names inseries—x1, x2, x3, and so forth—with no im-plication about which may be alleles ofeach other.

3.7 Genetic Analysis: Mutant Screens and the Complementation Test 119

Mutant strain 1

Purple flowers White flowers

Homozygousrecessivemutation 1(pp)

Homozygousrecessivemutation 2

Homozygousrecessivemutation 1(pp)

Homozygousrecessivemutation 3

Mutant strain 2 Mutant strain 1 Mutant strain 3

The wildtype phenotypeindicates complementation.

Mutation 1 and mutation 2are defects in different genes.

The mutant phenotypeindicates noncomplementation.

Mutation 1 and mutation 3are defects in the same gene.

(A) (B)

P1

generation

F1

generation

Figure 3.25 The complementation test reveals whether two recessive mutations are alleles of thesame gene. In the complementation test, homozygous recessive genotypes are crossed. If thephenotype of the F1 progeny is mutant (A), it means that the mutations in the parental strains arealleles of same gene. If the phenotype of the F1 progeny is nonmutant (B), it means that themutations in the parental strains are alleles of different genes.

FPO

Page 121: Introduction to Molecular Genetics and Genomics

The next step is to classify the mutationsinto groups using the complementation test.Figure 3.26 shows the results, conventionallyreported in a triangular array of � and �signs. The crosses that yield F1 progeny withthe wildtype phenotype (in this case, purpleflowers) are denoted with a � sign in thecorresponding box, whereas those that yieldF1 progeny with the mutant phenotype(white flowers) are denoted with a � sign.The � signs indicate complementation be-tween the mutant alleles in the parents, andthe � signs indicate noncomplementation.(The bottom half of the triangle is unneces-sary because reciprocal crosses yield thesame results, and the diagonal elements areunnecessary because each strain is true-breeding within itself and so yields mutantprogeny.) As we saw in Figure 3.25, com-plementation in a cross means that theparental strains have mutations in differentgenes. Lack of complementation means thatthe parental mutations are in the same

gene. The following principle underlies thecomplementation test.

The Principle of Complementation:If two recessive mutations are alleles of the same gene, then a cross between homo-zygous strains yields F1 progeny that aremutant (noncomplementation); if they arealleles of different genes, then the F1 progenyare wildtype (complementation).

In interpreting complementation data suchas those in Figure 3.26, we actually applythe principle the other way around. We ex-amine the phenotype of the F1 progeny ofeach possible cross and then infer whetheror not the parental strains have mutant al-leles of the same gene.

In a complementation test, if the combinationof two recessive mutations results in a mutantphenotype, then the mutations are regarded asalleles of the same gene; if the combination re-sults in a wildtype phenotype, then the muta-tions are regarded as alleles of different genes.

120 Chapter 3 Transmission Genetics: The Principle of Segregation

Genotype of

male parent

Genotype of female parent

+ + + – +

+ – + –

+ + +

+ –

+

x3x3x2x2 x5x5

x1x1

x2x2

x5 x5

x6x6

Complementation(mutations not allelic)

Lack ofcomplementation(mutations allelic)

Phenotypeof F1 progeny

x4x4

x3x3

x4x4

+

+

+ +

++

– –

Figure 3.26 Results of complementation tests among six mutant strains of peas, each homozygousfor a recessive allele resulting in white flowers. Each box gives the phenotype of the F1 progeny of across between the male parent whose genotype is indicated in the far left column and the femaleparent whose genotype is indicated in the top row.

Page 122: Introduction to Molecular Genetics and Genomics

A convenient way to analyze the data inFigure 3.26 is to arrange the alleles in a cir-cle as shown in Figure 3.27A. Then, for eachpossible pair of mutations, connect the pairby a straight line if the mutations fail tocomplement (Figure 3.27B). According tothe principle of complementation, the linesmust connect mutations that are alleles ofeach other, because in a complementationtest, lack of complementation means thatthe mutations are alleles. Each of the groupsof noncomplementing mutations is called acomplementation group. As we haveseen, each complementation group definesa gene, so the complementation test pro-vides the geneticist’s technical definition:

A gene is defined experimentally as a set ofmutations that make up a single complemen-tation group. Any pair of mutations within thegroup fail to complement one another, and thecomplementation test yields organisms with amutant phenotype.

The mutations in Figure 3.27 therefore rep-resent three genes, mutation of any one ofwhich results in white flowers. Mutationsx1 and x5 are alleles of one gene; x3 is an al-lele of a different gene; and x2, x4, and x6are alleles of a third gene.

What about the genes P and C (Figure3.23 on page 115) that also affect flowercolor? The complementation test tells usnothing about these, because we have notyet included them in any of the crosses. Totest allelism with P and C, we would have tocross one mutant strain from each comple-mentation group with strains of genotypepp and cc. If the F1 progeny are mutant, itidentifies the complementation group withan already known mutation. For example,suppose we cross each of the strains x1x1,x3x3, and x4x4 with pp and cc. Suppose fur-ther that the progeny are all wildtype ex-cept in two crosses—x1x1 � pp and x3x3 �cc—in which the progeny are mutant. Thisresult implies that the complementationgroup consisting of x1 and x5 also includes pand that the complementation group con-sisting of x3 also includes c.

At this point in the genetic analysis, it isadvisable to rename the mutations to indi-cate which ones are true alleles. (There isan old Chinese saying that the correct nam-ing of things is the beginning of wisdom,and this is certainly true in the case ofgenes.) Because the p allele already had its

name before the mutation screen was car-ried out, the new alleles of p (that is, x1 andx5) should be renamed to reflect their al-lelism with p. Old names have priority, andso we must use the symbol p embellishedwith some sort of identifier. We might, forexample, rename the x1 and x5 mutationsp2 and p3 to indicate that they were the sec-ond and third mutations of P to be discov-ered and to convey their independentorigins. Similarly, we might rename x3 (thenew allele of c) as c2. The remaining com-plementation group consisting of x2, x4, andx6 is a new one, and we can name it arbi-trarily. For example, we might call the locusalbus (Latin for “white”) with the gene sym-bol alb and call the alleles alb1, alb2, andalb3. The wildtype dominant allele of alb,which is necessary for purple coloration,would then be represented as Alb or as alb�.The procedure of sorting new mutationsinto complementation groups and renam-ing them according to their allelism is anexample of how geneticists identify genesand name alleles. Such renaming of allelesis the typical manner in which genetic ter-minology evolves as knowledge advances.

3.7 Genetic Analysis: Mutant Screens and the Complementation Test 121

x3 complements all theother alleles; it represents another gene that affects flower coloration.

x1

x6

x3

x2

x1

x5

x6

x3

x2

x4

This connecting line means that x1 and x5 fail to complement one another when the parentsare crossed; they are alleles of the same gene.

These connecting lines mean that x2, x4, and x6 fail to complement one another in all combinations in which the parents are crossed; they are alleles of yet another gene.

x5

x4

(A) (B)

Figure 3.27 A method for interpreting the results of complementationtests. (A) Arrange the mutations in a circle. (B) Connect by a straight lineany pair of mutations that fail to complement (that yield a mutantphenotype); any pair of mutations so connected are alleles of the samegene. In this example, there are three complementation groups, each ofwhich represents a single gene needed for purple flower coloration.

Page 123: Introduction to Molecular Genetics and Genomics

Why Does the Complementation Test Work? The molecular basis of complementation isillustrated in Figure 3.28. Part A depicts thesituation when two recessive mutations, x1and x2, each resulting in white flowers, aredifferent mutations in the same gene. (Thesite of each mutation is indicated by a“burst.”) The cross x1x1 � x2x2 producesan F1 hybrid in which x1 and x2 are in ho-mologous DNA molecules; x1 encodes aprotein with one type of defect, and x2encodes a protein with a different type ofdefect, but both types of protein are non-functional. Hence alleles in the same geneyield a mutant phenotype (white flowers),because neither mutation encodes a wild-type form of the protein.

When the mutations are alleles of dif-ferent genes, the situation is as depicted inFigure 3.28B. Because the mutations arein different genes, the homozygous x1strain is also homozygous for the wildtypeallele x2� of the second locus; likewise, thehomozygous x2 strain is also homozygousfor the wildtype allele x1� of the first lo-cus. Hence, the same cross that yields thegenotype x1x2 in the case of allelic muta-tions (Figure 3.28A) yields x1�x1 x2�x2 inthe case of different genes. This is a doubleheterozygous genotype. Because the muta-tions are both recessive and in differentgenes, they do complement each otherand yield an organism with a wildtypephenotype (purple flowers). With respectto the protein rendered defective by x1,there is a functional form encoded by the

122 Chapter 3 Transmission Genetics: The Principle of Segregation

x1

x2

(A) Trans heterozygote for two mutations in the same gene

Mutant geneproduct(nonfunctional)

Site ofmutation in gene

Boundaries of gene

Mutant (white) flower color Wildtype (purple) flower color

Mutant geneproduct(nonfunctional)

x1

x1+

(B) Trans heterozygote for two mutations in different genes

x2+

x2

Mutant geneproduct(nonfunctional)

Normal geneproduct(functional)

Mutant geneproduct(nonfunctional)

Normal geneproduct(functional)

Result: No complementation.No functional gene product, therefore mutant phenotype.

Result: Complementation. Functional product from both genes, therefore wildtype phenotype.

Boundaries of gene 1 Boundaries of gene 2

Figure 3.28 Molecular interpretation of a complementation test used in determining whether twomutations are alleles of the same gene (A) or alleles of different genes (B).

Page 124: Introduction to Molecular Genetics and Genomics

Chapter Summary 123

Chapter SummaryMendelian genetics deals with the hereditary transmis-sion of genes from one generation to the next. One keyprinciple is segregation, in which the two alleles in an in-dividual separate during the formation of gametes so thateach gamete is equally likely to contain either member ofthe pair. In a cross such as AA � aa, in which only onegene is considered, the genotype of the offspring (consti-tuting the F1 generation) is heterozygous Aa. The pheno-type of the F1 progeny depends on the dominancerelationships among the alleles. For many morphologicaltraits, the wildtype allele, here denoted A, is dominant,and the phenotype of heterozygous Aa is indistinguish-able from that of homozygous AA. In contrast, the allelesof molecular genetic markers are often codominant, andthe phenotypes of AA, Aa, and aa are all distinct. For anRFLP marker, for example, the homozygous AA and aagenotypes have a phenotype consisting of a single banddiffering in electrophoretic mobility, whereas the het-erozygous Aa genotype has a phenotype consisting ofboth bands.

In the formation of gametes, an Aa genotype pro-duces A-bearing and a-bearing gametes in equal propor-tions. Hence in the F1 � F1 cross Aa � Aa, assumingrandom union of gametes in fertilization, the progeny(the F2 generation) are expected to consist of genotypesAA : Aa : aa in the proportions 1 : 2 : 1. The distributionof phenotypes in the F2 generation again depends on thedominance relationships. If A is dominant to a, then theF2 ratio of dominant : recessive phenotypes is expected tobe 3 : 1. With codominance all three genotypes are distin-guishable, and the ratio of F2 phenotypes is 1 : 2 : 1.

For crosses involving two genes—for example, AA BB� aa bb—the F1 genotype is Aa Bb. Segregation of eachgene implies that the ratios of A : a and of B : b gametesare both 1 : 1. If the genes are unlinked, they undergo in-dependent segregation (independent assortment), andthe gametic types A B : A b : a B : a b are formed in the ra-tio 1 : 1 : 1 : 1. Hence the F2 generation formed from thecross F1 � F1 is expected to have genotypes given by theproduct of the expression (1�4 AA � 1�2 Aa � 1�4 aa)� (1�4 BB � 1�2 Bb � 1�4 bb). Using a dash to repre-sent an allele of unspecified type, we can write the F2genotypes as 9 A� B� : 3 A� bb : 3 aa B� : 1 aa bb, and ifboth A and B are dominant, the phenotypic ratio in the F2generation is 9 : 3 : 3 : 1. This ratio can be modified invarious ways by interaction between the genes (epista-sis). Different types of epistasis may result in dihybrid ra-tios such as 9 : 7 or 12 : 3 : 1 or 13 : 3 or 9 : 4 : 3.

The rules of probability provide the basis for predictingthe outcomes of genetic crosses based on the principles of

segregation and independent assortment. Two basic rulesfor combining probabilities are the addition rule and themultiplication rule. The addition rule applies to mutuallyexclusive events; it states that the probability of the real-ization of either one or the other of two events equals thesum of the respective probabilities. The multiplication ruleapplies to independent events; it states that the probabilityof the simultaneous realization of both of two events isequal to the product of the respective probabilities.

In some organisms, including human beings, it is notpossible to perform controlled crosses, and genetic analy-sis is accomplished through the study of pedigrees throughtwo or more generations. Pedigree analysis is the determi-nation of the possible genotypes of the family members ina pedigree and of the probability that an individual mem-ber has a particular genotype. The goal of pedigree analy-sis is often to infer the genetic basis of an inherited diseaseor other condition—for example, to determine whether itmay be due to a simple dominant or recessive allele. Al-though most morphological traits do not show simpleMendelian patterns of inheritance in pedigrees, molecularmarkers usually do. Prominent among these are SNPs(single-nucleotide polymorphisms), RFLPs (restrictionfragment length polymorphisms), and STRPs (simple tan-dem repeat polymorphisms).

Multiple alleles are often encountered in natural pop-ulations or as a result of mutant screens. Genes with mul-tiple alleles may have as few as three, such as those forthe ABO blood groups, to as many as a hundred or more.Examples of large numbers of alleles include the genesused in DNA typing and the self-sterility alleles in someflowering plants. Although there may be multiple allelesin a population, each gamete can carry only one allele ofeach gene, and each organism can carry at most two dif-ferent alleles of each gene.

The complementation test is the functional definitionof a gene. If a cross between two homozygous recessivegenotypes results in nonmutant progeny, the mutationsare said to complement one another. Complementation isevidence that the alleles are mutations in different genes.On the other hand, if a cross between two homozygousrecessives results in mutant progeny, then the allelesshow noncomplementation (they fail to complement). Non-complementation is evidence that the mutations are al-leles of the same gene. For any group of recessivemutations, a complete complementation test entailscrossing the homozygous recessives in all pairwise combi-nations. At the molecular level, lack of complementationimplies that allelic mutations impair the function of thesame protein molecule.

wildtype allele brought in from the x2x2parent. With respect to the protein ren-dered defective by x2, there is again afunctional form encoded by the wildtype

allele brought in from the x1x1 parent. Be-cause a functional form of both proteins isproduced, the result is a normal pheno-type, or complementation.

Page 125: Introduction to Molecular Genetics and Genomics

124 Chapter 3 Transmission Genetics: The Principle of Segregation

Review the Basics• What is the principle of segregation, and how is this

principle demonstrated in the results of a single-gene(monohybrid) cross?

• What is the principle of independent assortment, andhow is this principle demonstrated in the results of atwo-gene (dihybrid) cross?

• Explain why random union of male and female ga-metes is necessary for Mendelian segregation and in-dependent assortment to be observed in the progenyof a cross.

• What is the difference between mutually exclusiveevents and independent events? How are the proba-bilities of these two types of events combined? Givetwo examples of genetic events that are mutually ex-clusive and two examples of genetic events that areindependent.

• When two pairs of alleles show independent assort-ment, under what conditions will a 9 : 3 : 3 : 1 ratio ofphenotypes in the F2 generation not be observed?

• Explain the following statement: “Among the F2progeny of a dihybrid cross, the ratio of genotypes is 1: 2 : 1, but among the progeny that express the domi-nant phenotype, the ratio of genotypes is 1 : 2.”

• What are the principal features of human pedigrees inwhich a rare dominant allele is segregating? In whicha rare recessive allele is segregating?

• What is a mutant screen and how is it used in geneticanalysis?

• Explain the statement: “In genetics, a gene is identi-fied experimentally by a set of mutant alleles that failto show complementation.” What is complemen-tation? How does a complementation test enable ageneticist to determine whether two different muta-tions are or are not mutations in the same gene?

• What does it mean to say that epistasis results in a“modified dihybrid F2 ratio?” Give two examples of amodified dihybrid F2 ratio, and explain the gene in-teractions that result in the modified ratio.

Guide to Problem SolvingProblem 1 Explain what each of the following ratios repre-sents in the F2 progeny of a single-gene (monohybrid)cross. Which are ratios of genotypes and which are ratiosof phenotypes? (a) 3 : 1(b) 1 : 2 : 1(c) 2 : 1

Answer (a) 3 : 1 is the expected ratio of phenotypes whenone of the alleles is dominant. (b) 1 : 2 : 1 is the expectedratio of genotypes AA : Aa : aa. (c) 2 : 1 is the expected ra-

tio of heterozygous Aa genotypes to homozygous AAgenotypes.

Problem 2 Complete the table by inserting 0, 1, 1�2, or 1�4for the probability of each genotype of progeny from eachtype of mating. For which mating are the parents identi-cal in genotype but the progeny as variable in genotype asthey can be for a single locus? For which mating are theparents as different as they can be for a single locus butthe progeny identical to each other and different from ei-ther parent?

Key Termsaddition ruleantibodyantigenbackcrosscarriercodominancecomplementationcomplementation groupcomplementation testconsanguineous matingepistasisF1 generationF2 generationgamete

hybridincomplete dominanceindependent assortmentMendelian geneticsmultiple allelesmultiplication rulemutant screennoncomplementationoutcrossingP1 generationpedigreepenetrancePunnett squarereciprocal cross

segregationsibsiblingtestcrosstransmission geneticstransposable elementtrue-breedingwildtypeunlinked genes

Page 126: Introduction to Molecular Genetics and Genomics

Guide to Problem Solving 125

Answer (a) Follow the strategy outlined in the text, inwhich the alleles are placed in the shape of a circle andpairs of alleles that fail to show complementation are con-nected by a line. The resulting pattern is a visual repre-sentation of the complementation groups.

(b) The mutants define three complementation groups(“genes”). One complementation group consists of thealleles a, e, and f; another of the alleles b and c; and theother of the allele d only.

Problem 4 The accompanying illustration shows four al-ternative types of combs in chickens; they are called rose,pea, single, and walnut. The following data summarizethe results of crosses. The rose and pea strains used incrosses 1, 2, and 5 are true breeding.

1. rose � single � rose2. pea � single � pea3. (rose � single) F1 � (rose � single) F1 �

3 rose : 1 single4. (pea � single) F1 � (pea � single) F1 �

3 pea : 1 single5. rose � pea � walnut6. (rose � pea) F1 � (rose � pea) F1 �

9 walnut : 3 rose : 3 pea : 1 single

(a) What genetic hypothesis can explain these results?(b) What are the genotypes of parents and progeny in

each of the crosses?(c) What are the genotypes of true-breeding strains of

rose, pea, single, and walnut?

Rose comb

Single comb

Pea comb

Walnut comb

a

c

b

e

f

d

Progeny genotypes

Mating AA Aa aa

AA � AA 1 0 0

AA � Aa 1/2 1/2 0

AA � aa 0 1 0

Aa � Aa 1/4 1/2 1/4

Aa � aa 0 1/2 1/2

aa � aa 0 0 1

Progeny genotypes

Mating AA Aa aa

AA � AA

AA � Aa

AA � aa

Aa � Aa

Aa � aa

aa � aa

Answer This table is to transmission genetics what themultiplication table is to arithmetic. It is fundamental tobeing able to solve almost any type of quantitative prob-lem in transmission genetics. It needs to be thoroughlyunderstood—memorized, if necessary!

The mating for which the parents are identical in geno-type, but the progeny are as variable in genotype as theycan be for a single locus, is Aa � Aa. The mating for whichthe parents are as different as they can be for a single lo-cus, but the progeny are identical to each other and dif-ferent from either parent, is AA � aa.

Problem 3 The data in the accompanying complementa-tion matrix summarize the result of crosses between mu-tants designated a through f.(a) Configure the allele symbols in the form of a circle,

and use straight lines to connect the alleles that are inthe same complementation group.

(b) State the conclusion in words: How many comple-mentation groups are indicated, and which mutantsare in each complementation group?

– + + + – –

– – + + +

– + + +

– + +

– –

a

a b c d e f

b

c

d

e

f

Page 127: Introduction to Molecular Genetics and Genomics

126 Chapter 3 Transmission Genetics: The Principle of Segregation

Answer (a) Cross 6 gives the Mendelian ratios expectedwhen two genes are segregating, so a genetic hypothesiswith two genes is necessary. Crosses 1 and 3 give the re-sults expected if rose comb were due to a dominant allele(say, R). Crosses 2 and 4 give the results expected if peacomb were due to a dominant allele (say, P). Cross 5 indi-cates that walnut comb results from the interaction of Rand P. The segregation in cross 6 means that R and P arenot alleles of the same gene.

(b) 1. RR pp � rr pp � Rr pp2. rr PP � rr pp � rr Pp3. Rr pp � Rr pp � 3�4 R� pp : 1�4 rr pp

Analysis and Applications3.1 What gametes can be formed by an individual organ-ism of genotype Aa? Of genotype Bb? Of genotype Aa Bb?

3.2 How many different gametes can be formed by an or-ganism with genotype AA Bb Cc Dd Ee and, in general, byan organism that is heterozygous for m genes and ho-mozygous for n genes?

3.3 Round pea seeds are planted that were obtained fromthe F2 generation of a cross between a true-breedingstrain with round seeds and a true-breeding strain withwrinkled seeds. The pollen was collected and used enmasse to fertilize plants from the true-breeding wrinkledstrain. What fraction of the progeny is expected to havewrinkled seeds?

3.4 Phenylketonuria is a recessive inborn error of metab-olism of the amino acid phenylalanine that results in se-vere mental retardation of affected children. The femaleII-3 (red circle) in the pedigree shown here is affected. Ifpersons III-1 and III-2 (they are first cousins) mate, whatis the probability that their offspring will be affected?(Assume that persons II-1 and II-5 are homozygous forthe normal allele.)

3.5 Shown above right are a pedigree and gel diagramindicating the clinical phenotypes with respect tophenylketonuria and the molecular phenotypes with re-spect to an RFLP that overlaps the PAH gene for pheny-lalanine hydroxylase. Individual II-3 is affected.

(a) Indicate the expected molecular phenotype of II-3.

(b) Indicate the possible molecular phenotypes of II-4.

I

1 2

1

4 53II

III

1 2

2

3.6 Assuming equal numbers of boys and girls, if a mat-ing has already produced a girl, what is the probabilitythat the next child will be a boy? If a mating has alreadyproduced two girls, what is the probability that the nextchild will be a boy? On what type of probability argumentdo you base your answers?

3.7 Assuming equal sex ratios, what is the probabilitythat a sibship of four children consists entirely of boys? Ofall boys or all girls? Of equal numbers of boys and girls?

3.8 In the following questions, you are asked to deducethe genotype of certain parents in a pedigree. The pheno-types are determined by dominant and recessive alleles ofa single gene.

(a) A homozygous recessive results from the mating of aheterozygote and a parent with the dominant pheno-type? What does this tell you about the genotype ofthe parent with the dominant phenotype?

(b) Two parents with the dominant phenotype producenine offspring. Two have the recessive phenotype.What does this tell you about the genotype of theparents?

(c) One parent has a dominant phenotype, and the otherhas a recessive phenotype. Two offspring result, andboth have the dominant phenotype. What genotypesare possible for the parent with the dominant pheno-type?

II-1 II-2 II-3 II-4

I-1 I-2

12 kb9 kb

6 kb

3 kb

1 kb

4. rr Pp � rr Pp � 3�4 rr P� : 1�4 rr pp5. RR pp � rr PP � Rr Pp6. Rr Pp � Rr Pp �

9�16 R� P� : 3�16 R� pp : 3�16 rr P� : 1�16 rr pp(c) The true-breeding genotypes are RR pp (rose), rr PP(pea), rr pp (single), and RR PP (walnut).

Au: Should genotype inprob. 3.2 be “Aa Bb . . .” instead of “AA Bb . . .”?

Page 128: Introduction to Molecular Genetics and Genomics

Analysis and Applications 127

3.9 Pedigree analysis tells you that a particular parentmay have the genotype AA BB or AA Bb, each with thesame probability. Assuming independent assortment,what is the probability of this parent’s producing an Abgamete? What is the probability of the parent’s producingan AB gamete?

3.10 Assume that the trihybrid cross AA BB rr � aa bb RRis made in a plant species in which A and B are dominantbut there is no dominance between R and r. Consider theF2 progeny from this cross, and assume independent as-sortment.

(a) How many phenotypic classes are expected?

(b) What is the probability of the parental aa bb RR geno-type?

(c) What proportion would be expected to be homozy-gous for all three genes?

3.11 In the cross Aa Bb Cc Dd � Aa Bb Cc Dd, in which allgenes undergo independent assortment, what proportionof offspring are expected to be heterozygous for all fourgenes?

3.12 The pattern of coat coloration in dogs is determinedby the alleles of a single gene, with S (solid) being domi-nant over s (spotted). Black coat color is determined bythe dominant allele A of a second gene, tan by homozy-gosity for the recessive allele a. A female having a solidtan coat is mated with a male having a solid black coatand produces a litter of six pups. The phenotypes of thepups are 2 solid tan, 2 solid black, 1 spotted tan, and 1spotted black. What are the genotypes of the parents?

3.13 In the human pedigree shown here, the daughterindicated by the red circle (II-1) has a form of deafnessdetermined by a recessive allele. What is the probability

you can learn about genetic testing programs, early symp-toms, and the time course of the disease.

• The red and purple colors of flowers, as well as of autumnleaves, result from members of a class of pigments called an-thocyanins. The biochemical pathway for anthocyanin synthe-sis in the snapdragon, Antirrhinum majus, can be found at thiskeyword site. The enzyme responsible for the first step in thepathway (chalcone synthase) limits the amount of pigmentformed, which explains why red and white flowers in Antir-rhinum show incomplete dominance.

• The Mutable Site changes frequently. Each new update in-cludes a different site that highlights genetics resources avail-able on the World Wide Web. Select the Mutable Site forChapter 3 and you will be linked automatically.

• The Pic Site showcases some of the most visually appeal-ing genetics sites on the World Wide Web. To visit the geneticsWeb site pictured below, select the PIC Site for Chapter 3.

GeNETics on the Web will introduce you to some of the mostimportant sites for finding genetic information on the Inter-net. To explore these sites, visit the Jones and Bartlett homepage at

http://www.jbpub.com/genetics

For the book Genetics: Analysis of Genes and Genomes, choosethe link that says Enter GeNETics on the Web. You will be pre-sented with a chapter-by -chapter list of highlighted keywords.Select any highlighted keyword and you will be linked to a Website containing genetic information related to the keyword.

• Mendel’s paper is one of the few nineteenth century scien-tific papers that reads almost as clearly as if it had been writtentoday. It is important reading for every aspiring geneticist.You can access a conveniently annotated text using the key-word Mendel. Although modern geneticists make a clear dis-tinction between genotype and phenotype, Mendel made noclear distinction between these concepts. At this keyword siteyou will find a treasure trove of information about Mendel, in-cluding his famous paper, essays, commentary, and a collec-tion of images—all richly linked to additional Internetresources.

• Huntington disease is a devastating degeneration of thebrain that begins in middle life. It affects about 30,000 Ameri-cans, and, because of its dominant mode of inheritance andcomplete penetrance, each of their 150,000 siblings and chil-dren has a 50–50 chance of developing the disease. Namedafter George Huntington, a Long Island physician who firstdescribed it in 1872, the disease’s principal symptom is an in-voluntary, jerky motion of the head, trunk, and limbs calledchorea, after the Greek word for “dance.” At this keyword site

Page 129: Introduction to Molecular Genetics and Genomics

128 Chapter 3 Transmission Genetics: The Principle of Segregation

that the phenotypically normal son (II-3) is heterozygousfor the gene?

3.14 Huntington disease is a rare neurodegenerative hu-man disease determined by a dominant allele, HD. Thedisorder is usually manifested after the age of forty-five.A young man has learned that his father has developedthe disease.(a) What is the probability that the young man will later

develop the disorder?(b) What is the probability that a child of the young man

carries the HD allele?

3.15 Assume that the trait in the accompanying pedigreeis due to simple Mendelian inheritance.(a) Is it likely to be due to a dominant allele or a recessive

allele? Explain.(b) What is the meaning of the double horizontal line

connecting III-1 with III-2?(c) What is the biological relationship between III-1 and

III-2?(d) If the allele responsible for the condition is rare,

what are the most likely genotypes of all of the per-sons in the pedigree in generations I, II, and III? (UseA and a for the dominant and recessive alleles, re-spectively.)

3.16 The Hopi, Zuni, and some other Southwest AmericanIndians have a relatively high frequency of albinism (ab-sence of skin pigment) resulting from homozygosity for arecessive allele, a. A normally pigmented man andwoman, each of whom has an albino parent, have twochildren. What is the probability that both children are al-bino? What is the probability that at least one of the chil-dren is albino?

3.17 Say the trait in the accompanying pedigree is due tosimple Mendelian inheritance.(a) Is it likely to be due to a dominant allele or a recessive

allele? Explain.(b) What are the most likely genotypes of all of the per-

sons in the pedigree? (Use A and a for the dominantand recessive alleles.)

I

II

III

IV

1 2

1 2 3 4

1 2

21 3 4

I

1 32II

1 2

3.18 The accompanying pedigree and gel diagram showthe phenotypes of the parents for an RFLP that has multi-ple alleles. What are the possible phenotypes of the prog-eny, and in what proportions are they expected?

3.19 Red kernel color in wheat results from the presenceof at least one dominant allele of each of two inde-pendently segregating genes (in other words, R� B�genotypes have red kernels). Kernels on rr bb plants arewhite, and the genotypes R� bb and rr B� result inbrown kernel color. Suppose that plants of a variety thatis true breeding for red kernels are crossed with plantstrue breeding for white kernels.(a) What is the expected phenotype of the F1 plants?(b) What are the expected phenotypic classes in the F2

progeny and their relative proportions?

3.20 Heterozygous Cp cp chickens express a conditioncalled creeper, in which the leg and wing bones areshorter than normal (cp cp). The dominant Cp allele islethal when homozygous. Two alleles of an indepen-dently segregating gene determine white (W�) versusyellow (ww) skin color. From matings between chickensheterozygous for both of these genes, what phenotypicclasses will be represented among the viable progeny, andwhat are their expected relative frequencies?

3.21 White Leghorn chickens are homozygous for adominant allele, C, of a gene responsible for coloredfeathers, and also for a dominant allele, I, of an inde-pendently segregating gene that prevents the expressionof C. The White Wyandotte breed is homozygous reces-sive for both genes cc ii. What proportion of the F2 prog-eny obtained from mating White Leghorn � WhiteWyandotte F1 hybrids would be expected to have coloredfeathers?

12 kb9 kb

6 kb

3 kb

1 kb

I

1 2 3 4 5 6 7 8 9 10 11 12 13

1 2 3 54 10 11 12 136 7 8 9

II

III

1 2

Page 130: Introduction to Molecular Genetics and Genomics

Analysis and Applications 129

3.22 The F2 progeny from a particular cross exhibit amodified dihybrid ratio of 9 : 7 (instead of 9 : 3 : 3 : 1).What phenotypic ratio would be expected from a test-cross of the F1?

3.23 Black hair in rabbits is determined by a dominant al-lele, B, and white hair by homozygosity for a recessive al-lele, b. Two heterozygotes mate and produce a litter ofthree offpring.(a) What is the probability that the offspring are born

in the order white-black-white? What is theprobability that the offspring are born in either theorder white-black-white or the order black-white-black?

(b) What is the probability that exactly two of the threeoffspring will be white?

3.24 Consider sibships consisting of 6 children, and as-sume a sex ratio of 1 : 1.(a) What is the proportion with no girls?(b) What is the proportion with exactly 1 girl?(c) What is the proportion with exactly 2 girls?(d) What is the proportion with exactly 3 girls?(e) What is the proportion with 3 or more boys?

3.25 Andalusian fowls are colored black, splashed white(resulting from an uneven sprinkling of black pigmentthrough the feathers), or slate blue. Black and splashedwhite are true breeding, and slate blue is a hybrid thatsegregates in the ratio 1 black : 2 slate blue : 1 splashedwhite. If a pair of blue Andalusians is mated and the henlays three eggs, what is the probability that the chickshatched from these eggs will be one black, one blue, andone splashed white?

3.26 In the mating Aa � Aa, what is the smallest numberof offspring, n, for which the probability of at least one aaoffspring exceeds 95 percent?

3.27 From the F2 generation of a cross between mousegenotypes AA � aa, one male progeny of genotype A�was chosen and mated with an aa female. All of the prog-eny in the resulting litter were A�. From this result youwould like to conclude that the sire’s genotype is AA.How much confidence could you have in this conclusionfor each litter size from 1 to 15? (In other words what isthe probability that the sire’s genotype is AA, given thatthe a priori probability is 1�3 and that a litter of n pups re-sulted in all A� progeny?)

3.28 The accompanying gel diagram includes the pheno-type of two parents (A and B) with respect to two differ-ent RAPD polymorphisms. Each parent is homozygousfor an allele associated with a band defining a RAPD poly-morphism. The two RAPD polymorphisms are at differentloci and undergo independent assortment. In the gel dia-gram, indicate the expected phenotype of the F1 progenyas well as all possible phenotypes of the F2 progeny, alongwith their expected proportions.

3.29 The gel diagram shown below shows the phenotypeof two parents (A and B), each homozygous for twoRFLPs that undergo independent assortment. Parent Ahas genotype A1A1 B1B1, where the A1 allele yields a bandof 2 kb and the B1 allele yields a band of 8 kb. Parent Bhas genotype A2A2 B2B2, where the A2 allele yields a bandof 3 kb and the B2 allele yields a band of 10 kb. Show theexpected phenotype of the F1 progeny as well as all possi-ble phenotypes of the F2 progeny, along with their ex-pected proportions.

3.30 Complementation tests of the recessive mutant genes a through f produced the data in the accompanyingmatrix. The circles represent missing data. Assuming that all of the missing mutant combinations would yield data consistent with the entries that are known,complete the table by filling each circle with a � or � as needed.

a

a

b

b

c

c

d

d

e

e

f

f

+ +

+

_

_

_

A

P1

B

F1 F2 progeny

12 kb9 kb

6 kb

3 kb

1 kb

A

Parents

B

F1 F2 progeny

12 kb9 kb

6 kb

3 kb

1 kb

Page 131: Introduction to Molecular Genetics and Genomics

130 Chapter 3 Transmission Genetics: The Principle of Segregation

Challenge Problems3.31 Diagrammed here is DNA from a wildtype gene(top) and a mutant allele (bottom) that has an insertionof a transposable element that inactivates the gene. Thetransposable element is present in many copies scatteredthroughout the genome. The symbols B and E representthe positions of restriction sites for BamHI and EcoRI, re-spectively, and the rectangles show sites of hybridizationwith each of three probes (A, B, and C) that are available.The dots at the left indicate that the nearest site of eitherBamHI or EcoRI cleavage is very far to the left of the re-gion shown. Explain which probe and which single re-striction enzyme you would use for RFLP analysis toidentify both alleles. Also explain why any other choiceswould be unsuitable.

3.32 Meiotic drive is an unusual phenomenon in whichtwo alleles do not show Mendelian segregation from theheterozygous genotype. Examples are known from mam-mals, insects, fungi, and other organisms. The usualmechanism is one in which both types of gametes areformed, but one of them fails to function normally. Theexcess of the driving allele over the other can range froma small amount to nearly 100 percent. Suppose that D isan allele showing meiotic drive against its alternative al-lele d, and suppose that Dd heterozygotes produce func-tional D-bearing and d-bearing gametes in theproportions 3�4 : 1�4. In the mating Dd � Dd,

(a) What are the expected proportions of DD, Dd, and ddgenotypes?

(b) If D is dominant, what are the expected proportionsof D� and dd phenotypes?

(c) Among the D� phenotypes, what is the ratio ofDD : Dd?

(d) Answer parts (a) through (c), assuming that the mei-otic drive takes place in only one sex.

B

2A B

A B C

0

E

4

E B

6 kbTransposableelement

B

20

E

4 8 10

B EB

E

6 12 kb

3.33 The accompanying table summarizes the effect of in-herited tissue antigens on the acceptance or rejection oftransplanted tissues, such as skin grafts, in mammals. Thetissue antigens are determined in a codominant fashion, sothat tissue taken from a donor of genotype Aa carries boththe A and the a antigen. In the table, the � sign means thata graft of donor tissue is accepted by the recipient, andthe � sign means that a graft of donor tissue is rejected bythe recipient. The rule is that any graft will be rejected when-ever the donor tissue contains an antigen not present in the recipi-ent. In other words, any transplant will be accepted if, andonly if, the donor tissue does not contain an antigen differ-ent from any already present in the recipient.

The diagram illustrated below shows all possible skingrafts between inbred (homozygous) strains of mice (P1and P2) and their F1 and F2 progeny. Assume that the in-bred lines P1 and P2 differ in only one tissue-compatibilitygene. For each of the arrows, what is the probability ofacceptance of a graft in which the donor is an animal cho-sen at random from the population shown at the base ofthe arrow and the recipient is an animal chosen at ran-dom from the population indicated by the arrowhead?

a

b

c de

i jg h

k l

f

P1

F1

P2

F2

+

+

– –

– –

+ +

+

AA

AA Aa aa

Aa

aa

Donor

Recipient

Further ReadingBowler, P. J. 1989. The Mendelian Revolution. Baltimore,

MD: Johns Hopkins University Press.Carlson, E. A. 1987. The Gene: A Critical History. 2d ed.

Philadelphia: Saunders.Dunn, L. C. 1965. A Short History of Genetics. New York:

McGraw-Hill.

Hartl, D. L., and V. Orel. 1992. What did Gregor Mendelthink he discovered? Genetics 131: 245.

Hawley, R. S., and C. A. Mori. 1999. The Human Genome:A User’s Guide. San Diego: Academic.

Henig, R. M. 2000. The Monk in the Garden. Boston, MA:Houghton Mifflin.

Page 132: Introduction to Molecular Genetics and Genomics

Further Reading 131

Judson, H. F. 1996. The Eighth Day of Creation: The Makersof the Revolution in Biology. Cold Spring Harbor, NY:Cold Spring Harbor Laboratory Press.

Lewis, R. 2001. Human Genetics: Concepts and Applications.4th ed. Dubuque, IA: McGraw-Hill.

Lewontin, R. C. 2000. It Ain’t Necessarily So: The Dream ofthe Human Genome and Other Illusions. New York: NewYork Review of Books.

Maroni, G. 2000. Molecular and Genetic Analysis of HumanTraits. Malden, MA: Blackwell.

Mendel, G. 1866. Experiments in plant hybridization.(Translation.) In The Origins of Genetics: A Mendel SourceBook, ed. C. Stern and E. Sherwood. 1966. New York:Freeman.

Olby, R. C. 1966. Origins of Mendelism. London:Constable.

Orel, V. 1996. Gregor Mendel: The First Geneticist. Oxford,England: Oxford University Press.

Orel, V., and D. L. Hartl. 1994. Controversies in theinterpretation of Mendel’s discovery. History andPhilosophy of the Life Sciences 16: 423.

Sandler, I. 2000. Development: Mendel’s legacy togenetics. Genetics 154: 7.

Stern, C., and E. Sherwood, eds. 1966. The Origins ofGenetics: A Mendel Source Book. New York: Freeman.

Sturtevant, A. H. 1965. A Short History of Genetics. NewYork: Harper & Row.

Sykes, B, ed. 1999. The Human Inheritance. New York:Oxford University Press.