A Zero-Knowledge Based Introduction to Biology · 2013. 1. 10.
A Zero-Knowledge Based Introduction to Biology
Jim Notwell
09 January 2013
Q: What is your genome?
A: The sum of your hereditary information.
From DNA to Organism
You are composed of ~ 10 trillion cells
From DNA to Organism Cell Protein
Proteins do most of the work in biology
Central Dogma of Biology
DNA: “Blueprints” for a cell
•Genetic information encoded in long strings
•Deoxyribonucleic acid (DNA) is built from nucleotides that come in four flavors: adenine (A), thymine (T), guanine (G), and cytosine (C)
Phosphate-deoxyribose Backbone
[Figure: one nucleotide of the phosphate-deoxyribose backbone. Labels: 5’ end, 3’ end, to base, to previous nucleotide, to next nucleotide]
Nucleobase Complementary Pairing
purines: Adenine (A), Guanine (G)
pyrimidines: Cytosine (C), Thymine (T)
Pairs: A with T, G with C
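The pairing rule can be sketched in a few lines of Python. This is only an illustration: the function names `complement` and `reverse_complement` are made up for this example, and the input is the GATTACA strand used on the transcription slides.

```python
# Watson-Crick pairing in DNA: A pairs with T, G pairs with C.
PAIRS = {"A": "T", "T": "A", "G": "C", "C": "G"}

def complement(strand: str) -> str:
    """Return the paired base for each position, in the same left-to-right order."""
    return "".join(PAIRS[base] for base in strand)

def reverse_complement(strand: str) -> str:
    """Return the partner strand written in its own 5'->3' direction."""
    return complement(strand)[::-1]

print(complement("GATTACA"))          # CTAATGT
print(reverse_complement("GATTACA"))  # TGTAATC
```

Because the two strands run in opposite directions, the partner strand is usually written reversed, which is what `reverse_complement` does.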
DNA Double Helix
DNA Packaging
Q: What is your genome?
A: The sum of your hereditary information. Humans bundle two copies of the genome into 46 chromosomes in nearly every cell.
Central Dogma of Biology
DNA vs RNA
RNA Nucleobases
purines: Adenine (A), Guanine (G)
pyrimidines: Cytosine (C), Uracil (U)
Gene Transcription
5’ G A T T A C A . . . 3’
3’ C T A A T G T . . . 5’
Strands are separated (DNA helicase)
Gene Transcription
5’ G A T T A C A . . . 3’
RNA: G A U U A C A
3’ C T A A T G T . . . 5’
An RNA copy of the 5’→3’ sequence is created from the 3’→5’ template
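A minimal sketch of this step, assuming the template strand is given as a plain string read 3’→5’. The function name `transcribe` is hypothetical, and real transcription involves far more machinery than a lookup table.

```python
# Each template DNA base pairs with an RNA base; RNA uses U where DNA uses T.
DNA_TO_RNA = {"A": "U", "T": "A", "G": "C", "C": "G"}

def transcribe(template_3_to_5: str) -> str:
    """Build the RNA (written 5'->3') from a template strand read 3'->5'."""
    return "".join(DNA_TO_RNA[base] for base in template_3_to_5)

# Template strand from the slide, read 3'->5'.
print(transcribe("CTAATGT"))  # GAUUACA
```

The result matches the 5’→3’ DNA strand (GATTACA) with every T replaced by U.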
Gene Transcription
5’ G A T T A C A . . . 3’
3’ C T A A T G T . . . 5’
pre-mRNA: 5’ G A U U A C A . . . 3’
RNA Processing
[Figure: pre-mRNA is processed into mRNA. Labels: 5’ cap, poly(A) tail, intron, exon, 5’ UTR, 3’ UTR]
Gene Structure
[Figure: gene structure from 5’ to 3’. Labels: promoter, 5’ UTR, exons, introns, 3’ UTR; regions marked as coding or non-coding]
Central Dogma of Biology
From RNA to Protein
•Proteins are long strings of amino acids joined by peptide bonds
•Translation from RNA sequence to amino acid sequence is performed by ribosomes
•20 amino acids → 3 RNA letters are required to specify a single amino acid (two letters give only 4² = 16 combinations; three give 4³ = 64)
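The counting argument behind the three-letter codon can be checked directly. This is just a sketch of the arithmetic; `min_codon_length` is a name chosen for this example.

```python
BASES = 4         # A, C, G, U
AMINO_ACIDS = 20  # standard amino acids

def min_codon_length(n_letters: int, n_targets: int) -> int:
    """Smallest word length whose number of possible words covers n_targets."""
    length = 1
    while n_letters ** length < n_targets:
        length += 1
    return length

for k in (1, 2, 3):
    print(k, BASES ** k)  # 1 4, then 2 16, then 3 64

print(min_codon_length(BASES, AMINO_ACIDS))  # 3
```

With 64 codons for 20 amino acids, several codons map to the same amino acid, which is why the codon table is redundant.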
Amino Acid
Alanine
Arginine
Asparagine
Aspartate
Cysteine
Glutamate
Glutamine
Glycine
Histidine
Isoleucine
Leucine
Lysine
Methionine
Phenylalanine
Proline
Serine
Threonine
Tryptophan
Tyrosine
Valine
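[Figure: general structure of an amino acid: an amino group and a carboxyl group attached to a central carbon that carries the side chain R]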
There are 20 standard amino acids
Proteins
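[Figure: one amino acid (aa) in a protein chain, with side chain R, linked to the previous aa and to the next aa. The chain runs from the N-terminus (start) to the C-terminus (end), following the mRNA from 5’ to 3’]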
Translation
The ribosome (a complex of protein and RNA) synthesizes a protein by reading the mRNA in triplets (codons). Each codon is translated to an amino acid.
Translation
Codon table (blocks by first letter; columns by second letter G, A, C, U; rows by third letter G, A, C, U)
First letter G:
GGG Gly   GAG Glu   GCG Ala   GUG Val
GGA Gly   GAA Glu   GCA Ala   GUA Val
GGC Gly   GAC Asp   GCC Ala   GUC Val
GGU Gly   GAU Asp   GCU Ala   GUU Val
First letter A:
AGG Arg   AAG Lys   ACG Thr   AUG Met or START
AGA Arg   AAA Lys   ACA Thr   AUA Ile
AGC Ser   AAC Asn   ACC Thr   AUC Ile
AGU Ser   AAU Asn   ACU Thr   AUU Ile
First letter C:
CGG Arg   CAG Gln   CCG Pro   CUG Leu
CGA Arg   CAA Gln   CCA Pro   CUA Leu
CGC Arg   CAC His   CCC Pro   CUC Leu
CGU Arg   CAU His   CCU Pro   CUU Leu
First letter U:
UGG Trp   UAG STOP  UCG Ser   UUG Leu
UGA STOP  UAA STOP  UCA Ser   UUA Leu
UGC Cys   UAC Tyr   UCC Ser   UUC Phe
UGU Cys   UAU Tyr   UCU Ser   UUU Phe
Key: Gly = Glycine, Glu = Glutamic acid, Asp = Aspartic acid, Ala = Alanine, Val = Valine, Arg = Arginine, Ser = Serine, Lys = Lysine, Asn = Asparagine, Thr = Threonine, Met = Methionine, Ile = Isoleucine, Gln = Glutamine, His = Histidine, Pro = Proline, Leu = Leucine, Trp = Tryptophan, Cys = Cysteine, Tyr = Tyrosine, Phe = Phenylalanine
Translation
5’ . . . A U U A U G G C C U G G A C U U G A . . . 3’
AUU: UTR
AUG: Met (Start Codon)
GCC: Ala
UGG: Trp
ACU: Thr
UGA: Stop Codon
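The reading of this example can be sketched in Python. The sketch assumes the standard genetic code, one-letter amino acid symbols (M = Met, A = Ala, W = Trp, T = Thr, `*` = stop), and the simplification that translation begins at the first AUG; the name `translate` is hypothetical.

```python
from itertools import product

# Standard genetic code, codons ordered with bases U, C, A, G; "*" marks a stop codon.
BASES = "UCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

def translate(mrna: str) -> str:
    """Translate from the first AUG up to (not including) the first stop codon."""
    start = mrna.find("AUG")
    if start == -1:
        return ""  # no start codon, so nothing is translated in this sketch
    protein = []
    for i in range(start, len(mrna) - 2, 3):
        amino_acid = CODON_TABLE[mrna[i:i + 3]]
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)

print(translate("AUUAUGGCCUGGACUUGA"))  # MAWT
```

The leading AUU is skipped because it lies before the start codon, which is how the slide labels it as UTR.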
Translation
Central Dogma of Biology
Protein coding: 1%
Other stuff: 99%
Protein coding: 1%
Non-coding exons: 2%
Introns/promoters/polyA sites: 37%
Intergenic transcribed RNA: 19%
Regulatory elements: 9%
???: 32%
The ENCODE Project Consortium (2012) Nature 489:57-74
Non-coding RNAs
•RNAs transcribed from DNA but not translated into protein
•Structural ncRNAs: Conserved secondary structure
•Involved in gene regulation
microRNA
Different Cell Types
Subsets of the DNA sequence determine the identity and function of different cells
Gene Expression Regulation
•When should each gene be expressed?
•Why? Every cell has the same DNA, but each cell expresses different proteins.
•Signal transduction: one signal is converted into another. A cascade has “master regulators” that turn on many proteins, each of which in turn turns on many more.
Central Dogma of Biology
Transcription Regulation
•Transcription factors bind to binding sites
•A complex of transcription factors forms
•The complex assists or inhibits formation of the RNA polymerase machinery
Transcription Factor Binding Sites
•Short, degenerate DNA sequences recognized by particular transcription factors
•For complex organisms, cooperative binding of multiple transcription factors is required to initiate transcription
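"Degenerate" means that some positions in a binding site tolerate more than one base. One simple way to sketch this is a consensus string written with IUPAC ambiguity codes and turned into a regular expression. The consensus `TGASTCA` and the DNA string below are illustrative inputs chosen for this example, not data from the slides, and real binding-site models are usually probabilistic rather than exact matches.

```python
import re

# A subset of the IUPAC nucleotide codes: each symbol stands for a set of allowed bases.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "S": "[CG]", "W": "[AT]", "N": "[ACGT]",
}

def find_sites(consensus: str, dna: str) -> list[tuple[int, str]]:
    """Return (position, matched sequence) for every match, including overlapping ones."""
    body = "".join(IUPAC[symbol] for symbol in consensus)
    pattern = re.compile("(?=(" + body + "))")  # lookahead so overlaps are reported
    return [(m.start(), m.group(1)) for m in pattern.finditer(dna)]

# Hypothetical consensus: position 4 (S) may be C or G.
print(find_sites("TGASTCA", "CCTGACTCAGGTGAGTCATT"))
# [(2, 'TGACTCA'), (11, 'TGAGTCA')]
```

Both hits satisfy the same consensus even though they differ at the degenerate position.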
Binding Sequence Logo
Transcription Regulation
Transcription Factor A
TF A Binding Site
Gene B
Gene Regulatory Region
ANRV285-GG07-02 ARI 8 August 2006 1:29
understand how different permutations of the same regulatory elements alter gene expression. An understanding of how the combinatorial organization of a promoter encodes regulatory information first requires an overview of the proteins that constitute the transcriptional machinery.
THE EUKARYOTIC TRANSCRIPTIONAL MACHINERY
Factors involved in the accurate transcription of eukaryotic protein-coding genes by RNA polymerase II can be classified into three groups: general (or basic) transcription factors (GTFs), promoter-specific activator proteins (activators), and coactivators (Figure 2). GTFs are necessary and can be sufficient for accurate transcription initiation in vitro (reviewed in 141). Such factors include RNA polymerase II itself and a variety of auxiliary components, including TFIIA, TFIIB, TFIID, TFIIE, TFIIF, and TFIIH. In addition to these “classic” GTFs, it is apparent that in vivo transcription also requires Mediator, a highly conserved, large multisubunit complex that was originally identified in yeast (reviewed in 38, 119).
GTFs assemble on the core promoter in an ordered fashion to form a transcription preinitiation complex (PIC), which directs RNA polymerase II to the transcription start site (TSS). The first step in PIC assembly is binding of TFIID, a multisubunit complex consisting of TATA-box-binding protein (TBP) and a set of tightly bound TBP-associated factors (TAFs). Transcription then proceeds through a series of steps, including promoter melting, clearance, and escape, before a fully functional RNA polymerase II elongation complex is formed. The current model of transcription regulation views this as a cycle, in which complete PIC assembly is stimulated only once. After RNA polymerase II escapes from the promoter, a scaffold structure, composed of TFIID, TFIIE, TFIIH, and Mediator, remains on the core promoter (73); subsequent reinitiation of transcription then only requires rerecruitment of RNA polymerase II-TFIIF and TFIIB.
[Figure 1 labels: distal regulatory elements (enhancer, silencer, insulator, locus control region); promoter, made up of proximal promoter elements and the core promoter]
Figure 1 Schematic of a typical gene regulatory region. The promoter, which is composed of a core promoter and proximal promoter elements, typically spans less than 1 kb pairs. Distal (upstream) regulatory elements, which can include enhancers, silencers, insulators, and locus control regions, can be located up to 1 Mb pairs from the promoter. These distal elements may contact the core promoter or proximal promoter through a mechanism that involves looping out the intervening DNA.
General transcription factor (GTF): a factor that assembles on the core promoter to form a preinitiation complex and is required for transcription of all (or almost all) genes
Coactivators: adaptor proteins that typically lack intrinsic sequence-specific DNA binding but provide a link between activators and the general transcriptional machinery
PIC: preinitiation complex
TSS: transcription start site
The assembly of a PIC on the core promoter is sufficient to direct only low levels of accurately initiated transcription from DNA templates in vitro, a process generally referred to as basal transcription. Transcriptional activity is greatly stimulated by a second class of factors, termed activators. In general, activators are sequence-specific DNA-binding proteins whose recognition sites are usually present in sequences upstream of the core promoter (reviewed in 149). Many classes of activators, discriminated by different DNA-binding domains, have been described, each associating with their own class of specific DNA sequences. Examples of activator families include those containing a cysteine-rich zinc finger, homeobox, helix-loop-helix (HLH), basic leucine zipper (bZIP), forkhead, ETS, or Pit-Oct-Unc (POU) DNA-binding domain (reviewed in 142). In addition to a sequence-specific DNA-binding domain, a typical activator also contains a separable activation domain that is required for the activator to stimulate transcription (149). An
www.annualreviews.org • Transcriptional Regulatory Elements 31
Annu. Rev. Genom. Human Genet. 2006.7:29-59. Downloaded from arjournals.annualreviews.org by Stanford University Robert Crown Law Lib. on 04/03/07. For personal use only.
Gene Regulatory Region
TBP: TATA-box-binding protein
TAF: TBP-associated factor
TFBS: transcription factor-binding site
[Figure 2 labels: PIC (TFIID, TFIIA, TFIIB, TFIIF, TFIIH, TFIIE, RNA polymerase II); Mediator; co-activator; activator with DBD and AD; core promoter with TATA and TSS; activator targets marked "?"]
Figure 2 The eukaryotic transcriptional machinery. Factors involved in eukaryotic transcription by RNA polymerase II can be classified into three groups: general transcription factors (GTFs), activators, and coactivators. GTFs, which include RNA polymerase II itself and TFIIA, TFIIB, TFIID, TFIIE, TFIIF, and TFIIH, assemble on the core promoter in an ordered fashion to form a preinitiation complex (PIC), which directs RNA polymerase II to the transcription start site (TSS). Transcriptional activity is greatly stimulated by activators, which bind to upstream regulatory elements and work, at least in part, by stimulating PIC formation through a mechanism thought to involve direct interactions with one or more components of the transcriptional machinery. Activators consist of a DNA-binding domain (DBD) and a separable activation domain (AD) that is required for the activator to stimulate transcription. The direct targets of activators are largely unknown.
extensive discussion of the properties of activators is beyond the scope of this review; readers are referred to several excellent reviews on the subject (87 and references therein).
The DNA-binding sites for activators [also called transcription factor-binding sites (TFBSs)] are generally small, in the range of 6–12 bp, although binding specificity is usually dictated by no more than 4–6 positions within the site. The TFBSs for a specific activator are typically degenerate, and are therefore described by a consensus sequence in which certain positions are relatively constrained and others are more variable. Many activators form heterodimers and/or homodimers, and thus their binding sites are generally composed of two half-sites. Notably, the precise subunit composition of an activator can also dictate its binding specificity and regulatory action (37).
Although an activator can bind to a wide variety of sequence variants that conform to the consensus, in certain instances the precise sequence of a TFBS can impact the regulatory output. For example, TFBS sequence variations can affect activator binding strength (reviewed in 30), which may be biologically important in situations such as in early development, in which activators are distributed in a concentration gradient (84, 144). TFBS sequence variations may also direct a preference for certain dimerization partners over others (37, 124, 142). Finally, the particular sequence of a TFBS can affect the structure of a bound activator in a way that alters its activity (69, 104, 108, 154, 163). The best-studied examples are nuclear hormone receptors, a large class of ligand-dependent activators. Various studies have shown that the relative orientation of the half-sites, as well as the spacing between them, play a major role in directing the regulatory action of the bound nuclear hormone receptor dimer (37).
Activators work, at least in part, by increasing PIC formation through a mechanism thought to involve direct interactions with one or more components of the transcriptional machinery, termed the “target” (141, 149). Activators may also act by promoting a step in the transcription process subsequent to PIC assembly, such as initiation, elongation, or reinitiation (103). Finally, activators have also been proposed to function by recruiting activities that modify chromatin structure (47, 106). Chromatin often poses a barrier to transcription because it prevents the transcriptional machinery from interacting directly with promoter DNA, and thus can be
32 Maston · Evans · Green
Q: What if the transcription/translation machinery makes mistakes?
Q: What is the effect in coding regions?
Evolution = Mutation + Selection
Structural Abnormalities
Normal
Insertion
Reciprocal Translocation
Duplication
Deletion
Inversion
Single Nucleotide Changes
Normal
DNA: AAA ATA CGT GCA
mRNA: UUU UAU GCA CGU
Protein: Phe Tyr Ala Arg
Silent Mutation
DNA: AAG ATA CGT GCA
mRNA: UUC UAU GCA CGU
Protein: Phe Tyr Ala Arg
Missense Mutation
DNA: AAA ATA CCT GCA
mRNA: UUU UAU GGA CGU
Protein: Phe Tyr Gly Arg
Nonsense Mutation
DNA: AAA ATT CGT GCA
mRNA: UUU UAA GCA CGU
Protein: Phe STOP
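The three kinds of single nucleotide change can be told apart by comparing what the codon encodes before and after the change. A minimal sketch, assuming the standard genetic code and one-letter amino acid symbols with `*` for stop; the function name `classify` is hypothetical.

```python
from itertools import product

# Standard genetic code, codons ordered with bases U, C, A, G; "*" marks a stop codon.
BASES = "UCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

def classify(normal_codon: str, mutant_codon: str) -> str:
    """Classify a change within one mRNA codon by its effect on the protein."""
    before = CODON_TABLE[normal_codon]
    after = CODON_TABLE[mutant_codon]
    if after == before:
        return "silent"    # same amino acid, protein unchanged
    if after == "*":
        return "nonsense"  # premature stop codon
    return "missense"      # a different amino acid

# The changed mRNA codons from the slide.
print(classify("UUU", "UUC"))  # silent
print(classify("GCA", "GGA"))  # missense
print(classify("UAU", "UAA"))  # nonsense
```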
Frameshift (Insertion)
DNA: AAA TAT ACG TGC
mRNA: UUU AUA UGC ACG
Protein: Phe Ile Cys Thr
Frameshift (Deletion of T)
DNA: AAA AAC GTG CA
mRNA: UUU UUG CAC GU
Protein: Phe Leu His Val
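A frameshift changes every codon downstream of the edit, which a short sketch makes visible. It assumes the standard genetic code, one-letter amino acid symbols (F = Phe, Y = Tyr, A = Ala, R = Arg, I = Ile, C = Cys, T = Thr, L = Leu, H = His), and reading from the first base; `read_codons` is a name chosen for this example, and trailing bases that do not fill a codon are ignored.

```python
from itertools import product

# Standard genetic code, codons ordered with bases U, C, A, G; "*" marks a stop codon.
BASES = "UCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

def read_codons(mrna: str) -> str:
    """Translate complete codons starting at the first base."""
    return "".join(CODON_TABLE[mrna[i:i + 3]] for i in range(0, len(mrna) - 2, 3))

normal = "UUUUAUGCACGU"                   # normal mRNA from the slide
inserted = normal[:3] + "A" + normal[3:]  # one base inserted after the first codon
deleted = normal[:4] + normal[5:]         # one base removed from the second codon

print(read_codons(normal))    # FYAR
print(read_codons(inserted))  # FICT
print(read_codons(deleted))   # FLH
```

Only the first codon survives in both mutants; everything after the insertion or deletion is read in a shifted frame.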
Evolution = Mutation + Selection
Selection
time
Harmful mutation Beneficial mutation
Evolution = Mutation + Selection
Summary
•All hereditary information is encoded in double-stranded DNA
•Each cell in an organism has the same DNA
•DNA → RNA → protein
•Proteins have many diverse roles in the cell
•Gene regulation diversifies protein products within different cells
Further Reading
•See website: cs173.stanford.edu