Introduction to Bioinformatics Molecular Biology Primer 1.
Genetic Material
• DNA (deoxyribonucleic acid) is the genetic material
• Information stored in DNA
– the basis of inheritance
– distinguishes living things from nonliving things
• Genes
– the units that govern a living thing's characteristics at the genetic level
Nucleotides
• Genes themselves contain their information as a specific sequence of nucleotides found in DNA molecules
• Only four different bases in DNA molecules
– Guanine (G)
– Adenine (A)
– Thymine (T)
– Cytosine (C)
• Each base is attached to a phosphate group and a deoxyribose sugar to form a nucleotide.
• The only thing that makes one nucleotide different from another is which nitrogenous base it contains
[Figure: nucleotide structure, showing the phosphate (P), the sugar, and the base]
Nucleotides
• Complicated genes can be many thousands of nucleotides long
• All of an organism’s genetic instructions, its genome, can be maintained in millions or even billions of nucleotides
Orientation
• Strings of nucleotides can be attached to each other to make long polynucleotide chains
• 5' (5 prime) end
– The end of a string of nucleotides with a 5' carbon not attached to another nucleotide
• 3' (3 prime) end
– The other end of the molecule, with an unattached 3' carbon
Base Pairing
• Structure of DNA
– Double helix
– Seminal paper by Watson and Crick in 1953
– Rosalind Franklin's contribution
• Information content on one of those strands is essentially redundant with the information on the other
– Not exactly the same; it is complementary
• Base pair
– G paired with C (G ≡ C, three hydrogen bonds)
– A paired with T (A = T, two hydrogen bonds)
Base Pairing
• Reverse complements
– 5' end of one strand corresponds to the 3' end of its complementary strand, and vice versa
• Example
– one strand: 5'-GTATCC-3'
– the other strand: 3'-CATAGG-5', which is written 5' to 3' as 5'-GGATAC-3'
• Upstream: Sequence features that are 5' to a particular reference point
• Downstream: Sequence features that are 3' to a particular reference point
[Figure: a strand drawn 5' to 3', with upstream toward the 5' end and downstream toward the 3' end]
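The reverse-complement example on this slide can be checked with a few lines of Python. This is a minimal sketch, not part of the original slides; the names `COMPLEMENT`, `complement`, and `reverse_complement` are illustrative, and the input is assumed to contain only the uppercase letters A, C, G, and T.

```python
# Base-pairing rules from the slide: G pairs with C, A pairs with T.
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def complement(strand: str) -> str:
    """Pair each base with its partner; the result reads 3' to 5'."""
    return "".join(COMPLEMENT[base] for base in strand)


def reverse_complement(strand: str) -> str:
    """Complement the strand, then reverse it so it reads 5' to 3'."""
    return complement(strand)[::-1]


strand = "GTATCC"                   # 5'-GTATCC-3'
print(complement(strand))           # CATAGG (the other strand, 3' to 5')
print(reverse_complement(strand))   # GGATAC (the other strand, 5' to 3')
```

Applying `reverse_complement` twice returns the original strand, which is a quick sanity check on the pairing table.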
Chromosome
• Different kinds of organisms have different numbers of chromosomes
• Humans
– 23 pairs
– 46 in all
Central Dogma of Molecular Biology
• DNA: information storage
• Protein: functional unit, such as an enzyme
• Gene: instructions needed to make a protein
• Central dogma
Central Dogma of Molecular Biology
• Central dogma
[Figure: central dogma diagram; labeled arrows include replication (DNA polymerase) and reverse transcription (reverse transcriptase)]
• DNA obtained from reverse transcription is called complementary DNA (cDNA)
– The difference between DNA and cDNA will be discussed later
Central Dogma of Molecular Biology
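• RNA (ribonucleic acid)
– Single-stranded polynucleotide
– Bases: A, G, C, and U (uracil) instead of T
• Transcription (simplified …)
– A → A, G → G, C → C, T → U
[Figure: DNA and RNA nucleotides side by side, each with phosphate (P), sugar, and base; the sugar carries H in DNA and OH in RNA]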
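The simplified transcription rule on this slide (every base copied unchanged except T, which becomes U) can be written as a single string replacement. This is a sketch of the simplified rule only; the function name `transcribe_simplified` is illustrative and does not come from the slides.

```python
def transcribe_simplified(dna: str) -> str:
    """Apply the slide's simplified rule: A->A, G->G, C->C, T->U."""
    return dna.upper().replace("T", "U")


print(transcribe_simplified("GTATCC"))  # GUAUCC
```

The rule treats the DNA sequence as the strand that already matches the RNA, so no complementing is done here.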
DNA Replication Animation
Courtesy of Rob Rutherford, St. Olaf University
Transcription (DNA → RNA)
• Messenger RNA (mRNA)
– carries information to be translated
• Ribosomal RNA (rRNA)
– the working “spine” of the ribosome
• Transfer RNA (tRNA)
– the “decoder keys” that will translate nucleic acids to amino acids
Transcription Animation
Courtesy of Rob Rutherford, St. Olaf University
Peptides and Proteins
• mRNA → sequence of amino acids connected by peptide bonds
• Amino acid sequence
– Peptide: fewer than about 30 to 50 amino acids
– Protein: longer peptide
List of Amino Acids
Letter  Amino acid     Symbol  Codon
A       Alanine        Ala     GC*
C       Cysteine       Cys     UGU, UGC
D       Aspartic Acid  Asp     GAU, GAC
E       Glutamic Acid  Glu     GAA, GAG
F       Phenylalanine  Phe     UUU, UUC
G       Glycine        Gly     GG*
H       Histidine      His     CAU, CAC
I       Isoleucine     Ile     AUU, AUC, AUA
K       Lysine         Lys     AAA, AAG
L       Leucine        Leu     UUA, UUG, CU*
List of Amino Acids
Letter  Amino acid     Symbol  Codon
M       Methionine     Met     AUG
N       Asparagine     Asn     AAU, AAC
P       Proline        Pro     CC*
Q       Glutamine      Gln     CAA, CAG
R       Arginine       Arg     CG*, AGA, AGG
S       Serine         Ser     UC*, AGU, AGC
T       Threonine      Thr     AC*
V       Valine         Val     GU*
W       Tryptophan     Trp     UGG
Y       Tyrosine       Tyr     UAU, UAC
20 letters, no B J O U X Z
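As a cross-check on the two tables, the codon patterns can be expanded into a lookup dictionary. This is a sketch that reads a trailing `*` as "any of the four bases", which is an assumption about the table's notation. All names (`CODON_PATTERNS`, `expand`, `CODON_TO_AA`, `STOP_CODONS`) are illustrative.

```python
from itertools import product

RNA_BASES = "ACGU"

# One-letter code -> codon patterns, copied from the two tables.
# Assumption: a trailing "*" stands for any of the four bases.
CODON_PATTERNS = {
    "A": ["GC*"], "C": ["UGU", "UGC"], "D": ["GAU", "GAC"],
    "E": ["GAA", "GAG"], "F": ["UUU", "UUC"], "G": ["GG*"],
    "H": ["CAU", "CAC"], "I": ["AUU", "AUC", "AUA"], "K": ["AAA", "AAG"],
    "L": ["UUA", "UUG", "CU*"], "M": ["AUG"], "N": ["AAU", "AAC"],
    "P": ["CC*"], "Q": ["CAA", "CAG"], "R": ["CG*", "AGA", "AGG"],
    "S": ["UC*", "AGU", "AGC"], "T": ["AC*"], "V": ["GU*"],
    "W": ["UGG"], "Y": ["UAU", "UAC"],
}


def expand(pattern: str) -> list[str]:
    """Turn a pattern such as 'GC*' into its four concrete codons."""
    if pattern.endswith("*"):
        return [pattern[:2] + base for base in RNA_BASES]
    return [pattern]


# Codon -> one-letter amino acid code.
CODON_TO_AA = {
    codon: letter
    for letter, patterns in CODON_PATTERNS.items()
    for pattern in patterns
    for codon in expand(pattern)
}

# Any triplet the tables do not assign to an amino acid is a stop codon.
ALL_CODONS = {"".join(triplet) for triplet in product(RNA_BASES, repeat=3)}
STOP_CODONS = sorted(ALL_CODONS - set(CODON_TO_AA))

print(len(CODON_TO_AA))  # 61
print(STOP_CODONS)       # ['UAA', 'UAG', 'UGA']
```

The 20 amino acids account for 61 of the 64 possible triplets, and the three left over are the stop codons.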
Codon and Reading Frame
• 4 nucleotide letters → 4^3 = 64 triplet possibilities
• 20 (< 64) known amino acids
• Wobbling 3rd base
• Redundant → resistant to mutation
• Reading frame: linear sequence of codons in a gene
• Open Reading Frame (ORF), definition varies:
– a reading frame that begins with a start codon and ends at a stop codon
– a series of codons in a DNA sequence uninterrupted by the presence of a stop codon
→ a potential protein-coding region of a DNA sequence
Open Reading Frame
• Given a nucleotide sequence
– How many reading frames? __
• __ forward and __ backward
• Example: Given a DNA sequence, 5'-ATGACCGTGGGCTCTTAA-3'
– ATG ACC GTG GGC TCT TAA → M T V G S *
– TGA CCG TGG GCT CTT AA → * P W A L
– GAC CGT GGG CTC TTA A → D R G L L
– Figure out the three backward reading frames
• In a random sequence, a stop codon will follow a Met within about 20 amino acids
• Substantially longer ORFs are often genes or parts of them
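The reading-frame exercise can be explored with a short script. This is a minimal sketch that assumes the standard genetic code written in DNA letters (T in place of U) and uppercase A, C, G, T input. The names `CODON_TABLE`, `translate_frame`, and `six_frames` are illustrative. The backward frames are the forward frames of the reverse complement.

```python
from itertools import product

# Standard genetic code in DNA letters, built in TCAG order; "*" marks a stop.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO)
}

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(dna: str) -> str:
    """Return the complementary strand, written 5' to 3'."""
    return dna.translate(COMPLEMENT)[::-1]


def translate_frame(dna: str, offset: int) -> str:
    """Translate complete codons starting at offset 0, 1, or 2."""
    codons = [dna[i:i + 3] for i in range(offset, len(dna) - 2, 3)]
    return "".join(CODON_TABLE[codon] for codon in codons)


def six_frames(dna: str) -> dict[str, str]:
    """Translate the three forward and the three backward frames."""
    backward = reverse_complement(dna)
    frames = {}
    for offset in range(3):
        frames[f"+{offset + 1}"] = translate_frame(dna, offset)
        frames[f"-{offset + 1}"] = translate_frame(backward, offset)
    return frames


for name, protein in sorted(six_frames("ATGACCGTGGGCTCTTAA").items()):
    print(name, protein)
```

Incomplete codons at the end of a frame are dropped, which matches how the slide leaves the trailing `AA` and `A` untranslated.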
Translation Animation
Courtesy of Rob Rutherford, St. Olaf University
Gene Expression
• Gene expression
– Process of using the information stored in DNA to make an RNA molecule and then a corresponding protein
• Cells control gene expression by
– reliably distinguishing between those parts of an organism's genome that correspond to the beginnings of genes and those that do not
– determining which genes code for proteins that are needed at any particular time
Promoter
• The probability (P) that a string of nucleotides will occur by chance alone, if all nucleotides are present at the same frequency, is P = (1/4)^n, where n is the string's length
• Promoter sequences
– Sequences recognized by RNA polymerases as being associated with a gene
• Example
– Prokaryotic RNA polymerases scan along DNA looking for a specific set of approximately 13 nucleotides marking the beginning of genes
– 1 nucleotide that serves as a transcriptional start site
– 6 that are 10 nucleotides 5' to the start site, and
– 6 more that are 35 nucleotides 5' to the start site
– How frequently would this sequence occur by chance?
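The chance-occurrence formula can be evaluated directly for the 13 specified promoter nucleotides (1 + 6 + 6). This is a small sketch under the slide's own assumption that all four nucleotides are equally frequent; the function name `chance_probability` is illustrative.

```python
from fractions import Fraction


def chance_probability(n: int) -> Fraction:
    """P = (1/4)^n for one specific string of n nucleotides."""
    return Fraction(1, 4) ** n


p = chance_probability(13)   # 1 start site + 6 + 6 specified nucleotides
print(p)                     # 1/67108864
print(float(p))              # roughly 1.49e-08
```

Under these assumptions, a specific 13-nucleotide pattern is expected by chance about once in every 67 million positions.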
Gene Regulation
• Regulatory proteins
– Capable of binding to a cell's DNA near the promoter of the genes
– Control gene expression in some circumstances but not in others
• Positive regulation
– binding of regulatory proteins makes it easier for an RNA polymerase to initiate transcription
• Negative regulation
– binding of the regulatory proteins prevents transcription from occurring
Promoter and Regulatory Example
• Low tryptophan concentration → RNA polymerase binds to promoter → genes transcribed
• High tryptophan concentration → repressor protein becomes active and binds to operator → blocks the binding of RNA polymerase to the promoter
• Tryptophan concentration drops → repressor releases its tryptophan and is released from DNA → polymerase again transcribes genes
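The negative regulation described on this slide reduces to a simple on/off rule. The toy model below is an illustration only: it treats tryptophan concentration as a hypothetical yes/no input and ignores everything quantitative. The function name `trp_genes_transcribed` is made up for this sketch.

```python
def trp_genes_transcribed(tryptophan_high: bool) -> bool:
    """Toy on/off model of the slide's tryptophan example."""
    # The repressor is active only while tryptophan is bound to it.
    repressor_active = tryptophan_high
    # An active repressor sits on the operator and blocks RNA polymerase.
    polymerase_can_bind = not repressor_active
    return polymerase_can_bind


for level in (False, True):
    print("high tryptophan:", level, "-> transcribed:", trp_genes_transcribed(level))
```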
Point Mutation Example: Sickle-cell Disease
• Wild-type hemoglobin
DNA
3’----CTT----5’
mRNA
5’----GAA----3’
Normal hemoglobin
------[Glu]------
• Mutant hemoglobin
DNA
3’----CAT----5’
mRNA
5’----GUA----3’
Mutant hemoglobin
------[Val]------
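The effect of the single-base change can be traced in code. This sketch assumes, as the slide's layout suggests, that the DNA shown is the template strand written 3' to 5'. The names `TEMPLATE_TO_MRNA`, `CODON_TO_AMINO_ACID`, and `transcribe_template` are illustrative, and the codon dictionary holds only the two codons this example needs.

```python
# Template-strand base -> mRNA base (complementary pairing, with U in RNA).
TEMPLATE_TO_MRNA = {"A": "U", "T": "A", "G": "C", "C": "G"}

# Only the two codons used here; both agree with the amino acid tables.
CODON_TO_AMINO_ACID = {"GAA": "Glu", "GUA": "Val"}


def transcribe_template(template_3_to_5: str) -> str:
    """Given a template written 3' to 5', return the mRNA written 5' to 3'."""
    return "".join(TEMPLATE_TO_MRNA[base] for base in template_3_to_5)


for label, template in (("wild-type", "CTT"), ("mutant", "CAT")):
    codon = transcribe_template(template)
    print(label, template, "->", codon, "->", CODON_TO_AMINO_ACID[codon])
```

Changing the middle template base from T to A changes the mRNA codon from GAA to GUA, which replaces Glu with Val.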
Thinking about the Human Genome
• 50% is high copy number repeats
• About 10% is transcribed (made into RNA)
• Only 1.5% actually codes for protein
• 98.5% Junk DNA