Lecture 3: Biology Basics Continued · 2019. 8. 23. · Lecture 3: Biology Basics Continued Fall...
-
Lecture 3: Biology Basics Continued
Fall 2019, September 3, 2019
-
Genotype/Phenotype
• Phenotype: blue eyes. Genotype: bb (recessive)
• Phenotype: brown eyes. Genotype: Bb or BB (dominant)
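The genotype-to-phenotype mapping on this slide can be sketched in a few lines. This is a minimal illustration that assumes the slide's simplified one-gene model of eye colour (B dominant, b recessive); the function names `phenotype` and `cross` are made up for the example.

```python
from collections import Counter
from itertools import product

def phenotype(genotype: str) -> str:
    """Map a genotype to a phenotype under the simplified one-gene model."""
    # One copy of the dominant allele B is enough for brown eyes.
    return "brown" if "B" in genotype else "blue"

def cross(parent1: str, parent2: str) -> Counter:
    """Count offspring genotypes in the Punnett square of two parents."""
    offspring = Counter()
    for allele1, allele2 in product(parent1, parent2):
        # Sort the two alleles so that 'bB' and 'Bb' count as one genotype.
        offspring["".join(sorted(allele1 + allele2))] += 1
    return offspring

genotypes = cross("Bb", "Bb")
phenotypes = Counter()
for genotype, count in genotypes.items():
    phenotypes[phenotype(genotype)] += count

print(dict(genotypes))   # genotype counts out of 4 squares
print(dict(phenotypes))  # phenotype counts out of 4 squares
```

For two heterozygous (Bb) parents, the square gives one BB, two Bb and one bb, so three of the four combinations show the dominant phenotype.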
-
• Genes are shown in relative order and distance from each other based on pedigree studies.
• The chance of the chromosome breaking between A & C is higher than the chance of the chromosome breaking between A & B during meiosis.
• Similarly, the chance of the chromosome breaking between E & F is higher than the chance of the chromosome breaking between F & G.
• The closer two genes are, the more likely they are to be inherited together (co-occurrence).
• If pedigree studies show a high incidence of co-occurrence, those genes will be located close together on a genetic map.
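The relationship between co-inheritance and map distance can be made concrete with a recombination frequency. The sketch below uses invented offspring counts (they are not from the lecture) and the common convention that 1% recombination corresponds to roughly 1 centimorgan for closely linked genes.

```python
def recombination_frequency(recombinants: int, total: int) -> float:
    """Fraction of offspring in which two genes were separated by a break."""
    if total <= 0:
        raise ValueError("total must be positive")
    return recombinants / total

# Hypothetical counts: (recombinant offspring, total offspring scored).
observations = {
    ("A", "B"): (12, 400),
    ("A", "C"): (60, 400),
}

for (gene1, gene2), (recombinants, total) in observations.items():
    rf = recombination_frequency(recombinants, total)
    # For small distances, map distance in cM is about 100 * rf.
    print(f"{gene1}-{gene2}: {rf:.1%} recombinant, about {rf * 100:.0f} cM apart")
```

With these made-up counts, A and C recombine more often than A and B, so A and C would be drawn farther apart on the map.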
-
• Pleiotropy: when one gene affects many different traits.
• Polygenic traits: when one trait is governed by multiple genes, which may be on the same chromosome or on different chromosomes.
  – The additive effects of numerous genes on a single phenotype create a continuum of possible outcomes.
  – Polygenic traits are also most susceptible to environmental influences.
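To see why additive effects produce a near-continuum, consider a toy model. The sketch assumes that every gene has two alleles, that each "contributing" allele adds one equal unit to the trait, that genes assort independently, and that both parents are heterozygous at every gene. None of these assumptions come from the lecture; they just keep the arithmetic simple.

```python
from math import comb

def additive_allele_distribution(n_genes: int) -> list:
    """P(offspring carries k contributing alleles), for k = 0 .. 2 * n_genes."""
    n_alleles = 2 * n_genes  # two alleles per gene in the offspring
    # Each inherited allele is the contributing one with probability 1/2,
    # so the count follows a Binomial(2 * n_genes, 1/2) distribution.
    return [comb(n_alleles, k) / 2 ** n_alleles for k in range(n_alleles + 1)]

for n_genes in (1, 3, 10):
    dist = additive_allele_distribution(n_genes)
    print(f"{n_genes} gene(s): {len(dist)} phenotype classes, "
          f"most common class has probability {max(dist):.3f}")
```

One gene gives three classes, three genes give seven, and ten genes give twenty-one, so the histogram of outcomes quickly starts to look like a smooth bell curve.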
-
Pleiotropy in humans: Phenylketonuria
A disorder that is caused by a deficiency of the enzyme phenylalanine hydroxylase, which is necessary to convert the essential amino acid phenylalanine to tyrosine. A defect in the single gene that codes for this enzyme therefore results in the multiple phenotypes associated with PKU, including intellectual disability, eczema, and pigment defects that make affected individuals lighter skinned.
-
Polygenic Inheritance in Humans
• Height is controlled by polygenes for skeleton height, but their effect may be affected by malnutrition, injury, and disease.
• Other examples: weight, skin color, and intelligence.
• Birth defects like clubfoot, cleft palate, or neural tube defects are also the result of multiple gene interactions.
• Complex diseases and traits tend to have low heritability (tendency to be inherited) compared to single-gene disorders (e.g. sickle-cell anemia, cystic fibrosis, PKU, hemophilia, and many extremely rare genetic disorders).
-
Selection
• Some genes may be subject to selection, where individuals with advantages or “adaptive” traits tend to be more successful than their peers reproductively.
• When these traits have a genetic basis, selection can increase the prevalence of those traits, because the offspring will inherit those traits. This may correlate with the organism's ability to survive in its environment.
• Several different genotypes (and possibly phenotypes) may then coexist in a population. In this case, their genetic differences are called polymorphisms.
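How selection raises the prevalence of a heritable trait can be illustrated with a deliberately simple model. The sketch assumes a haploid population in which carriers of a variant leave (1 + s) offspring for every 1 left by non-carriers; the model, the function names and the parameter values are illustrative assumptions, not part of the lecture.

```python
def next_frequency(p: float, s: float) -> float:
    """Variant frequency after one generation of selection."""
    # Carriers reproduce with relative fitness 1 + s, non-carriers with 1.
    mean_fitness = p * (1 + s) + (1 - p) * 1.0
    return p * (1 + s) / mean_fitness

def trajectory(p0: float, s: float, generations: int) -> list:
    """Frequencies from generation 0 up to and including `generations`."""
    freqs = [p0]
    for _ in range(generations):
        freqs.append(next_frequency(freqs[-1], s))
    return freqs

freqs = trajectory(p0=0.01, s=0.1, generations=100)
for generation in (0, 25, 50, 75, 100):
    print(f"generation {generation:3d}: frequency {freqs[generation]:.3f}")
```

While the variant is still spreading, carriers and non-carriers coexist in the population, which is the polymorphic state described on the slide.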
-
Genetic Mutation
• The simplest is the point mutation or substitution; here, a single nucleotide in the genome is changed (the source of single nucleotide polymorphisms, SNPs).
• Other types of mutations include the following:
  – Insertion. A piece of DNA is inserted into the genome at a certain position.
  – Deletion. A piece of DNA is cut from the genome at a certain position.
  – Inversion. A piece of DNA is cut, flipped around and then re-inserted, thereby converting it into its reverse complement.
  – Translocation. A piece of DNA is moved to a different position.
  – Duplication. A copy of a piece of DNA is inserted into the genome.
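The mutation types listed above can be mimicked as string operations on a toy sequence. This is a sketch with hypothetical helper names and 0-based, end-exclusive coordinates; inversion is modelled as a reverse complement, because flipping a double-stranded piece reverses its orientation as well as swapping the bases.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def substitute(seq: str, pos: int, base: str) -> str:
    """Point mutation: replace the nucleotide at `pos`."""
    return seq[:pos] + base + seq[pos + 1:]

def insert(seq: str, pos: int, piece: str) -> str:
    """Insertion: add `piece` before position `pos`."""
    return seq[:pos] + piece + seq[pos:]

def delete(seq: str, start: int, end: int) -> str:
    """Deletion: cut out seq[start:end]."""
    return seq[:start] + seq[end:]

def invert(seq: str, start: int, end: int) -> str:
    """Inversion: replace seq[start:end] by its reverse complement."""
    piece = seq[start:end].translate(COMPLEMENT)[::-1]
    return seq[:start] + piece + seq[end:]

def translocate(seq: str, start: int, end: int, new_pos: int) -> str:
    """Translocation: move seq[start:end] to `new_pos` in the remaining sequence."""
    piece = seq[start:end]
    rest = seq[:start] + seq[end:]
    return rest[:new_pos] + piece + rest[new_pos:]

def duplicate(seq: str, start: int, end: int) -> str:
    """Duplication: insert a second copy of seq[start:end] right after it."""
    return seq[:end] + seq[start:end] + seq[end:]

genome = "AACCGGTT"
print(substitute(genome, 2, "T"))     # AATCGGTT
print(insert(genome, 4, "TTT"))       # AACCTTTGGTT
print(delete(genome, 2, 4))           # AAGGTT
print(invert(genome, 0, 4))           # GGTTGGTT
print(translocate(genome, 0, 2, 6))   # CCGGTTAA
print(duplicate(genome, 2, 4))        # AACCCCGGTT
```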
-
Mutations and Selection
• While mutations can be detrimental to the affected individual, they can also, in rare cases, be beneficial; more frequently, they are neutral.
• Often mutations have no or negligible impact on survival and reproduction.
• Mutations can thereby increase the genetic diversity of a population, that is, the number of polymorphisms present.
• In combination with selection, this allows a species to adapt to changing environmental conditions and to survive in the long term.
-
Raw Sequence Data
• 4 bases: A, C, G, T + other (e.g. N = any)
  – kb (= kbp) = kilobase pairs = 1,000 bp
  – Mb = megabase pairs = 1,000,000 bp
  – Gb = gigabase pairs = 1,000,000,000 bp
• Size:
  – E. coli: 4.6 Mbp (4,600,000)
  – Fish: 130 Gbp (130,000,000,000)
  – Paris japonica (plant): 150 Gbp
  – Human: 3.2 Gbp
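The unit prefixes are plain powers of ten, which a small helper makes explicit. The function name `format_bp` is made up for this sketch, and the genome sizes are the ones quoted on the slide.

```python
UNITS = [(10**9, "Gb"), (10**6, "Mb"), (10**3, "kb")]

def format_bp(n_bp: int) -> str:
    """Express a length in base pairs using the largest fitting unit."""
    for factor, unit in UNITS:
        if n_bp >= factor:
            return f"{n_bp / factor:g} {unit}"
    return f"{n_bp} bp"

# Genome sizes in base pairs, as listed on the slide.
genomes = {
    "E. coli": 4_600_000,
    "Human": 3_200_000_000,
    "Paris japonica": 150_000_000_000,
}

for name, size in genomes.items():
    ratio = size / genomes["Human"]
    print(f"{name}: {format_bp(size)} ({ratio:.3g}x the human genome)")
```

Genome size clearly does not track organism complexity: by these figures the plant genome is roughly 47 times larger than the human one.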
-
FASTA File
• A sequence in FASTA format begins with a single-line description, followed by lines of sequence data (file extension is .fa).
• It is recommended that all lines of text be shorter than 80 characters in length.
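Reading this format takes only a few lines. The sketch below is a minimal reader, not a replacement for an established parser; it relies on the usual convention that the description line starts with '>' and joins the wrapped sequence lines back together. The function name `read_fasta` and the toy records are assumptions for illustration.

```python
import io

def read_fasta(handle):
    """Yield (description, sequence) pairs from an open FASTA file."""
    description, chunks = None, []
    for line in handle:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            # A new description line closes the previous record, if any.
            if description is not None:
                yield description, "".join(chunks)
            description, chunks = line[1:], []
        else:
            chunks.append(line)
    if description is not None:
        yield description, "".join(chunks)

# Toy input standing in for a .fa file on disk.
example = io.StringIO(">seq1 toy example\nACGTAC\nGGTA\n>seq2\nTTTT\n")
records = list(read_fasta(example))
for description, sequence in records:
    print(description, len(sequence), sequence)
```

The same function works on a real file via `with open("some_file.fa") as handle:`, where the file name is a placeholder.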
-
FASTQ File
• Each record typically contains 4 lines:
  – Line 1 begins with a '@' character and is followed by a sequence identifier and an optional description.
  – Line 2 is the sequence.
  – Line 3 is the delimiter ‘+’, with an optional description.
  – Line 4 holds the quality scores, one character per base in line 2.
  – File extension is .fq
• Example record:
@SEQ_ID
GATTTGGGGTTCAAAGCTTCAAAGCTTCAAAGC
+
!''*((((***+))%%%++++++++!!!++***
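The four-line layout translates directly into a reader. This sketch assumes strictly four lines per record and the common Phred+33 quality encoding (score = ASCII code minus 33); the lecture does not state which encoding is meant, so treat that as an assumption. The name `read_fastq` is also made up.

```python
import io

def read_fastq(handle):
    """Yield (identifier, sequence, quality scores) from 4-line FASTQ records."""
    while True:
        header = handle.readline().rstrip("\n")
        if not header:
            break  # end of file
        sequence = handle.readline().rstrip("\n")
        plus = handle.readline().rstrip("\n")
        quality = handle.readline().rstrip("\n")
        if not header.startswith("@") or not plus.startswith("+"):
            raise ValueError("malformed FASTQ record")
        if len(sequence) != len(quality):
            raise ValueError("sequence and quality lengths differ")
        # Assumption: Phred+33 encoding, so '!' (ASCII 33) means score 0.
        scores = [ord(char) - 33 for char in quality]
        yield header[1:], sequence, scores

record = (
    "@SEQ_ID\n"
    "GATTTGGGGTTCAAAGCTTCAAAGCTTCAAAGC\n"
    "+\n"
    "!''*((((***+))%%%++++++++!!!++***\n"
)
parsed = list(read_fastq(io.StringIO(record)))
identifier, sequence, scores = parsed[0]
print(identifier, len(sequence), "bases, quality range", min(scores), "to", max(scores))
```

Under this encoding, the '!' characters in the example mark bases with the lowest possible quality score.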
-
Central Dogma
-
Discovery of DNA
• DNA sequences
  – Chargaff and Vischer, 1949
    • DNA consists of A, T, G, C (adenine, thymine, guanine, cytosine)
  – Chargaff's rule
    • Noticing #A≈#T and #G≈#C
    • A “strange but possibly meaningless” phenomenon.
• Wow!! A double helix
  – Watson and Crick, Nature, April 25, 1953
  – Rich, 1973
    • Structural biologist at MIT.
    • DNA’s structure in atomic resolution.
Crick and Watson: 1 biologist, 1 physics Ph.D. student, 900 words, Nobel Prize
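In hindsight, Chargaff's rule follows directly from base pairing in the double helix. The sketch below counts bases over both strands of a toy duplex; the sequence and the function name are invented for illustration.

```python
from collections import Counter

COMPLEMENT = str.maketrans("ACGT", "TGCA")

def double_strand_counts(strand: str) -> Counter:
    """Count bases over a strand together with its paired partner strand."""
    partner = strand.translate(COMPLEMENT)[::-1]  # reverse complement
    return Counter(strand + partner)

strand = "AAAAGC"                      # deliberately unbalanced single strand
single = Counter(strand)
duplex = double_strand_counts(strand)

print("single strand:", dict(single))  # A and T differ here
print("double strand:", dict(duplex))  # #A == #T and #G == #C
```

Every A on one strand sits opposite a T on the other, and every G opposite a C, so the counts must match in double-stranded DNA even when a single strand is unbalanced.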
-
Watson & Crick – “… the secret of life”
• Watson: a zoologist, Crick: a physicist
• “In 1947 Crick knew no biology and practically no organic chemistry or crystallography..” – www.nobel.se
• Applying Chargaff’s rules and the X-ray image from Rosalind Franklin, they constructed a “tinkertoy” model showing the double helix.
• Their 1953 Nature paper: “It has not escaped our notice that the specific pairing we have postulated immediately suggests a possible copying mechanism for the genetic material.”
Watson & Crick with DNA model
Rosalind Franklin with X-ray image of DNA
-
Superstructure
Lodish et al. Molecular Cell Biology (5th ed.). W.H. Freeman & Co., 2003.
-
Superstructure implications
• DNA in a living cell is in a highly compacted and structured state.
• Transcription factors and RNA polymerase need ACCESS to do their work.
• Transcription is dependent on the structural state – SEQUENCE alone does not tell the whole story.
-
RNA
• RNA is similar to DNA chemically. It is usually only a single strand. T(hymine) is replaced by U(racil).
• RNA can form secondary structures by “pairing up” with itself.
tRNA linear and 3D view: http://www.cgl.ucsf.edu/home/glasfeld/tutorial/trna/trna.gif
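Both points on this slide, the T to U replacement and self-pairing, fit in a short sketch. It assumes the input is the coding strand of the DNA (so the RNA has the same sequence with U for T) and considers only Watson-Crick pairs, ignoring G-U wobble pairs; the toy sequence and function names are invented.

```python
RNA_COMPLEMENT = str.maketrans("ACGU", "UGCA")

def transcribe(coding_strand: str) -> str:
    """RNA copy of a DNA coding strand: same letters, with U in place of T."""
    return coding_strand.upper().replace("T", "U")

def can_pair(segment_5p: str, segment_3p: str) -> bool:
    """True if two RNA segments are exact reverse complements of each other."""
    return segment_5p.translate(RNA_COMPLEMENT)[::-1] == segment_3p

rna = transcribe("GGGATCTTCGGATCCC")
print(rna)

# Toy hairpin: a 6-base stem on each side of a 4-base loop.
stem_5p, loop, stem_3p = rna[:6], rna[6:10], rna[10:]
print(stem_5p, loop, stem_3p, "-> stem pairs:", can_pair(stem_5p, stem_3p))
```

A strand whose two ends are reverse complements can fold back on itself into a stem with a loop in between, which is the basic building block of structures like the tRNA cloverleaf.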
-
RNA, continued
• Several types exist, classified by function
• mRNA – carries a gene’s message out of the nucleus.
• tRNA – transfers genetic information from mRNA to an amino acid sequence
• rRNA – ribosomal RNA. Part of the ribosome machine.
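The step from mRNA to an amino acid sequence can be sketched as a codon lookup. The table below is the standard genetic code written compactly (codons enumerated in U, C, A, G order, with '*' marking stop codons); the function name and the toy mRNA are assumptions, and real translation also involves start-site selection, which this sketch ignores.

```python
from itertools import product

BASES = "UCAG"
# Standard genetic code, one letter per codon in UCAG order; '*' = stop.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): amino_acid
    for codon, amino_acid in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

def translate(mrna: str) -> str:
    """Translate an mRNA from its first base, stopping at a stop codon."""
    protein = []
    for i in range(0, len(mrna) - 2, 3):
        amino_acid = CODON_TABLE[mrna[i:i + 3]]
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)

print(translate("AUGGCCUGGUAA"))  # MAW
```

In the cell, each lookup in this table is carried out physically by a tRNA that recognises one codon and delivers the matching amino acid.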
-
Protein
• A polymer composed of amino acids.
• There are 20 naturally occurring amino acids.
• Usually functions through molecular motion or binding with other molecules.
-
Proteins: Primary Structure
• Peptide sequence:
  – Sequence of amino acids = sequences from a 20-letter alphabet (i.e. ACDEFGHIKLMNPQRSTVWY)
  – Average protein has ~300 amino acids
  – Typically stored as FASTA files
>gi|5524211|gb|AAD44166.1| cytochrome b [Elephas maximus maximus]
LCLYTHIGRNIYYGSYLYSETWNTGIMLLLITMATAFMGYVLPWGQMSFWGATVITNLFSAIPYIGTNLV
EWIWGGFSVDKATLNRFFAFHFILPFTMVALAGVHLTFLHETGSNNPLGLTSDSDKIPFHPYYTIKDFLG
LLILILLLLLLALLSPDMLGDPDNHMPADPLNTPLHIKPEWYFLFAYAILRSVPNKLGGVLALFLSIVIL
GLMPFLHTSKHRSMMLRPLSQALFWTLTMDLLTLTWIGSQPVEYPYTIIGQMASILYFSIILAFLPIAGX
IENY
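Real protein records do not always stay inside the 20-letter alphabet, so a quick check is often useful. The sketch below scans the last 13 residues of the cytochrome b example, which contain an X (the conventional one-letter code for an unknown amino acid); the function name is made up.

```python
from collections import Counter

STANDARD_AMINO_ACIDS = set("ACDEFGHIKLMNPQRSTVWY")

def nonstandard_residues(protein: str) -> list:
    """Return (position, letter) for every letter outside the 20-letter alphabet."""
    return [
        (position, letter)
        for position, letter in enumerate(protein)
        if letter not in STANDARD_AMINO_ACIDS
    ]

# Last 13 residues of the cytochrome b example record.
fragment = "LAFLPIAGXIENY"
composition = Counter(fragment)

print(nonstandard_residues(fragment))  # 0-based positions of unexpected letters
print(composition.most_common(3))      # most frequent residues in the fragment
```

Downstream tools differ in how they treat such letters, so it is worth knowing they are present before running an analysis.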
-
Naturally Occurring Amino Acids
-
Proteins: Secondary Structure
• Polypeptide chains fold into regular local structures
  – Common types: alpha helix, beta sheet, turn, loop
  – Defined by the creation of hydrogen bonds
-
Proteins: Tertiary Structure
• 3D structure of a polypeptide sequence
  – Interactions between non-local and foreign atoms
-
Proteins: Quaternary Structure
• Arrangement of protein subunits
-
Conclusions
-
Challenges in Bioinformatics
• Need to feel comfortable in an interdisciplinary area
• Depend on others for primary data
• Need to address important biological and computer science problems
-
Basic Steps in Bioinformatics Research
1. Data management problem: storage, transfer, transformation (Information Technology)
2. Data analysis problem: mapping, assembly – algorithm scaling (Computer Science)
3. Statistical challenges: traditional statistics is not well suited for modeling systematic errors over a large number of observations (Biostatistics)
4. Biological hypothesis testing – data interpretation (Life Science)
-
Basic Skills
• Artificial intelligence and machine learning
• Statistics and probability
• Algorithms
• Databases
• Programming
• Biology/Chemistry knowledge
-
Genomics:
- Assembly
- Detection of variation
- GWAS
RNA:
- Gene expression
- Transcriptome assembly
- Pathway analysis
- RNA-RNA interaction
Protein:
- Mass spectrometry
- Structure prediction
- Protein-protein interaction