Introduction to Bioinformatics
Shifra Ben-Dor
Bioinformatics Unit, Life Sciences Core Facilities

Lecture Outline:
• Technical Course Items
• Sequences
• Databases
– This week and next week

The technical stuff
The course is made up of one lecture and an optional exercise session each week.
The exercise sessions are not mandatory; they are there to help. Demonstrations of the programs will be done in both the lectures and the exercise sessions. The exercise sessions are an opportunity for you to do the assignment with somebody there to ask for help if you get stuck.
If you are planning on coming to the exercise sessions, please send me an email:
The course website is where you can find the syllabus, lecture notes, assignments, and links to the various programs taught and to relevant literature. It is also where we put announcements and updates.
http://dors.weizmann.ac.il/course/introbioinfo/
• This course is built for biologists
• Background will be given on various topics as needed, but basic knowledge of B.Sc.-level biology is taken for granted
• If you need help with the biology, contact me

Requirements for a grade
• You are required to do all of the assignments and a final project
• The course grade is computed as follows: 60% final project, 40% assignments

Assignments
• You have two weeks to hand in each assignment
• Assignments are to be handed in at the Wolfson lecture hall, by the end of the lecture (11:00)
• If for any reason you need/want an extension, talk to me BEFORE the assignment is due
• An assignment handed in late or not at all will get a 0
• You may consult with a friend while doing the assignment; however, all work must be handed in individually. If we find copying, the grade will be divided among the number of students handing in the same answer sheet
• Assignments should be printed and handed in. Electronic submission (e-mail) will NOT be accepted.

Final Project
• The final project will be given in the beginning of July.
• It will be due on August 9.
• Late projects will not be accepted
• There is NO possibility to correct projects
• If evidence is found of shared work, there will be no course grade

Announcements, Updates…
• Any news will be announced in the lectures and updated on the website
• What is said in the lecture hall is the final word, unless specified otherwise

If you have questions, comments, suggestions or complaints, please contact us: the earlier the better!

Course Staff
Main Lecturer: Shifra Ben-Dor
Teaching assistants (metargelot): Irit Orr, Bareket Dassa

What is bioinformatics?

What will we cover in this course?

What won’t we cover in this course?
• Detailed structural analysis of proteins
• Algorithm development
• High-throughput methods
• In-depth phylogenetics or evolutionary biology
• In-depth systems biology
• siRNA, miRNA
• Promoter analysis

Skepticism and computers

The biological thinking has to be done by YOU

What “units of information” do we deal with in bioinformatics?
• DNA
• RNA
• Protein
• Sequence
• Structure
• Evolution
• Pathways
• Interactions
• Mutations

Examples of biological data used in bioinformatics
• DNA (Genome)
• RNA (Transcriptome)
• Protein (Proteome)

DNA
Raw DNA Sequence
• Coding or not coding?
• Parse into genes?
• Other important genomic elements?
• 4 bases: AGCT
atggcaattaaaattggtatcaatggttttggtcgtatcggccgtatcgtattccgtgcagcacaacaccgtgatgacattgaagttgtaggtattaacgacttaatcgacgttgaatacatggcttatatgttgaaatatgattcaactcacggtcgtttcgacggcactgttgaagtgaaagatggtaacttagtggttaatggtaaaactatccgtgtaactgcagaacgtgatcca
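
The questions on this slide can be made concrete with a few lines of Python. This is only a minimal sketch, not a gene finder: it checks that a string uses nothing but the four bases, counts them, and looks for the first stretch running from an ATG to an in-frame stop codon on the given strand. The sequence `toy` and the function names are made up for illustration; real coding-region prediction takes far more than this.

```python
from collections import Counter

DNA_BASES = "ACGT"
STOP_CODONS = {"TAA", "TAG", "TGA"}


def base_composition(seq):
    """Count each of the four bases; reject any other character."""
    seq = seq.upper()
    unexpected = set(seq) - set(DNA_BASES)
    if unexpected:
        raise ValueError(f"not a DNA base: {sorted(unexpected)}")
    counts = Counter(seq)
    return {base: counts[base] for base in DNA_BASES}


def first_orf(seq, min_codons=3):
    """Return the first ATG...stop stretch on the given strand, or None.

    min_codons counts the codons before the stop, including the ATG.
    """
    seq = seq.upper()
    for start in range(len(seq) - 2):
        if seq[start:start + 3] != "ATG":
            continue
        # Walk codon by codon in the reading frame set by this ATG.
        for pos in range(start + 3, len(seq) - 2, 3):
            if seq[pos:pos + 3] in STOP_CODONS:
                if (pos - start) // 3 >= min_codons:
                    return seq[start:pos + 3]
                break  # this ORF is too short; try the next ATG
    return None


toy = "ccATGGCAATTAAATAAgg"  # made-up sequence, not from the lecture
print(base_composition(toy))
print(first_orf(toy))
```

An open reading frame found this way is only a hint that a region might be coding; deciding that is exactly the kind of biological judgment the course asks you to make yourself.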

DNA/RNA sequences
• Genes are encoded in genomic sequences.
• Genes are transcribed into pre-mRNAs (including coding, intronic, 5’ and 3’ untranslated regions).
• mRNAs are spliced (introns removed) and translated into proteins.
• mRNAs are copied to cDNAs (in the lab)
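
The flow described above (genomic DNA, pre-mRNA, spliced mRNA, protein, cDNA) can be sketched in Python. The toy gene, its exon coordinates and all function names are assumptions made for illustration; the codon table is the standard genetic code. The cDNA is written here in the mRNA sense with T in place of U, which is an assumption about how such sequences are usually stored, not something stated on the slide.

```python
from itertools import product

# Standard genetic code, codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def transcribe(dna):
    """Genomic DNA (sense strand) -> pre-mRNA: same letters, U instead of T."""
    return dna.upper().replace("T", "U")


def splice(pre_mrna, exons):
    """Join the exons; exons are (start, end) pairs, 0-based, end exclusive."""
    return "".join(pre_mrna[start:end] for start, end in exons)


def translate(mrna):
    """Translate from the first AUG up to the first in-frame stop codon."""
    dna = mrna.upper().replace("U", "T")
    start = dna.find("ATG")
    if start < 0:
        return ""
    protein = []
    for i in range(start, len(dna) - 2, 3):
        aa = CODON_TABLE[dna[i:i + 3]]
        if aa == "*":
            break
        protein.append(aa)
    return "".join(protein)


def to_cdna(mrna):
    """Write the mRNA as DNA letters (sense orientation)."""
    return mrna.upper().replace("U", "T")


# Hypothetical toy gene: 5' UTR + exon 1, one intron, exon 2 + 3' UTR.
gene = "GG" + "ATGGCA" + "GTAAGTTTCAG" + "ATTAAATAA" + "CC"
exons = [(0, 8), (19, 30)]

mrna = splice(transcribe(gene), exons)
print(mrna)
print(translate(mrna))
print(to_cdna(mrna))
```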

[Figure: gene structure at three levels. Genomic DNA: promoter, TSS, exons 1 to 4 with ATG, Stop and PolyA site, TTS. Pre-mRNA: exons 1 to 4 with ATG, Stop and PolyA site. mRNA: Cap, 5’UTR, CDS (exons 1 to 4, ATG to Stop), 3’UTR, PolyA.]
Modified from Zhang MQ, Nat Rev Genet. 2002 Sep;3(9):698-709.

Sources of mRNAs
• Experimental
– Clone new gene
– “Clone” gene from database
– RNA-Seq
• Database
– “Typical” cDNA
– Full-length cDNA
– EST (Expressed Sequence Tag)
– Short read sequences

[Figure: an mRNA (5’ mG cap, AAAA tail) alongside a full-length cDNA and a typical cDNA (TTTT, tag).]

RNA, cDNA, and ESTs
[Figure: RNA, mRNA (exon 1, exon 2, exon 3), cDNA, cDNA clone, and two ESTs.]
GenBank ESTs (Expressed Sequence Tags):
~8,700,000 human ESTs
~4,850,000 mouse ESTs
Adapted with permission from Adam Sartiel

Uses of ESTs
- prediction of coding regions
- detection of alternative splicing
- clustering to form “genes”
Problems with clustering:
- incomplete coverage breaks genes up
- gene families

Problems with ESTs
- low copy number genes
- rare tissues
- mistakes
- enrichment of 3’ ends of genes
- incomplete coverage of genes

Next Generation Sequencing
• Generally short reads (though longer-read technologies are now becoming available)
• Read lengths range from 20-25 bp, through 75-100 bp, to 150 bp
• Can be 3’ end only
• Can be paired or single read

[Figure labels: Fragment read; Paired-end reads; Mate pair; Contigs or “Transcripts”.]

EST vs Read
• ESTs have longer continuous sequence, so they are better for seeing gene structure (alternative splicing)
• Short reads generally have higher accuracy
• Neither can give a picture of a whole gene

Protein
• 20-letter alphabet: ACDEFGHIKLMNPQRSTVWY, but not BJOUXZ
• Strings of ~300 aa in an average protein (e.g. bacteria)
• Proteins are divided into domains
LNCIVAVSQNMGIGKNGDLPWPPLRNEFRYFQRMTTTSSVEGKQNLVIMGKKTWFSI
LNSIVAVCQNMGIGKDGNLPWPPLRNEYKYFQRMTSTSHVEGKQNAVIMGKKTWFSI
ISLIAALAVDRVIGMENAMPWNLPADLAWFKRNTLDKPVIMGRHTWESI
TAFLWAQDRNGLIGKDGHLPWHLPDDLHYFRAQTVGKIMVVGRRTYESF
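
A quick way to see the 20-letter alphabet in action is to check a sequence against it. The sketch below is illustrative only: the function name is made up, and it follows the slide's strict alphabet, so it will flag any of the letters B, J, O, U, X or Z if they turn up in data you meet.

```python
AMINO_ACIDS = set("ACDEFGHIKLMNPQRSTVWY")  # the 20 letters listed on the slide


def invalid_residues(seq):
    """Return (position, letter) pairs, 1-based, for letters outside the alphabet."""
    return [
        (pos, letter)
        for pos, letter in enumerate(seq.upper(), start=1)
        if letter not in AMINO_ACIDS
    ]


# First protein fragment shown on the slide.
fragment = "LNCIVAVSQNMGIGKNGDLPWPPLRNEFRYFQRMTTTSSVEGKQNLVIMGKKTWFSI"
print(invalid_residues(fragment))   # empty list: every letter is valid
print(invalid_residues("MKXB"))     # made-up string containing X and B
```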

Protein
• Proteome of an organism
• 2D gels
• Mass spec
• 2D structure
• 3D structure
• 4D structure (interactions)

Databases: Outline
• Introduction
– Data and database types
– Database components
• Data formats
• Sample databases
• How to text search databases

AAGTGCCACTGCATAAATGACCATGAGTGGGCACCGGTAAGGGAGGGTGATGCTATCTGGTCTGAAG
Nucleotide sequence
Genes
mRNA
Protein primary sequence
Protein 3D structure
Protein function
Acts as a tumor suppressor in many tumor types. Induces growth arrest or apoptosis depending on the physiological circumstances or cell type, but both activities are involved in tumor suppression.
Involved in the transport of chloride ions. Defects in CFTR are the cause of cystic fibrosis. It is the most common genetic disease in the Caucasian population, with a prevalence of about 1 in 2000 live births. CF, an autosomal recessive disorder, is a common generalized disorder of exocrine gland function
SNPs

What do we want from databases?
All of these have databases and tools that were created to work with them

Information retrieval from sequence databases
Biological databases contain enormous amounts of data.
• Databases need to be well annotated.
• Databases need to be easily searched.
• Data found in databases should be easily retrieved.
• Data in databases should be in standard formats.

Integrated Information Retrieval
• Many databases contain logical relations between specific entries.
• One interface, connecting many biological databases.
• For example: a database that connects protein sequence, protein domain, protein structure and reference databases (InterPro).
• Another example: connection between references, protein sequence, DNA sequence, and structure databases (Entrez).

A Database
Entries (each with an accession number):
000001 tumor protein p53
000002 breast cancer 1, early onset
000003 breast cancer 1, early onset
Fields of entry 000001:
Chromosomal location: 17p13.1
DNA sequence:
mRNA sequence:
Protein sequence:
Protein structure: PDB 1OLG, 1OLH, 1SAE (external links)
Protein function:
brain - liver - lung -
Interacts with genes: 000365, 025783, 004674 (internal links)
Slide provided by Dr. Vered Caspi

Core Data and Annotation
Databases generally have (at least) two types of data:
Core data: the data the database was generated to organize
Annotation: extra information that rounds out our picture of the core data
For example, in a genome database the sequence is the core data, and the location of genes is the annotation
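
The genome-database example can be pictured as a small record type that keeps the two kinds of data apart. This is a sketch under stated assumptions: the class, its field names, the accession, the toy sequence and the gene name are all hypothetical and do not describe how any real database is organized.

```python
from dataclasses import dataclass, field


@dataclass
class GenomeEntry:
    accession: str
    sequence: str                                        # core data
    gene_locations: list = field(default_factory=list)   # annotation

    def gene_sequence(self, name):
        """Use the annotation (1-based, inclusive coordinates) to cut out a gene."""
        for gene, start, end in self.gene_locations:
            if gene == name:
                return self.sequence[start - 1:end]
        raise KeyError(name)


# Hypothetical entry: a 13-base "genome" with one annotated gene.
entry = GenomeEntry(
    accession="000001",
    sequence="GGATGGCATAACC",
    gene_locations=[("toyA", 3, 11)],
)
print(entry.gene_sequence("toyA"))
```

The sequence stays the same when the annotation is revised, which is one reason to keep the two apart.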

Database Issues
• Printed journals vs. databases
• Direct submission to databases (e.g. GenBank, PDB)
• Archival vs. curated databases
• Databases that publish experimental results of large genomic centers.
• Public vs. private databases.

For Example: Classification of Genomic Databases
• Database scope
– Many genomes
– One genome
– One subject
– One gene
• Information source
– Direct submission from scientific community
– Scientific literature
– Genome center’s experimental results
– Other databases
• Information type
– Mapping
– Sequence & annotation
– Protein structure & function
– Variations
– Comparative genomics
– Gene networks
Slide provided by Dr. Vered Caspi

User Interface
• Database search
– free text
– field-specific
– sequence-based
• Database output
– text
– graphics
– dynamic
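
The difference between a free-text search and a field-specific search can be shown on a tiny in-memory "database". This is a minimal sketch: the entries reuse the accession numbers and names from the example database slide, while the dictionary layout and function names are assumptions for illustration. Sequence-based search is not covered here.

```python
entries = [
    {"accession": "000001", "name": "tumor protein p53", "location": "17p13.1"},
    {"accession": "000002", "name": "breast cancer 1, early onset"},
    {"accession": "000003", "name": "breast cancer 1, early onset"},
]


def free_text_search(entries, term):
    """Match the term against every field of every entry."""
    term = term.lower()
    return [
        e["accession"]
        for e in entries
        if any(term in str(value).lower() for value in e.values())
    ]


def field_search(entries, field_name, term):
    """Match the term against one named field only."""
    term = term.lower()
    return [
        e["accession"]
        for e in entries
        if term in str(e.get(field_name, "")).lower()
    ]


print(free_text_search(entries, "17p"))        # found via the location field
print(field_search(entries, "name", "17p"))    # not found: wrong field
print(field_search(entries, "name", "breast"))
```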

Data Formats
There are many data formats used for sequences (both nucleic and amino acid)
• Fasta Format
• GenBank Format
• Fastq Format
• (EMBL Format)

Fasta Format
• Simplest format
• Least information
• Starts with a > and sequence name on one line
• The sequence in plain text follows

>OB2T2
GTGACAACATGTACAGCTGTGAGCGGTGTAAGAAGCTGCGGAACGGAGTGAAGTACTGCAAAGTCCTGCGGTTGCCCGAGATCCTGTGCATTCACCTAAAGCGCTTTCGGCACGAGGTGATGTACTCATTCAAGATCAACAGCCACGTCTCCTTGCCCTCGAGGGGCTCGACCTGCGCCCCTTCCTTGCCAAGGAGTGCACATCCCAGATCACCACCTACGACCTCCTCTCGGTCATCTGCCACCACGGCACGGCAGGCA

>TNRC_HUMAN P36941 (tumor necrosis factor c receptor)
MLLPWATSAPGLAWGPLVLGLFGLLAASQPQAVPPYASENQTCRDQEKEYYEPQHRICCSRCPPGTYVSAKCSRIRDTVCATCAENSYNEHWNYLTICQLCRPCDPVMGLEEIAPCTSKRKTQCRCQPGMFCAAWALECTHCELLSDCPPGTEAELKDEVGKGNNHCVPCKAGHFQNTSSPSARCQPHTRCENQGLVEAAPGTAQSDTTCKNPLEPLPPEMSGTMLMLAVLLPLAFFLLLATVFSCIWKSHPSLCRKLGSLLKRRPQGEGPNPVAGSWEPPKAHPYFPDLVQPLLPISGDVSPVSTGLPAAPVLEAGVPQQQSPLDLTREPQLEPGEQSQVAHGTNGIHVTGGSMTITGNIYIYNGPVLGGPPGPGDLPATPEPPYPIPEEGDPGPPGLSTPHQEDGKAWHLAETEHCGATPSNRGPRNQFITHD
>TNRC_MOUSE P50284 lymphotoxin-beta receptor precursor
MRLPRASSPCGLAWGPLLLGLSGLLVASQPQLVPPYRIENQTCWDQDKEYYEPMHDVCCSRCPPGEFVFAVCSRSQDTVCKTCPHNSYNEHWNHLSTCQLCRPCDIVLGFEEVAPCTSDRKAECRCQPGMSCVYLDNECVHCEEERLVLCQPGTEAEVTDEIMDTDVNCVPCKPGHFQNTSSPRARCQPHTRCEIQGLVEAAPGTSYSDTICKNPPEPGAMLLLAILLSLVLFLLFTTVLACAWMRHPSLCRKLGTLLKRHPEGEESPPCPAPRADPHFPDLAEPLLPMSGDLSPSPAGPPTAPSLEEVVLQQQSPLVQARELEAEPGEHGQVAHGANGIHVTGGSVTVTGNIYIYNGPVLGGTRGPGDPPAPPEPPYPTPEEGAPGPSELSTPYQEDGKAWHLAETETLGCQDL
>TNR1_RAT P22934 tumor necrosis factor receptor 1 precursor (p60)
MGLPIVPGLLLSLVLLALLMGIHPSGVTGLVPSLGDREKRDNLCPQGKYAHPKNNSICCTKCHKGTYLVSDCPSPGQETVCEVCDKGTFTASQNHVRQCLSCKTCRKEMFQVEISPCKADMDTVCGCKKNQFQRYLSETHFQCVDCSPCFNGTVTIPCKEKQNTVCNCHAGFFLSGNECTPCSHCKKNQECMKLCLPPVANVTNPQDSGTAVLLPLVIFLGLCLLFFICISLLCRYPQWRPRVYSIICRDSAPVKEVEGEGIVTKPLTPASIPAFSPNPGFNPTLGFSTTPRFSHPVSSTPISPVFGPSNWHNFVPPVREVVPTQGADPLLYGSLNPVPIPAPVRKWEDVVAAQPQRLDTADPAMLYAVVDGVPPTRWKEFMRLLGLSEHEIERLELQNGRCLREAHYSMLEAWRRRTPRHEATLDVVGRVLCDMNLRGCLENIRETLESPAHSSTTHLPR
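
Because the format is so simple, reading it takes only a few lines. The sketch below is one possible reader, not the behaviour of any particular program: it treats each line starting with > as a header and joins all following lines into one sequence. The function name and the toy input are made up for illustration.

```python
def parse_fasta(text):
    """Return a list of (header, sequence) pairs from Fasta-formatted text."""
    records = []
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)  # sequence may span several lines
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


toy = """>seq1 toy example
ACGT
ACGT
>seq2
GGCC
"""
for name, seq in parse_fasta(toy):
    print(name, len(seq))
```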

Known Issues with Fasta Format
• Different programs treat the header line differently:
– Some read 10 characters, some 30
– Some read until the first space
• Make sure you have unique names!!!
• Header lines should be under 80 characters
• Length of sequence line can differ
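
One way to guard against these issues is to check a set of headers the way different programs might read them. The sketch below is a hedged illustration: the two reading rules (up to the first space, or a fixed number of characters) come from the slide, while the function names and the example headers are hypothetical.

```python
from collections import Counter


def program_id(header, max_chars=None):
    """Name as a program might see it: the first max_chars characters,
    or everything up to the first whitespace when max_chars is None."""
    if max_chars is not None:
        return header[:max_chars]
    parts = header.split()
    return parts[0] if parts else ""


def duplicate_ids(headers, max_chars=None):
    """Names that stop being unique under the given reading rule."""
    counts = Counter(program_id(h, max_chars) for h in headers)
    return sorted(name for name, n in counts.items() if n > 1)


def long_headers(headers, limit=80):
    """Headers whose full line (including the leading >) is not under the limit."""
    return [h for h in headers if len(">" + h) >= limit]


# Hypothetical headers: unique as written, identical after 10 characters.
headers = [
    "sample_gene_001_human first toy record",
    "sample_gene_001_mouse second toy record",
]
print(duplicate_ids(headers))                # read until the first space
print(duplicate_ids(headers, max_chars=10))  # read 10 characters
print(long_headers(headers))
```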

Fastq Format
@SRR2976060.1 1 length=202
NAAGCTCTCACCCATGGAGACCAAGGCGATTAGGGTTTTTCTCTTCGCTCTCCTCCT
+SRR2976060.1 1 length=202
#1=DDFFFHHHHHJJJEIJJJJJIJJJJFHGJIIJ9DHIIIJJJJGIIJJJGIIIJJ
Four lines:
1 – starts with @ and is a unique identifier
2 – the actual sequence
3 – starts with a + and can have an identifier again
4 – the quality of the bases
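
The four-line layout can be read in fixed groups of four. The sketch below is an illustration under assumptions: the function names and the toy record are made up, and the quality decoding assumes the common encoding in which each character's ASCII code minus 33 gives the quality score. That offset is not stated on the slide, so check it for your own data.

```python
def parse_fastq(text):
    """Return (identifier, sequence, quality) tuples from Fastq-formatted text."""
    lines = [ln for ln in text.splitlines() if ln.strip()]
    if len(lines) % 4 != 0:
        raise ValueError("Fastq records must have four lines each")
    records = []
    for i in range(0, len(lines), 4):
        header, seq, plus, qual = lines[i:i + 4]
        if not header.startswith("@") or not plus.startswith("+"):
            raise ValueError(f"malformed record starting at line {i + 1}")
        if len(seq) != len(qual):
            raise ValueError("sequence and quality lengths differ")
        records.append((header[1:], seq, qual))
    return records


def quality_scores(qual, offset=33):
    """Decode a quality string, assuming an ASCII offset of 33."""
    return [ord(ch) - offset for ch in qual]


toy = "@read1\nNACGT\n+\n#1=DD\n"  # made-up record
for name, seq, qual in parse_fastq(toy):
    print(name, seq, quality_scores(qual))
```

Reading in groups of four, rather than looking for lines that start with @, avoids being fooled by a quality line that happens to begin with that character.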

Genbank Format
• Divided into three parts:
– Information lines
– Feature table
– Sequence
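
The three parts can be pulled apart by looking for the keywords that open them in a GenBank flat file: FEATURES starts the feature table, ORIGIN starts the sequence, and // ends the record. The sketch below is a minimal illustration; the function name and the toy record are hypothetical, and the record is far shorter than a real one.

```python
def split_genbank(record):
    """Split one GenBank record into (information lines, feature table, sequence)."""
    info, features, seq_lines = [], [], []
    section = info
    for line in record.splitlines():
        if line.startswith("FEATURES"):
            section = features      # the header line is kept with the table
        elif line.startswith("ORIGIN"):
            section = seq_lines
            continue                # drop the ORIGIN keyword line itself
        elif line.startswith("//"):
            break                   # end of record
        section.append(line)
    # Sequence lines carry position numbers and spaces; keep letters only.
    sequence = "".join(ch for ln in seq_lines for ch in ln if ch.isalpha())
    return info, features, sequence


# Hypothetical toy record.
toy = """LOCUS       TOY0001        12 bp    DNA     linear   UNK
DEFINITION  Hypothetical toy record.
FEATURES             Location/Qualifiers
     source          1..12
ORIGIN
        1 atggcattaa cc
//
"""
info, features, sequence = split_genbank(toy)
print(len(info), len(features), sequence)
```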

EMBL sequence format
RN   [2]
RA   Wirsel S.G.R., Leibinger W., Mendgen K.W.;
RT   "Genetic diversity of fungi associated with common reed (Phragmites
RT   australis)";
RL   Unpublished.
XX
FH   Key             Location/Qualifiers
FH
FT   source          1..581
FT                   /db_xref="taxon:112223"
FT                   /organism="ascomycota sp. 4/97-9"
FT                   /isolate="4/97-9"
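
In the EMBL excerpt every line begins with a two-letter code that says what the line holds, so a first pass at reading it can simply group lines by that code. This is a rough sketch with a made-up function name; it does not interpret the codes or parse the feature table.

```python
def group_by_line_code(text):
    """Group the content of each line under its two-letter line code."""
    groups = {}
    for line in text.splitlines():
        if not line.strip():
            continue
        code, content = line[:2], line[2:].strip()
        groups.setdefault(code, []).append(content)
    return groups


# Reference lines from the excerpt above.
excerpt = '''RN   [2]
RA   Wirsel S.G.R., Leibinger W., Mendgen K.W.;
RT   "Genetic diversity of fungi associated with common reed (Phragmites
RT   australis)";
RL   Unpublished.
XX
'''
groups = group_by_line_code(excerpt)
print(groups["RA"])
print(" ".join(groups["RT"]))  # a title split over two lines, rejoined
```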