How to use the web for bioinformatics
![Page 1: How to use the web for bioinformatics](https://reader035.fdocuments.net/reader035/viewer/2022062321/56813176550346895d97ef30/html5/thumbnails/1.jpg)
How to use the web for bioinformatics
Molecular Technologies
Ethan [email protected]
274-4330 X 1171
http://www.q7.com/~ethan
![Page 2: How to use the web for bioinformatics](https://reader035.fdocuments.net/reader035/viewer/2022062321/56813176550346895d97ef30/html5/thumbnails/2.jpg)
Objectives
At the end of this session you should be able to do all of the following using freely available tools on the World Wide Web:
• Use GenBank or a similar database to find nucleic acid sequences of interest
• Understand the parts of a GenBank entry
• Use some of the databases at NCBI to find more information about a sequence
• Perform an alignment of several nucleic acid sequences
• Find an arbitrary tool or database on the web
![Page 4: How to use the web for bioinformatics](https://reader035.fdocuments.net/reader035/viewer/2022062321/56813176550346895d97ef30/html5/thumbnails/4.jpg)
Outline
• What is Bioinformatics
• Sequence Databases
– What does a GenBank entry look like?
• Other NCBI databases
• Multiple Sequence Alignment
• New tools & Databases
![Page 5: How to use the web for bioinformatics](https://reader035.fdocuments.net/reader035/viewer/2022062321/56813176550346895d97ef30/html5/thumbnails/5.jpg)
What is Bioinformatics?
Bioinformatics refers to the creation and advancement of algorithms, computational and statistical techniques, and theory to solve formal and practical problems posed by or inspired from the management and analysis of biological data (Wikipedia)
![Page 6: How to use the web for bioinformatics](https://reader035.fdocuments.net/reader035/viewer/2022062321/56813176550346895d97ef30/html5/thumbnails/6.jpg)
What is Bioinformatics? (my working definition)
Anything done on a computer in which knowledge of biology is helpful.
or
Anything done in biology in which knowledge of computers is helpful.
![Page 7: How to use the web for bioinformatics](https://reader035.fdocuments.net/reader035/viewer/2022062321/56813176550346895d97ef30/html5/thumbnails/7.jpg)
What sort of questions can Bioinformatics answer?
• Sequence analysis
– Where are restriction sites?
– How does an RNA molecule fold?
– What changes can be made to a DNA sequence to get a new protein with specific functional changes?
• Computational evolutionary biology
– How are two sequences related?
• Analysis of gene expression
– Is this gene highly expressed in cancer cells?
![Page 8: How to use the web for bioinformatics](https://reader035.fdocuments.net/reader035/viewer/2022062321/56813176550346895d97ef30/html5/thumbnails/8.jpg)
What sort of work is done in Bioinformatics?
• Measuring biodiversity
– How diverse are individuals of a species?
– Is it one species or two?
• Analysis of regulation
– What does this drug do to expression of a gene?
• Analysis of mutations in cancer
– What is different about these cancer cells as compared to non-cancer cells?
• High-throughput image analysis
– How can we analyze the effects of 1000 different compounds on the location of a specific protein?
• And more!
![Page 9: How to use the web for bioinformatics](https://reader035.fdocuments.net/reader035/viewer/2022062321/56813176550346895d97ef30/html5/thumbnails/9.jpg)
Sequence Databases
• NCBI databases – Nucleic acids, proteins, literature, genomes, taxonomy, SNPs and more!
• EMBL – Nucleic acid, protein, structure, microarray data and more.
• DDBJ – Nucleic acid, protein.
• SwissProt – Very well annotated protein database.
• Many other general and specialized databases exist.
![Page 10: How to use the web for bioinformatics](https://reader035.fdocuments.net/reader035/viewer/2022062321/56813176550346895d97ef30/html5/thumbnails/10.jpg)
Sequence Databases: NCBI/GenBank
National Center for Biotechnology Information (NCBI)
Sponsored and run by the US government.
Contains many different databases and huge amounts of information.
Most or all data is freely downloadable.
This one site is probably sufficient for all your nucleic acid and protein database needs!
![Page 11: How to use the web for bioinformatics](https://reader035.fdocuments.net/reader035/viewer/2022062321/56813176550346895d97ef30/html5/thumbnails/11.jpg)
Sequence Databases: Entrez
• Allows searching and access to NCBI databases.
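An Entrez search can also be written as a URL. The sketch below builds a search URL for NCBI's E-utilities `esearch` endpoint. That endpoint is not mentioned in these slides, so treat it as an assumption about how a search might be scripted; the helper name `build_esearch_url` is made up for illustration. The code only builds the URL and does not contact NCBI.

```python
from urllib.parse import urlencode

# Base address of the NCBI E-utilities search endpoint (an assumption here;
# the slides themselves only use the Entrez web pages).
ESEARCH_BASE = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def build_esearch_url(database, term, max_results=20):
    """Return a search URL for one Entrez database and one query term."""
    params = {"db": database, "term": term, "retmax": max_results}
    return ESEARCH_BASE + "?" + urlencode(params)


if __name__ == "__main__":
    # Same database and term as the OMIM example later in the slides.
    print(build_esearch_url("omim", "HFI", max_results=5))
```

Opening the printed URL in a browser should return a list of matching record IDs rather than a formatted web page.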
![Page 12: How to use the web for bioinformatics](https://reader035.fdocuments.net/reader035/viewer/2022062321/56813176550346895d97ef30/html5/thumbnails/12.jpg)
Sequence Databases: Sequence Records
• LOCUS - Number, size, type, topology, division, date
• DEFINITION - Name of the sequence
• ACCESSION - Unique ID number
• VERSION - Versioned accession and other associated numbers
• KEYWORDS
• SOURCE - What it was isolated from
• ORGANISM - More taxonomic detail
• REFERENCE - Paper or papers about the sequence
– AUTHORS
– TITLE
– JOURNAL
• FEATURES - A complete list of all of the features of a sequence. Can be very extensive and useful!
• ORIGIN - The actual sequence!
http://www.ncbi.nlm.nih.gov/entrez/viewer.fcgi?db=nucleotide&val=58533118
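The record layout above can be read by a short script. This is a minimal sketch, assuming the usual flat-file convention that top-level keywords start in the first column and continuation lines are indented. `SAMPLE_RECORD` is a made-up record, not a real GenBank entry, and both function names are hypothetical.

```python
import re

# A tiny, made-up record in GenBank flat-file layout (not a real entry).
SAMPLE_RECORD = """\
LOCUS       EXAMPLE01                 24 bp    DNA     linear   SYN 01-JAN-2000
DEFINITION  Hypothetical example sequence.
ACCESSION   EXAMPLE01
VERSION     EXAMPLE01.1
KEYWORDS    .
SOURCE      synthetic construct
  ORGANISM  synthetic construct
            other sequences; artificial sequences.
FEATURES             Location/Qualifiers
     source          1..24
ORIGIN
        1 atggaagctg gtgcttactt gaac
//
"""


def parse_genbank_sections(text):
    """Group lines under the top-level keyword that starts in column 0."""
    sections = {}
    current = None
    for line in text.splitlines():
        if line.startswith("//"):  # end-of-record marker
            break
        match = re.match(r"^([A-Z]+)\s*(.*)$", line)
        if match:
            current = match.group(1)
            sections[current] = [match.group(2)]
        elif current is not None:
            # Indented lines (sub-keywords, continuations) stay with the
            # most recent top-level keyword.
            sections[current].append(line.strip())
    return sections


def extract_sequence(sections):
    """Join the ORIGIN lines, dropping position numbers and spaces."""
    origin_lines = sections.get("ORIGIN", [])
    return "".join(re.sub(r"[^A-Za-z]", "", line) for line in origin_lines)


if __name__ == "__main__":
    parsed = parse_genbank_sections(SAMPLE_RECORD)
    print(parsed["ACCESSION"][0], len(extract_sequence(parsed)))
```

For real work an established parser is a safer choice than this sketch, since actual records contain many more field variants.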
![Page 13: How to use the web for bioinformatics](https://reader035.fdocuments.net/reader035/viewer/2022062321/56813176550346895d97ef30/html5/thumbnails/13.jpg)
Other NCBI databases
Online Mendelian Inheritance in Man (OMIM)
A catalog of human genes and genetic disorders with links to other NCBI databases, including sequence databases.
This is a good starting point if you want to get sequences for a specific disorder.
http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?CMD=search&DB=omim&term=HFI
![Page 14: How to use the web for bioinformatics](https://reader035.fdocuments.net/reader035/viewer/2022062321/56813176550346895d97ef30/html5/thumbnails/14.jpg)
Other NCBI databases
Gene Database
Gathers information about a single gene.
Exactly one entry per Gene.
A good place to dig deeper into a single gene or to reduce redundancy about a single gene.
![Page 15: How to use the web for bioinformatics](https://reader035.fdocuments.net/reader035/viewer/2022062321/56813176550346895d97ef30/html5/thumbnails/15.jpg)
Other NCBI databases
• HomoloGene - Gathers homologs from various species
• 3D Domains - Protein structure collection
• Taxonomy - Species information
• GEO (Gene Expression Omnibus) - A gene expression/molecular abundance repository
![Page 16: How to use the web for bioinformatics](https://reader035.fdocuments.net/reader035/viewer/2022062321/56813176550346895d97ef30/html5/thumbnails/16.jpg)
General Utilities
• http://searchlauncher.bcm.tmc.edu/seq-util/seq-util.html
– Translation
– Restriction Digestion
– Reformatting (alternately FASTA Formatter)
– Complement/Reverse
– Etc.
• http://www.promega.com/biomath/calc11.htm
– Melting Temperature of an oligo.
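Several of these utilities are simple enough to sketch in a few lines. The example below shows reverse complement, translation in the first reading frame with the standard genetic code, and a rough melting temperature. The melting temperature uses the Wallace rule (2 °C per A/T, 4 °C per G/C), which is only a rule of thumb for short oligos and is an assumption here; the web calculators listed above may use different formulas. All function names are illustrative.

```python
from itertools import product

# Standard genetic code, with codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): amino_acid
    for codon, amino_acid in zip(product(BASES, repeat=3), AMINO_ACIDS)
}
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(dna):
    """Complement each base, then reverse the strand."""
    return dna.translate(COMPLEMENT)[::-1]


def translate(dna):
    """Translate frame 1; a trailing partial codon is ignored,
    unknown codons become 'X', and stop codons become '*'."""
    dna = dna.upper()
    return "".join(
        CODON_TABLE.get(dna[i:i + 3], "X") for i in range(0, len(dna) - 2, 3)
    )


def wallace_tm(oligo):
    """Rough melting temperature in degrees C by the Wallace rule."""
    oligo = oligo.upper()
    at_count = oligo.count("A") + oligo.count("T")
    gc_count = oligo.count("G") + oligo.count("C")
    return 2 * at_count + 4 * gc_count


if __name__ == "__main__":
    print(reverse_complement("ATGC"))
    print(translate("ATGGAAGCTGGT"))
    print(wallace_tm("ATGGAAGCTGGTGCTTAC"))
```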
![Page 17: How to use the web for bioinformatics](https://reader035.fdocuments.net/reader035/viewer/2022062321/56813176550346895d97ef30/html5/thumbnails/17.jpg)
Database search by sequence similarity
Basic Local Alignment Search Tool (BLAST)
![Page 18: How to use the web for bioinformatics](https://reader035.fdocuments.net/reader035/viewer/2022062321/56813176550346895d97ef30/html5/thumbnails/18.jpg)
Multiple Sequence Alignment
Many programs can align multiple sequences with each other to find the best fit for all.
This is generally more biologically meaningful for protein sequences than for nucleic acid sequences, since proteins are more highly conserved.
Clustal is the most commonly used of these programs.
![Page 19: How to use the web for bioinformatics](https://reader035.fdocuments.net/reader035/viewer/2022062321/56813176550346895d97ef30/html5/thumbnails/19.jpg)
Multiple Sequence Alignment
MEAGAYLNAIIFVLVATIIAVISRGLTRTEPCTIRITGESITVHACHIDSX  ETIKALA
MEAGAYLNAIIFVLVATIIAVISRGLTRTEPCTIRITGESITVHACHIDS...ETIKALA
MEA..YLNAII.VLV.TIIAVIS..L.RTEPC.IkITGESITV.ACklDa.....I..L.
MEAgaYLNAIIfVLVaTIIAVISrgLtRTEPCtIrITGESITVhAChiDsx  etIkaLa

LK PLSLERLFQ
LK.PLSLERLFQ
......L.....
lk plsLerlfq
![Page 20: How to use the web for bioinformatics](https://reader035.fdocuments.net/reader035/viewer/2022062321/56813176550346895d97ef30/html5/thumbnails/20.jpg)
New Tools
Development of new tools and databases is ongoing.
Your needs will probably change over time. You can find new tools using
• Google
• Lists
• Nucleic Acids Research Annual Database issue
![Page 21: How to use the web for bioinformatics](https://reader035.fdocuments.net/reader035/viewer/2022062321/56813176550346895d97ef30/html5/thumbnails/21.jpg)
Homework Assignments due next session
1. Find an entry of interest in OMIM (http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?db=OMIM)
2. Find a Gene associated with that entry
   1. Click on the “links” link on the right and choose “Gene”
![Page 22: How to use the web for bioinformatics](https://reader035.fdocuments.net/reader035/viewer/2022062321/56813176550346895d97ef30/html5/thumbnails/22.jpg)
Homework
3. The Gene page has gathered scads of information about this one gene. Find homologs in other species. From this page again choose “links” and go to HomoloGene.
![Page 23: How to use the web for bioinformatics](https://reader035.fdocuments.net/reader035/viewer/2022062321/56813176550346895d97ef30/html5/thumbnails/23.jpg)
Homework
4. Gather the protein sequences for each homologous gene (or 5 of them if there are more than that).
   1. Click “DownLoad” in the HomoloGene listing.
   2. Download everything with the default settings.
![Page 24: How to use the web for bioinformatics](https://reader035.fdocuments.net/reader035/viewer/2022062321/56813176550346895d97ef30/html5/thumbnails/24.jpg)
Homework
You will get a text file in “FASTA” format. Save it somewhere convenient.
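A FASTA file is plain text: each record starts with a header line beginning with `>`, followed by one or more lines of sequence. The sketch below reads such text into (header, sequence) pairs. The two records in `EXAMPLE` are made up for illustration and the function name is hypothetical.

```python
def parse_fasta(text):
    """Return a list of (header, sequence) pairs from FASTA-formatted text."""
    records = []
    header = None
    chunks = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            # A new header closes the previous record, if there was one.
            if header is not None:
                records.append((header, "".join(chunks)))
            header = line[1:]
            chunks = []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


# Made-up records, only to show the layout of the format.
EXAMPLE = """>seq1 made-up example protein
MEAGAYLNAII
FVLVATIIAVI
>seq2 made-up example protein
MEAYLNAIIVLV
"""

if __name__ == "__main__":
    for name, sequence in parse_fasta(EXAMPLE):
        print(name, len(sequence))
```

A quick check like this confirms that the download contains the number of sequences you expect before you paste it into an alignment server.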
![Page 25: How to use the web for bioinformatics](https://reader035.fdocuments.net/reader035/viewer/2022062321/56813176550346895d97ef30/html5/thumbnails/25.jpg)
Homework
Go to the Clustal server at
http://searchlauncher.bcm.tmc.edu/multi-align/multi-align.html
Paste the complete contents of your FASTA file into the input box and click submit.
This takes a while, so be patient. You will get output that looks something like this.
![Page 26: How to use the web for bioinformatics](https://reader035.fdocuments.net/reader035/viewer/2022062321/56813176550346895d97ef30/html5/thumbnails/26.jpg)
Homework
At the bottom of the alignment file are the same results in “FASTA” format. Copy the complete FASTA results and paste them into the input box at a BoxShade server (http://bioweb.pasteur.fr/seqanal/interfaces/boxshade.html)
![Page 27: How to use the web for bioinformatics](https://reader035.fdocuments.net/reader035/viewer/2022062321/56813176550346895d97ef30/html5/thumbnails/27.jpg)
Homework
Depending on the parameters chosen for BoxShade, you will see something like this.
Regions which are the same in all species are likely involved in function in some way.
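The idea of "regions which are the same in all species" can be checked column by column once the sequences are aligned. This is a minimal sketch that marks fully conserved columns with `*`, assuming gaps are written as `-` or `.`. It is not how BoxShade itself computes its shading, and the three aligned sequences are made up.

```python
def conservation_line(aligned, gap_chars="-."):
    """Mark each alignment column '*' if every sequence has the same
    residue there (ignoring case, and not a gap), otherwise ' '."""
    if not aligned:
        return ""
    length = len(aligned[0])
    if any(len(seq) != length for seq in aligned):
        raise ValueError("aligned sequences must all have the same length")
    marks = []
    for column in zip(*aligned):
        residues = {residue.upper() for residue in column}
        conserved = len(residues) == 1 and not (residues & set(gap_chars))
        marks.append("*" if conserved else " ")
    return "".join(marks)


# Made-up aligned sequences, for illustration only.
ALIGNED = ["MEAGAYLN", "MEA-AYLN", "MEAGSYLN"]

if __name__ == "__main__":
    for sequence in ALIGNED:
        print(sequence)
    print(conservation_line(ALIGNED))
```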
![Page 28: How to use the web for bioinformatics](https://reader035.fdocuments.net/reader035/viewer/2022062321/56813176550346895d97ef30/html5/thumbnails/28.jpg)
Homework
After all that work, your boss comes to you and says that sequence comparison is obsolete! He wants you to do structural alignments of these proteins. Figure out what a structural alignment is, find two different tools to find conserved 3D structures and choose which one you would use for this. Describe why this tool is preferable to the other.
NOTE: You do not need to actually do any structural alignments. Just find out how you would go about doing one if you had to.