Literature Informatics
August 20, 2013
Ansuman Chattopadhyay, PhD
Head, Molecular Biology Information Service
Health Sciences Library System
University of Pittsburgh
http://www.hsls.pitt.edu/molbio
Ansuman Chattopadhyay, PhD
• 1990-1996: University of Nebraska-Lincoln. PhD in Biochemistry. Protein synthesis initiation in eukaryotic system
• 1997-2001: Vanderbilt University School of Medicine, Nashville. Research Fellow. Epidermal Growth Factor (EGF) mediated signal transduction
• 2001-2002: Cellomics Inc., Pittsburgh. Knowledge Engineer
• 2002-2006: HSLS, University of Pittsburgh. Information Specialist in Molecular Biology and Genetics
• 2006-Present: HSLS, University of Pittsburgh. Head, Molecular Biology Information Service
Carrie L Iwema, PhD, MLS, AHIP
• 1993-2001: SUNY Upstate Medical University, Syracuse. PhD in Neuroscience. Specificity of connectivity in the mammalian olfactory system
• 2001-2005: Yale University School of Medicine, New Haven. Postdoctoral Fellow. Development & regeneration in the mammalian olfactory system
• 2004-2006: Southern Connecticut State University, New Haven. Masters in Library Science
• 2006-2007: Johns Hopkins Medical Institutions, Welch Medical Library, Baltimore. Basic Science Librarian
• 2007-Present: University of Pittsburgh, Health Sciences Library System, Pittsburgh. Information Specialist in Molecular Biology
HSLS Molecular Biology Information Service
Workshops
Website
Software Licensing
Bioinformatics Consultations
Today’s Agenda
10 am to 12 pm: Genes, Proteins and Literature Searching
1 pm to 3 pm: NCBI Resources - Genetic Variations Databases
3 pm to 3:30 pm: Q&A Session
Literature Informatics
http://www.hsls.pitt.edu/guides/genetics
Learn how to
.. find the most appropriate literature
.. mine the literature
.. manage your collected information
.. browse scientific papers
Topics
Introduction
Intuitive PubMed Search
Next-Generation Literature Search Tools
Reference Management
PDF Reader
Literature Informatics
Comprehensive search:
MeSH term based PubMed search
PubMed special topics query
Next-generation literature search tools:
GoPubMed, eTBLAST, Quertle, HuGE Navigator, Utopia Docs
Introduction
Genomic achievements since the Human Genome Project
Progress in Genomics: DNA Sequencing Time and Cost
Year | 1990 | 2003 | 2013
Time | 6-8 years | 3-4 months | 2-3 days
Cost | 1B | 10-50 M | 4-6 K
Source: Eric Green; HGP10 Symposium
Big Data Biology
Small Science: single gene, single protein, single lab
Big Science: multi-gene, system-wide, high-throughput, multi-institution
Growth of PubMed Citations
Lu et al. Database (Oxford). 2011: baq036
PubMed citation counts as of Aug 20th 2013:
• Breast Cancer: 266K
• Schizophrenia: 104K
• BRCA1: 9.9K
• p53: 67K
Searching PubMed
Find published literature with statistical and numerical data on DENGUE OUTBREAKS in India.
What genes are reported to be associated with the disease SCHIZOPHRENIA?
Citations: 20 million
Journals: 5200
Schizophrenia: 96,912
Schizophrenia gene: 7382
Dengue outbreaks in India: 329
Dengue outbreaks statistics India: 21
Challenges
Am I getting everything / the right things?
How to digest this?
Medical Subject Headings (MeSH)
The U.S. National Library of Medicine's controlled vocabulary (thesaurus)
Arranged in a hierarchical manner called the MeSH Tree Structures
Updated annually
MeSH Vocabulary
Headings: over 24,000, representing concepts found in the biomedical literature (Body Weight, Kidney, Radioactive Waste)
Subheadings: attached to headings to describe a specific aspect of a concept (adverse effects, metabolism, diagnosis, therapy)
Supplementary Concept Records: over 172,000 terms in a separate chemical thesaurus, updated weekly (cordycepin, valspodar, tacrolimus binding protein 4)
Publication Types (Letter, Review, Randomized Controlled Trial)
MeSH Tree Structure
A. Anatomy
B. Organisms
C. Diseases
D. Chemical and Drugs
E. Analytical, Diagnostic and Therapeutic Techniques and Equipment
F. Psychiatry and Psychology
G. Biological Sciences
H. Physical Sciences
I. Anthropology, Education, Sociology and Social Phenomena
J. Technology and Food and Beverages
K. Humanities
L. Information Science
M. Persons
N. Health Care
V. Publication Characteristics
Z. Geographic Locations
MeSH Indexing
Source: NLM
Find published literature with statistical and numerical data on DENGUE OUTBREAKS in India.
PubMed Query Using MeSH http://www.ncbi.nlm.nih.gov/mesh
Find articles on “Dengue outbreaks in India” by searching PubMed using MeSH terms
Link to the video tutorial: http://media.hsls.pitt.edu/media/clres2705/mesh.swf
Resources
• MeSH Browser: http://www.ncbi.nlm.nih.gov/mesh
• PubMed: http://www.ncbi.nlm.nih.gov/pubmed
Building a PubMed Query
Building PubMed Queries
Term | Boolean | Term | Boolean | Term | # papers
Dengue | AND | Outbreaks | | | 823
Dengue* | AND | Outbreaks | | | 746
Dengue | AND | Outbreaks | AND | India | 123
Dengue* | AND | Outbreaks | AND | India | 116
Dengue | AND | Outbreaks/statistics and numerical data | AND | India | 7
Dengue* | AND | Outbreaks/statistics and numerical data | AND | India | 7
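The query combinations in the table above can also be assembled programmatically. This is a minimal sketch in Python, not the presenters' method. It assumes that the asterisk marks a heading restricted to MeSH Major Topic (an interpretation suggested by the lower counts, not stated on the slide) and that the slide's short labels stand in for full MeSH headings. `mesh_term` and `and_query` are hypothetical helper names.

```python
def mesh_term(heading, subheading=None, major=False):
    """Format one MeSH heading (optionally with a subheading) as a PubMed search term."""
    term = heading if subheading is None else f"{heading}/{subheading}"
    # [Majr] restricts to articles where the heading is a major topic;
    # [MeSH Terms] matches any article indexed with the heading.
    tag = "Majr" if major else "MeSH Terms"
    return f'"{term}"[{tag}]'


def and_query(*terms):
    """Join already-formatted search terms with the Boolean operator AND."""
    return " AND ".join(terms)


# Heading names are copied from the slide's table (assumption: they are
# shorthand); check the MeSH Browser for the exact heading before searching.
query = and_query(
    mesh_term("Dengue"),
    mesh_term("Outbreaks", subheading="statistics and numerical data"),
    mesh_term("India"),
)
print(query)
```

The resulting string can be pasted into the PubMed search box; passing `major=True` to `mesh_term` gives the narrower, asterisk-style variant.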
Useful Links for MeSH
MeSH Browser: http://www.ncbi.nlm.nih.gov/mesh
18 ways to improve your PubMed searches, by Carrie Iwema: http://bitesizebio.com/2008/03/05/18-ways-to-improve-your-pubmed-searches/
Searching by using the MeSH Database. NCBI Handbook: http://www.ncbi.nlm.nih.gov/bookshelf/br.fcgi?book=helppubmed&part=pubmedhelp#pubmedhelp.Searching_by_using_t
What genes are reported to be associated with the disease SCHIZOPHRENIA?
Topic-Specific PubMed Queries
http://www.nlm.nih.gov/bsd/special_queries.html
Find genes that are reported to be associated with the disease SCHIZOPHRENIA by searching PubMed
Link to the video tutorial: http://media.hsls.pitt.edu/media/clres2705/scz.swf
Resources
• PubMed Clinical Queries: http://www.ncbi.nlm.nih.gov/pubmed/clinical
PubMed Special Topic Queries
PubMed Search Filter: Medical Genetics
("schizophrenia"[MeSH Terms] OR "schizophrenia"[All Fields]) AND (("genetics, medical"[MeSH Terms] OR ("genetics"[All Fields] AND "medical"[All Fields]) OR "medical genetics"[All Fields] OR ("medical"[All Fields] AND "genetics"[All Fields])) OR ("genotype"[MeSH Terms] OR "genotype"[All Fields]) OR "genetics"[Subheading] AND ("genetics"[Subheading] OR "genetics"[All Fields] OR "genetics"[MeSH Terms]))
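A filtered query like the Medical Genetics one above can also be sent to PubMed outside the web interface. The sketch below only builds a request URL for NCBI's E-utilities `esearch` endpoint and does not contact the network. The shortened query string and the helper name `esearch_url` are illustrative assumptions, not part of the slides.

```python
from urllib.parse import urlencode

# Base address of the NCBI E-utilities search service.
ESEARCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def esearch_url(term, db="pubmed", retmax=20):
    """Build (but do not send) an ESearch request URL for a query string."""
    params = {"db": db, "term": term, "retmax": retmax, "retmode": "json"}
    # urlencode percent-escapes quotes and brackets and turns spaces into '+'.
    return f"{ESEARCH}?{urlencode(params)}"


# A shortened stand-in for the full filter query (assumption for illustration).
term = '"schizophrenia"[MeSH Terms] AND "genetics"[Subheading]'
url = esearch_url(term)
print(url)
```

Opening the printed URL in a browser, or fetching it with any HTTP client, should return matching PubMed IDs; NCBI's usage guidelines on request rates apply to scripted use.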
Topic-Specific PubMed Queries
PubMed Search Result Display
How to digest this?
Data Mining-Knowledge Discovery
Next-Generation Literature Search Tools
GoPubMed
Quertle
Latest Innovations in Literature Searching
GoPubMed: displays search results sorted into meaningful topics and subtopics
GoPubMed
www.gopubmed.com
Find genes that are reported to be associated with the disease SCHIZOPHRENIA by using GoPubMed
Link to the video tutorial: http://media.hsls.pitt.edu/media/clres2705/gopubmed.swf
Resources
• GoPubMed: http://www.gopubmed.org/web/gopubmed/2?WEB10O00h00100090000
GoPubMed Search Result Analysis
GoPubMed
Noteworthy links
Doms A, Schroeder M. GoPubMed: exploring PubMed with the Gene Ontology. Nucleic Acids Res. 2005 Jul 1;33(Web Server issue):W783-6. http://www.ncbi.nlm.nih.gov/pubmed/15980585
PubMed driven Web Tools
http://www.ncbi.nlm.nih.gov/CBBresearch/Lu/search/
PubMed based Tools
Lu et al. Database (Oxford). 2011; 2011: baq036
Extract gene list from Literature
http://www.quertle.info/
Questions for Quertle
• What genes cause Asthma?
• What cell lines are used in diabetes research?
• Which cell types are known to express EGFR?
• What animals are used in studies for diabetes?
• Which protein kinases activate TP53?
A short video on Quertle
Link to the video tutorial: http://media.hsls.pitt.edu/media/molbiovideos/quertle-ac0212.swf
Resources
• Quertle: www.quertle.info
Search Engine for NIH-funded research: http://projectreporter.nih.gov/reporter.cfm
NIH Grant Applications to Gene List
PubMed
MeSH, GoPubMed, Quertle
Search Engine for Finding Disease-Causing Genes
Search Engine Just for Human Genetics
CDC HuGE Navigator: http://hugenavigator.net/
Find human genes reported to be associated with Asthma
Find human SNPs reported to be associated with Asthma
Link to the video tutorial: http://media.hsls.pitt.edu/media/clres2705/asthma.swf
Resources
• HuGE Navigator: http://hugenavigator.net/HuGENavigator/home.do
• GWAS Catalog: http://www.ebi.ac.uk/fgpt/gwas/#timeseriestab
http://hugenavigator.net/HuGENavigator/huGEPedia.do
Find Disease Causing SNPs
What SNPs are associated with “Schizophrenia”?
http://hugenavigator.net/HuGENavigator/gWAHitStartPage.do
Hands On
Search PubMed and retrieve a list of genes that can serve as biomarkers for Alzheimer Disease
Software for Finding Similar Text in Published Literature
Text-based Similarity Search Tools
Search Box:
Text Similarity Search Tools
eTBLAST: http://etest.vbi.vt.edu/etblast3/
Déjà Vu: a Database of Highly Similar Citations
http://dejavu.vbi.vt.edu/dejavu/duplicate/
Automated Email Notification Tool
http://www.ncbi.nlm.nih.gov/sites/myncbi/
Save your searches at My NCBI and set up email notifications for new publications that match your search query.
Reference Management Tools
Connotea, CiteULike, Mendeley, Zotero, EndNote, RefWorks
(Figure: tools grouped as online or downloadable, and by support for scientific research papers or web pages)
PDF Reader
Utopia Docs, ReadCube
NextGen PDF Reader: Utopia Docs
Utopia Docs PMID: 22683712
Read a paper using Utopia.Docs
Link to the video tutorial: http://www.hsls.pitt.edu/molbio/videos/play?v=95
Resources
• Utopia Docs: http://getutopia.com/
ReadCube
Mendeley Utopia Docs
Search Engine for Life Scientists
Molecular Databases
Nucleic Acids Research: Annual Database Issue
NAR: Annual Web Server Issue
Oxford Journals: Bioinformatics
BioMed Central: BMC Bioinformatics
Growth of bioinformatics tools
Biomedical & Life Sciences Search Engines
OBRC, University of Pittsburgh: http://www.hsls.pitt.edu/molbio/obrc
Bioinformatics.ca: http://bioinformatics.ca/links_directory/
OReFil, University of Tokyo: http://orefil.dbcls.jp/
Peptide Sequence
>nxp|NX_P00533-1|EGFR|Epidermal growth factor receptor|Iso 1
MRPSGTAGAALLALLAALCPASRALEEKKVCQGTSNKLTQLGTFEDHFLSLQRMFNNCEV
VLGNLEITYVQRNYDLSFLKTIQEVAGYVLIALNTVERIPLENLQIIRGNMYYENSYALA
VLSNYDANKTGLKELPMRNLQEILHGAVRFSNNPALCNVESIQWRDIVSSDFLSNMSMDF
QNHLGSCQKCDPSCPNGSCWGAGEENCQKLTKIICAQQCSGRCRGKSPSDCCHNQCAAGC
TGPRESDCLVCRKFRDEATCKDTCPPLMLYNPTTYQMDVNPEGKYSFGATCVKKCPRNYV
VTDHGSCVRACGADSYEMEEDGVRKCKKCEGPCRKVCNGIGIGEFKDSLSINATNIKHFK
NCTSISGDLHILPVAFRGDSFTHTPPLDPQELDILKTVKEITGFLLIQAWPENRTDLHAF
ENLEIIRGRTKQHGQFSLAVVSLNITSLGLRSLKEISDGDVIISGNKNLCYANTINWKKL
FGTSGQKTKIISNRGENSCKATGQVCHALCSPEGCWGPEPRDCVSCRNVSRGRECVDKCN
LLEGEPREFVENSECIQCHPECLPQAMNITCTGRGPDNCIQCAHYIDGPHCVKTCPAGVM
GENNTLVWKYADAGHVCHLCHPNCTYGCTGPGLEGCPTNGPKIPSIATGMVGALLLLLVV
ALGIGLFMRRRHIVRKRTLRRLLQERELVEPLTPSGEAPNQALLRILKETEFKKIKVLGS
GAFGTVYKGLWIPEGEKVKIPVAIKELREATSPKANKEILDEAYVMASVDNPHVCRLLGI
CLTSTVQLITQLMPFGCLLDYVREHKDNIGSQYLLNWCVQIAKGMNYLEDRRLVHRDLAA
RNVLVKTPQHVKITDFGLAKLLGAEEKEYHAEGGKVPIKWMALESILHRIYTHQSDVWSY
GVTVWELMTFGSKPYDGIPASEISSILEKGERLPQPPICTIDVYMIMVKCWMIDADSRPK
FRELIIEFSKMARDPQRYLVIQGDERMHLPSPTDSNFYRALMDEEDMDDVVDADEYLIPQ
QGFFSSPSTSRTPLLSSLSATSNNSTVACIDRNGLQSCPIKEDSFLQRYSSDPTGALTED
SIDDTFLPVPEYINQSVPKRPAGSVQNPVYHNQPLNPAPSRDPHYQDPHSTAVGNPEYLN
TVQPTCVNSTFDSPAHWAQKGSHQISLDNPDYQQDFFPKEAKPNGIFKGSTAENAEYLRV
APQSSEFIGA
Search for Bioinformatics Resources
Locate online tools that predict phosphorylation sites in a protein sequence.
Link to the video tutorial: http://media.hsls.pitt.edu/media/Search%20Bioinfo%20Resoursec.swf
Resources
• Search.HSLS.MolBio, Life Sciences Search Engine: http://www.hsls.pitt.edu/molbio/
Hands on
(a) Locate two online databases on microRNA targets, and report their URLs.
(b) Identify one database that stores all HIV-inhibiting siRNA data and cite its URL.
(c) Cite a paper offering a step-by-step guide for analyzing ChIP-seq data.
Summary
• Construct a comprehensive PubMed query: MeSH Browser
• Retrieve gene/protein/disease information straight from a PubMed search: GoPubMed, Quertle
• Find suitable journals for manuscript submission, etc.: eTBLAST
• Find disease-causing genes and disease-causing SNPs: HuGE Navigator (GeneProspector, Phenopedia, GWAS Integrator)
• Set up email alerts for new publications: My NCBI
• Search for databases and software: OBRC
Gene/Protein Information Mining
Bioinformatics Databases & Software Providers
National Center for Biotechnology Information (NCBI): Home page, Site map, Resource Guide
European Bioinformatics Institute (EBI): Home page, Databases, Software
Gene Information Gateways
o Open access resources:
National Center for Biotechnology Information (NCBI): GenBank, RefSeq, Entrez Gene, Gene Expression Omnibus (GEO), OMIM
Protein Information Hubs
o Open access resources:
European Bioinformatics Institute (EBI): UniProt, InterPro, PROSITE, STRING
UCSC Genome Bioinformatics: BLAT Search, Gene Detail Page
National Center for Biotechnology Information (NCBI): RefSeq, Entrez Gene, Conserved Domain Database (CDD), Molecular Modeling Database (MMDB), 3D structure viewer: Cn3D
Gene/Protein Information
Chromosomal location, mRNA, genomic seq, orthologs, paralogs, regulatory elements,
Amino acid seq, domain architecture, protein structure, post translational modifications
Gene expression, biological pathways, protein interaction map, disease association, biomarkers
Gene Questions?
What is its function?
What are its neighboring genes?
What is its genomic sequence?
How many splice variants are there?
What is its intron-exon architecture?
What diseases are associated with it?
In which tissues is it expressed?
How can I get its cDNA clone?
SNP
Genomic Sequence
Expression Profile
Interacting Partners
3D Structure
mRNA Sequence
Chromosomal Localization
Disease
Amino acid Sequence
Homologous Sequences
NCBI: Entrez Gene
Find:
• gene symbols and aliases
• sequences: genomic, mRNA, protein
• intron-exon architecture
• genomic context: neighboring and antisense genes
• interacting partners
• associated gene ontology terms: function, cellular component and biological process
Entrez Gene
A searchable database of genes from RefSeq genomes, defined by sequence and/or located in the NCBI Map Viewer
Statistics
• Gene: 7974 organisms
• GenBank: 160,000 organisms
Each record represents a single gene from a given organism
NCBI Sequence Databases
GenBank: archival database of nucleotide sequences from >160,000 organisms (More info)
GenPept: conceptual translation of GenBank CDS
RefSeq: non-redundant, expert-verified database of reference sequences, based on GenBank records
International Nucleotide Sequence Database Collaboration
Primary vs. Derivative Databases
RefSeq Scope & Accessions
Genomic DNA
• NC_123456 - complete genome, complete chromosome, complete plasmid
• NG_123456 - genomic region
• NT_123456 - genomic contig
mRNA - NM_123456
Protein - NP_123456
more about RefSeq scope and accessions...
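The accession prefixes on this slide lend themselves to a simple lookup. This is a minimal sketch that covers only the five prefixes shown here (RefSeq defines others). `refseq_molecule_type` is a hypothetical helper name, and the example accession is the slide's own placeholder number.

```python
# Prefix meanings as listed on the slide; not a complete RefSeq prefix table.
REFSEQ_PREFIXES = {
    "NC": "genomic DNA: complete genome, chromosome, or plasmid",
    "NG": "genomic DNA: genomic region",
    "NT": "genomic DNA: genomic contig",
    "NM": "mRNA",
    "NP": "protein",
}


def refseq_molecule_type(accession):
    """Describe a RefSeq-style accession by its two-letter prefix."""
    prefix, underscore, _ = accession.partition("_")
    if not underscore:
        # RefSeq accessions have an underscore after the prefix.
        return "not a RefSeq-style accession"
    return REFSEQ_PREFIXES.get(prefix.upper(), "prefix not covered here")


print(refseq_molecule_type("NM_123456"))
```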
RefSeq Status Codes
Provisional, Reviewed, Predicted, Genome Annotation
more about RefSeq status codes
Hands on
Find the mRNA sequence for your gene of interest (p53, BRCA1, EGFR, PLCg1)
Start page: Entrez core nucleotide
Use Limits, History and Preview Index
Sequence Format
GenBank: Header, Features, Sequence
FASTA: Sequence
Example: U49845, Sample GenBank record, Sequence Revision History tool
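FASTA is the simpler of the two formats: a header line starting with `>` followed by one or more sequence lines. Below is a minimal parser sketch. The record it parses is made up for illustration and is not real sequence data, and `parse_fasta` is a hypothetical helper name.

```python
def parse_fasta(text):
    """Return a dict mapping each FASTA header (without '>') to its sequence."""
    records = {}
    header = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            header = line[1:]
            records[header] = []
        elif header is not None:
            # Sequence lines are concatenated in order under the current header.
            records[header].append(line)
    return {name: "".join(parts) for name, parts in records.items()}


# Hypothetical record, for illustration only.
example = """>demo_record hypothetical example
ATGGCC
TTAGGA
"""
records = parse_fasta(example)
print(records)
```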
Video Tutorials
http://www.hsls.pitt.edu/molbio/videos?c=3
Find mRNA Sequence for Reelin Gene.
Gene Function
What is its function?
Entrez Gene Page:
Summary (TOC), Gene Ontology, GeneRIFs, Pathways (TOC), Biosystems (Links)
Gene Ontology (GO)
Controlled vocabulary tagging
• Function
• Biological Processes
• Cellular Component
Gene Ontology (GO) and KEGG
GO information page
GO evidence codes
KEGG information page
Function
How many splice variants are there?
What is/are its sequence?
Entrez Gene Page:
Genomic regions… (TOC), UCSC (Links)
Video Tutorials
Alternative Splicing
Intron-Exon Coordinates
What is its intron-exon architecture?
Entrez Gene Page:
Display: change it from Full Report to Gene Table
Video Tutorials
Neighboring Genes
What are its neighboring genes?
Entrez Gene Page:
Genomic context (TOC)
Video Tutorials
Chromosomal location
Associated Diseases
What diseases are associated with it?
Entrez Gene Page:
TOC
• General Information: Phenotype
Links
• OMIM
• HuGE Navigator
Video Tutorials
HomoloGene
What are its homologous genes?
Entrez Gene Page:
Links: HomoloGene, then change Display settings
Video Tutorials
Reagents
How can I get its cDNA clone? ..antibodies? ..siRNA?
Entrez Gene Page:
TOC: Additional Links, Research Materials, Exact Antigen
Video Tutorials
Protein Information Gateways
UniProtKB: Universal Protein Resource, a comprehensive, centralized protein information resource
Developed by a consortium:
• European Bioinformatics Institute (EBI)
• the Swiss Institute of Bioinformatics (SIB)
• the Protein Information Resource (PIR)
Comprised of:
• Swiss-Prot: biologist-curated annotation data
• TrEMBL: computationally annotated data
• PIR-International Protein Sequence Database (PIR-PSD): the most comprehensive and expertly-curated protein sequence database in the public domain for over 20 years
Funded by: NIH, NSF, the European Union and the Swiss Federal government
Link to Wiki, YouTube, Blogs and Tweets: http://www.kosmix.com/topic/uniprot?
Tutorial Video: http://www.youtube.com/watch?v=TCF3qWn7siI&feature=youtube_gdata
Protein Questions?
What is its function?
Amino acid sequence?
… molecular weight?
… isoelectric point (pI)?
… post-translational modifications?
… presence of domain/pattern/profile?
… hydrophobicity?
… orthologs?
Etc.
Structure?
… secondary and tertiary?
Interaction partners?
UniProt Video Tutorial
http://www.hsls.pitt.edu/molbio/videos/play?v=19
Protein Function from UniProtKB
UniProt Search:
Look under: General annotation (Function), Ontologies (Keywords, Gene Ontology)
Protein Sequence
UniProt
• Sequence annotations
• Sequences
Gene
• Genomic regions, transcripts, and products
• CCDS (consensus CDS report)
UCSC
• Sequence and links
Protein Sequence Analysis
PTM
• UniProt
• Sequence annotation
• IPA
• Modifications and Regulation
pI/MW
• UniProt
• Seq_Tool
• Compute pI
Hydrophobicity
• UniProt
• Seq_Tool
• ProtScale
Peptide Digest
• UniProt
• Seq_Tools
• PeptideMass
• PeptideCutter
Homologous Seq
• Entrez Gene
• HomoloGene
Domain/pattern
• UniProt
• Sequence annotation
• InterPro
• Entrez Gene
• Conserved Domain
Protein Domain Resources
Protein Domain Databases:
InterPro
Protein Domains (Wikipedia):
A protein domain is a part of protein sequence and structure that can evolve, function, and exist independently of the rest of the protein chain. Each domain forms a compact three-dimensional structure and often can be independently stable and folded. Many proteins consist of several structural domains. One domain may appear in a variety of evolutionarily related proteins. Domains vary in length from about 25 to 500 amino acids. The shortest domains, such as zinc fingers, are stabilized by metal ions or disulfide bridges. Domains often form functional units, such as the calcium-binding EF hand domain of calmodulin.
Protein Domain: SH3
Src homology 3 domains. SH3 domains bind to proline-rich ligands with moderate affinity and selectivity, preferentially to PxxP motifs. They play a role in the regulation of enzymes by intramolecular interactions, change the subcellular localization of signal pathway components, and mediate multiprotein complex assemblies.
Protein Structure
Primary
Secondary
Tertiary
Quaternary
Useful links: http://www.kosmix.com/topic/protein_structure?
Taken from Wikipedia
NCBI
Finding Protein Structure
PDB
Entrez Structure
NCBI BLINK via Entrez Gene/Protein
Structure Databases and Viewers
Databases:
RCSB Protein Data Bank (PDB): State University of New Jersey (Rutgers), the San Diego Supercomputer Center at the University of California San Diego, the University of Wisconsin-Madison
Link: http://www.kosmix.com/topic/protein_data_bank?
MMDB: NCBI's structure database is called MMDB (Molecular Modeling DataBase), and it is a subset of three-dimensional structures obtained from the Protein Data Bank (PDB), excluding theoretical models.
Viewers:
Cn3D: a helper application for your web browser that allows you to view 3-dimensional structures from NCBI's Entrez retrieval service.
RasMol: EBI
FirstGlance in Jmol: a simple tool for macromolecular visualization. (More..)
Protein Structure
Search for the 3D structure of p53: Entrez Structure
View the crystal structure of the mouse p53 core domain (MMDB: 42987) or the crystal structure of a p53 core dimer bound to DNA (PDB: 2GEQ)
Manipulating the Structure Viewer Window
Find Similar Structure: NCBI VAST
NCBI BLink
BLink ("BLAST Link") displays the results of BLAST searches that have been done for every protein sequence in the Entrez Proteins data domain.
To access it, follow the BLink link displayed beside any hit in the results of an Entrez Proteins search.
Hands-on Protein Structure
View the crystal structure of Chronophin (PDB entry: 2P69).
A variant of this protein with mutations in its amino acid sequence has been isolated. Can you predict any effect of these mutations on its function?
Hint: Find the amino acid residues that are in close contact (3.5 Å) with PYRIDOXAL-5'-PHOSPHATE (PLP).
Label the amino acids and save the picture in PNG format.
Learn more about the Chronophin structure at: http://kb-dev.psi-structuralgenomics.org/KB/archives.jsp?pageshow=3
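The hint amounts to a distance filter over atomic coordinates. Below is a sketch of that filter, assuming the coordinates have already been extracted from the structure file. The residue labels and coordinates are invented for illustration and are not taken from 2P69, and `residues_within` is a hypothetical helper name.

```python
import math


def residues_within(ligand_atoms, protein_atoms, cutoff=3.5):
    """Return sorted residue labels having any atom within `cutoff` of a ligand atom.

    ligand_atoms: list of (x, y, z) coordinates.
    protein_atoms: list of (residue_label, (x, y, z)) pairs.
    Distances are in the same unit as the coordinates (angstroms in PDB files).
    """
    close = set()
    for residue, coord in protein_atoms:
        if any(math.dist(coord, lig) <= cutoff for lig in ligand_atoms):
            close.add(residue)
    return sorted(close)


# Invented toy coordinates (assumption: NOT real 2P69 data).
ligand = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0)]
protein = [
    ("ASP25", (0.0, 3.0, 0.0)),       # 3.0 from the first ligand atom
    ("LYS100", (10.0, 10.0, 10.0)),   # far from both ligand atoms
    ("SER58", (1.5, 0.0, 3.4)),       # 3.4 from the second ligand atom
]
contacts = residues_within(ligand, protein)
print(contacts)
```

Structure viewers such as Cn3D provide this kind of selection interactively; the sketch only shows the underlying calculation.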
Hands-on Protein Structure of Chronophin
Sequence Alignment in Cn3D
NCBI
Hands-On
Can you identify the human protein that contains the short peptide sequence GPDGMPVIYHGHTLTTKIKFSDVLHTIKE?
• What is its function?
• What is its calculated pI and molecular weight?
• Which region of this protein is most hydrophobic?
• Locate five experimentally verified S/T/Y phosphorylation sites present in this protein.
• Find the mouse and fruit fly orthologs of this human protein and report the % protein identity it shares with these orthologs.
• How many protein domains are reported to be present in this human protein? Find the location of its largest domain.
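Two of these questions reduce to simple string operations once the full protein sequence is in hand: locating the peptide, and scanning for the most hydrophobic stretch. The sketch below uses the Kyte-Doolittle hydropathy scale with a sliding window. The window size, the helper names, and the toy sequence are assumptions for illustration; tools such as ProtScale offer more scales and options.

```python
# Kyte-Doolittle hydropathy values per amino acid (one-letter codes).
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def find_peptide(protein, peptide):
    """Return the 1-based start position of `peptide` in `protein`, or None."""
    index = protein.find(peptide)
    return None if index < 0 else index + 1


def most_hydrophobic_window(protein, window=5):
    """Return (1-based start, mean hydropathy) of the highest-scoring window."""
    best_start, best_score = None, float("-inf")
    for i in range(len(protein) - window + 1):
        score = sum(KYTE_DOOLITTLE[aa] for aa in protein[i:i + window]) / window
        if score > best_score:  # strict '>' keeps the first window on ties
            best_start, best_score = i + 1, score
    return best_start, best_score


# Toy sequence invented for illustration (not a real protein).
toy = "MKKDEIIIIIGSK"
print(find_peptide(toy, "DEI"))
print(most_hydrophobic_window(toy))
```

For the exercise itself, `toy` would be replaced by the full sequence retrieved from UniProt, and a longer window is usually more informative for full-length proteins.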
Licensed Tools for Gene/Protein Information
HSLS Licensed Tools
BioBase, MetaCore, Ingenuity IPA
Gene/Protein Facts from BioBase
http://goo.gl/9wpwG
BioBase BioKnowledge Library
Protein Function from IPA