Other biological databases
[Diagram: categories of biological data - taxonomic data, literature, protein folding and 3D structure, small molecules, pathways and networks, biological systems, protein families and domains, whole genome data, sequence data, ontologies (GO)]
Other Biological Databases
• Transcription factor binding sites - TRANSFAC
• Protein structure databases - PDB, SCOP, CATH
• Protein family databases - Pfam, Prints, PROSITE etc.
• Chemicals and small molecules - ChEBI
• Gene expression databases - GEO, ArrayExpress
• Metabolic pathways - Reactome, KEGG
• Genome databases - Ensembl, FlyBase, WormBase etc.
• Human genetics-related databases - HapMap, dbSNP
Transcription factor binding sites
• TRANSFAC - database of eukaryotic transcription factors: http://www.gene-regulation.com/pub/databases.html#transfac
• TESS - Transcription Element Search System, for predicting transcription factor binding sites; uses TRANSFAC: http://www.cbi.upenn.edu/tess
• TFsearch - for searching transcription factor binding sites: http://www.cbrc.jp/research/db/TFSEARCH.html
Protein structure databases
• Main resource is Protein Data Bank (PDB): http://www.rcsb.org/pdb/
• Contains the spatial coordinates of macromolecule atoms whose 3D structure has been obtained by X-ray or NMR studies
• Proteins represent more than 90% of available structures (others are DNA, RNA, sugars, viruses, protein/DNA complexes…)
• Can search by PDB code
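To illustrate what "spatial coordinates of macromolecule atoms" looks like in practice, here is a minimal sketch of reading one ATOM record from the legacy fixed-width PDB text format. It assumes the standard column layout (atom name in columns 13-16, residue name in 18-20, chain in 22, x/y/z in 31-38, 39-46 and 47-54). The function name `parse_atom_line` and the sample record are illustrative only and the coordinate values are made up.

```python
def parse_atom_line(line):
    """Extract atom name, residue, chain and (x, y, z) from one ATOM/HETATM record.

    Returns None for any other record type (HEADER, REMARK, ...).
    """
    if not line.startswith(("ATOM", "HETATM")):
        return None
    return {
        "atom": line[12:16].strip(),     # columns 13-16
        "residue": line[17:20].strip(),  # columns 18-20
        "chain": line[21],               # column 22
        "xyz": (
            float(line[30:38]),          # columns 31-38
            float(line[38:46]),          # columns 39-46
            float(line[46:54]),          # columns 47-54
        ),
    }


# Illustrative record assembled with the fixed-width layout (values are made up).
sample = "ATOM  %5d %-4s %3s %1s%4d    %8.3f%8.3f%8.3f" % (
    1, " N", "MET", "A", 1, 27.340, 24.430, 2.614)

print(parse_atom_line(sample))
```

For real work a maintained parser (for example the one in Biopython) is usually a safer choice than slicing columns by hand, since it also handles alternate locations, multiple models and the newer mmCIF format.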
Searching MSD
http://www.ebi.ac.uk/msd - search by PDB code
Protein structure-related databases
• Structural family databases based on PDB: SCOP (http://scop.mrc-lmb.cam.ac.uk/scop/) and CATH (http://www.biochem.ucl.ac.uk/bsm/cath/)
• Predicted structures in SWISS-MODEL (http://swissmodel.expasy.org//SWISS-MODEL.html)
Protein family databases
• Databases that produce signatures for identifying protein families or domains
• Used for functional classification of proteins
• E.g. Pfam, PROSITE, Prints, SMART, TIGRFAMs etc.
• Integrated into single resource InterPro (http://www.ebi.ac.uk/interpro)
InterProScan sequence search
Stand-alone version available
InterPro text search
Search keyword, protein acc or InterPro acc
Results for protein acc
Example InterPro entry
Chemicals and small molecules
• Chemical Abstracts - http://www.cas.org/
• ChEBI - http://www.ebi.ac.uk/chebi
• KEGG - part of it includes chemicals: http://www.genome.jp/kegg
• ChemIDplus - chemicals cited in NLM databases: http://chem2.sis.nlm.nih.gov/chemidplus/chemidlite.jsp
• MSD-Chem –ligands and chemicals in MSD
ChEBI example entry
Hierarchy for chemicals
Gene expression databases
• NCBI Gene Expression Omnibus (GEO) http://www.ncbi.nlm.nih.gov/geo/
• ArrayExpress http://www.ebi.ac.uk/arrayexpress/
• Stanford microarray database http://genome-www5.stanford.edu/
• Can usually search for experiments or particular expression profiles
GEO search page
Profiles search results
Specific entry and experiment info
ArrayExpress search results
What does the data look like?
• Info on experiment, array used, etc.
• Raw or processed tab-delimited file containing spots and their intensities (Cy3/Cy5 ratios) across different samples
• Files with meta data e.g. sample info, annotation and coordinates of each spot on array
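As a rough illustration of working with such a tab-delimited intensity file, the sketch below computes a log2 ratio per spot. The column names (`spot_id`, `cy3`, `cy5`) and all values are invented for the example, since real exports differ between platforms. The ratio is taken as Cy5/Cy3, which is a common convention but still an assumption here; swap the channels if your data reports Cy3/Cy5.

```python
import csv
import io
import math

# Hypothetical two-colour array export; column names and values are invented.
RAW = "spot_id\tcy3\tcy5\nS1\t200\t800\nS2\t500\t500\nS3\t400\t100\n"


def log2_ratios(handle):
    """Return {spot_id: log2(Cy5/Cy3)} from a tab-delimited intensity table."""
    reader = csv.DictReader(handle, delimiter="\t")
    ratios = {}
    for row in reader:
        cy3, cy5 = float(row["cy3"]), float(row["cy5"])
        if cy3 > 0 and cy5 > 0:  # the log ratio is undefined otherwise
            ratios[row["spot_id"]] = math.log2(cy5 / cy3)
    return ratios


ratios = log2_ratios(io.StringIO(RAW))
for spot, value in ratios.items():
    print(spot, value)
```

A positive value means the spot is brighter in the Cy5 channel, a negative value means it is brighter in Cy3, and zero means equal intensity. Real pipelines would also apply background correction and normalisation before interpreting the ratios.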
Proteomics: SWISS-2DPAGE
Enzymes and metabolic pathways
• Contain information describing enzymes, biochemical reactions and metabolic pathways
• ENZYME and BRENDA: nomenclature databases that store information on enzyme names and reactions
• IntEnz: Integrated relational Enzyme database
Enzyme nomenclature
• E.C. (Enzyme Commission) numbers are assigned based on the reactions enzymes catalyze
• Hierarchy, high level groups:
– EC 1 - Oxidoreductases
– EC 2 - Transferases
– EC 3 - Hydrolases
– EC 4 - Lyases
– EC 5 - Isomerases
– EC 6 - Ligases
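Because the first field of an EC number encodes the top-level class, the hierarchy above can be looked up with a few lines of code. This is a minimal sketch covering only the six classes listed on the slide; the function name `ec_top_level` is illustrative.

```python
# Top-level EC classes as listed above.
EC_CLASSES = {
    1: "Oxidoreductases",
    2: "Transferases",
    3: "Hydrolases",
    4: "Lyases",
    5: "Isomerases",
    6: "Ligases",
}


def ec_top_level(ec_number):
    """Return the top-level class name for an EC number such as 'EC 1.1.1.1'.

    Partial numbers like '2.7.-.-' are accepted; None is returned when the
    first field is not a class in EC_CLASSES.
    """
    text = ec_number.strip()
    if text.upper().startswith("EC"):
        text = text[2:].strip()
    first = text.split(".")[0]
    if not first.isdigit():
        return None
    return EC_CLASSES.get(int(first))


# EC 1.1.1.1 is alcohol dehydrogenase, an oxidoreductase.
print(ec_top_level("EC 1.1.1.1"))
print(ec_top_level("2.7.-.-"))
```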
EC example
Metabolic pathway databases
• Pathguide - lists >200 pathway resources
• KEGG (Kyoto Encyclopedia of Genes and Genomes): http://www.genome.jp/kegg - includes:
– Database of chemicals, genes and networks (metabolic, regulatory etc.)
– Well-curated and quite specific
• EcoCyc (Encyclopedia of E. coli K12 genes and metabolism): http://ecocyc.org - curation of the entire genome
• Reactome - curated biological pathways: http://www.reactome.org/
• GenMAPP - pathways contributed by users
http://www.genome.ad.jp/kegg
Different pathways in different species -> comparison
Pathway in Reactome
Example of a pathway in BioCyc
Protein-protein interaction databases
• Protein-protein interaction databases store pairwise interactions or complexes
• Can get 1 to more than 20,000 interactions per publication
• IntAct http://www.ebi.ac.uk/intact
• DIP (Database of Interacting Proteins) http://dip.doe-mbi.ucla.edu/
• BIND (Biomolecular Interaction Network Database) http://submit.bind.ca:8080/bind/
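Pairwise interaction records of the kind these databases store map naturally onto an undirected graph. The sketch below builds an adjacency map and counts interaction partners per protein. The protein labels are placeholders, not real database entries, and `build_network` is an illustrative name.

```python
from collections import defaultdict

# Placeholder pairwise interactions (not taken from any real database).
pairs = [("A", "B"), ("A", "C"), ("B", "C"), ("C", "D")]


def build_network(pairs):
    """Build an undirected adjacency map {protein: set of partners}."""
    network = defaultdict(set)
    for a, b in pairs:
        network[a].add(b)
        network[b].add(a)
    return network


network = build_network(pairs)

# Degree = number of distinct interaction partners.
degree = {protein: len(partners) for protein, partners in network.items()}
for protein in sorted(degree):
    print(protein, degree[protein], sorted(network[protein]))
```

Using sets means that an interaction reported by several publications is counted once, which is often what is wanted when merging data from more than one source.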
Protein-protein interactions in IntAct
Integrated functional interactions in STRING
Genome browsers
• Integrate sequence & functional data for a genome
• Ensembl - genome browser for major eukaryotic genomes, e.g. human, mouse etc. http://www.ensembl.org
• UCSC browser - http://genome.ucsc.edu/
• FlyBase - Drosophila genome database: http://www.ebi.ac.uk/flybase
• WormBase - C. elegans: http://www.wormbase.org
• PlasmoDB - Plasmodium (malaria): http://plasmodb.org
• Etc.
Ensembl genome browser
Ensembl gene view 1
Ensembl gene view 2
Gene within context on chromosome
Human genetics databases
• GeneCards (http://www.genecards.org/)
• HapMap (http://hapmap.ncbi.nlm.nih.gov/)
• OMIM http://www.ncbi.nlm.nih.gov/omim
• HGDP Human Genome Diversity Project (http://hagsc.org/hgdp/files.html)
Most of these databases are disease- or gene-centric, e.g. p53
Mutation/polymorphism databases
dbSNP http://www.ncbi.nlm.nih.gov/SNP/
Repository of known mutations (human and other organisms)
Where to find the databases
• Table of addresses for major databases and tools
• Nucleic Acids Research Database issue - January each year
• Nucleic Acids Research Software issue - new
• Expasy list of tools: http://ca.expasy.org/links.html
Large scale data retrieval
• Programmatic access to many databases
• MySQL access to some
• BioMart access - public and private
• FTP sites - large data downloads
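As one example of programmatic access, BioMart services accept a query written as a small XML document. The sketch below only builds that XML and does not contact any server. The dataset, filter and attribute names used here (`hsapiens_gene_ensembl`, `chromosome_name`, `ensembl_gene_id`, `external_gene_name`) are assumptions for illustration and should be checked against the mart you are querying; `build_biomart_query` is a hypothetical helper, not part of any BioMart library.

```python
import xml.etree.ElementTree as ET


def build_biomart_query(dataset, attributes, filters=None):
    """Return a BioMart-style XML query string (nothing is sent anywhere)."""
    query = ET.Element("Query", {
        "virtualSchemaName": "default",
        "formatter": "TSV",        # tab-separated output
        "header": "0",
        "uniqueRows": "1",
        "datasetConfigVersion": "0.6",
    })
    ds = ET.SubElement(query, "Dataset", {"name": dataset, "interface": "default"})
    for name, value in (filters or {}).items():
        ET.SubElement(ds, "Filter", {"name": name, "value": value})
    for name in attributes:
        ET.SubElement(ds, "Attribute", {"name": name})
    return ET.tostring(query, encoding="unicode")


# Assumed names: verify them against the target mart before use.
xml_query = build_biomart_query(
    "hsapiens_gene_ensembl",
    attributes=["ensembl_gene_id", "external_gene_name"],
    filters={"chromosome_name": "21"},
)
print(xml_query)
```

The resulting string is typically submitted as the `query` parameter of a request to the mart's web service, and the response comes back as the tab-separated rows requested. For very large result sets the FTP dumps or direct MySQL access listed above are usually more practical.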
Other tutorials
• http://www.ensembl.org/info/website/tutorials/index.html
• http://www.ebi.ac.uk/training/online/
• http://www.ebi.ac.uk/2can/home.html