Data Mining in Ensembl with EnsMart
Posted 20-Dec-2015
2 of 24
Possible queries…
• All genes from a candidate region
• Genes with a particular protein domain
• Members of a protein family
• Genes associated with SNPs
3 of 24
Specific queries
• Disease related genes between markers D10S255 and D10S259
• Transmembrane proteins with an Ig-MHC domain (IPR003006) on chromosome 2
• Genes with associated coding SNPs on chromosomal band 5q35.3
• Mouse homologues for human disease genes.
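As a rough illustration of the first query above (disease related genes between two markers), the sketch below treats it as an interval-overlap filter. The `Gene` class, the gene names, and all coordinates, including the marker positions, are made up for the example and do not come from Ensembl.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Gene:
    name: str
    chromosome: str
    start: int
    end: int
    disease_related: bool


# Hypothetical marker positions on chromosome 10 (illustrative values only).
MARKER_POSITIONS = {"D10S255": 1_000_000, "D10S259": 5_000_000}

# Toy gene records; names and coordinates are invented.
GENES = [
    Gene("geneA", "10", 1_200_000, 1_250_000, True),
    Gene("geneB", "10", 3_000_000, 3_100_000, False),
    Gene("geneC", "10", 6_000_000, 6_050_000, True),
    Gene("geneD", "2", 2_000_000, 2_050_000, True),
]


def genes_between_markers(genes, chromosome, marker_a, marker_b,
                          positions=MARKER_POSITIONS):
    """Return disease related genes overlapping the interval between two markers."""
    lo, hi = sorted((positions[marker_a], positions[marker_b]))
    return [
        g for g in genes
        if g.chromosome == chromosome
        and g.disease_related
        and g.start <= hi and g.end >= lo  # interval overlap test
    ]


hits = genes_between_markers(GENES, "10", "D10S255", "D10S259")
print([g.name for g in hits])
```

Only `geneA` passes all three conditions here: `geneB` is not disease related, `geneC` lies outside the interval, and `geneD` is on another chromosome.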
4 of 24
More specific queries
• Human genes with upstream regions conserved with respect to mouse
• Upstream sequence for all Ensembl genes mapped to the U95A chip (similarly, complete genomic annotation of MG_U74)
• Genomic location and description of all mouse, rat and fugu homologues of all human genes that have transmembrane domains, are expressed in the cardiovascular system and have non-synonymous SNPs
5 of 24
EnsMart – vertical and horizontal data integration
Ensembl Genes
EST Genes
Vega Genes
SNPs
Human, Mouse, Rat, Zebrafish, Fugu, Anopheles
6 of 24
Ensembl data sets
• Genes
• EST
• Markers
• Diseases
• Protein Annotation
• SNPs
• Homology
• Expression
7 of 24
EnsMart
• Data retrieval tool
• Query builder interface
• Gene or SNP lists
• Associated features or sequences
• Various output formats
8 of 24
Information flow
Stages: start → filter → output
Diagram labels, in the order they appear:
• SPECIES, FOCUS
• REGION, SNP, PROTEIN, HOMOLOGY, GENE, EXPRESSION
• REFSEQ, INTERPRO, GO, SWISSPROT, EMBL, AFFY
• REGION, SNP, PROTEIN, HOMOLOGY, GENE, EXPRESSION
• FASTA, FILE, EXCEL, TEXT, GTF, HTML
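The start, filter and output stages can be sketched as a small query object. This is not the EnsMart interface or API: `MartQuery`, its fields, and the toy records are hypothetical, and only tab-separated text output is shown.

```python
from dataclasses import dataclass, field


@dataclass
class MartQuery:
    # Start stage: choose a species and a focus (e.g. genes or SNPs).
    species: str
    focus: str
    # Filter stage: attribute -> required value.
    filters: dict = field(default_factory=dict)
    # Output stage: which columns to report.
    attributes: list = field(default_factory=list)

    def run(self, records):
        """Apply start and filter stages, then render tab-separated text."""
        rows = [
            r for r in records
            if r["species"] == self.species
            and r["focus"] == self.focus
            and all(r.get(k) == v for k, v in self.filters.items())
        ]
        lines = ["\t".join(self.attributes)]  # header row
        lines += ["\t".join(str(r.get(a, "")) for a in self.attributes)
                  for r in rows]
        return "\n".join(lines)


# Toy records; gene names and chromosome assignments are invented.
RECORDS = [
    {"species": "human", "focus": "gene", "gene": "geneA",
     "chromosome": "2", "interpro": "IPR003006"},
    {"species": "human", "focus": "gene", "gene": "geneB",
     "chromosome": "5", "interpro": "IPR003006"},
    {"species": "mouse", "focus": "gene", "gene": "geneC",
     "chromosome": "2", "interpro": "IPR003006"},
]

query = MartQuery(
    species="human",
    focus="gene",
    filters={"chromosome": "2", "interpro": "IPR003006"},
    attributes=["gene", "chromosome", "interpro"],
)
result = query.run(RECORDS)
print(result)
```

Keeping the three stages as separate fields mirrors the diagram: the same filtered rows could be rendered in another output format without touching the start or filter choices.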
16 of 24
Ensembl core database
• Normalised
• Each data point stored only once
• Quick updates
• Minimal storage requirements
But:
• Many tables
• Many joins for complicated queries
• Slow for data mining questions
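To make the "many joins" point concrete, here is a minimal sketch using an in-memory SQLite database. The four tables are a simplified, hypothetical schema, not the actual Ensembl core schema, and the rows are invented; the domain identifiers are used only as arbitrary labels.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical normalised schema: each fact is stored once, in its own table.
conn.executescript("""
CREATE TABLE gene (
    gene_id INTEGER PRIMARY KEY, name TEXT, chromosome TEXT);
CREATE TABLE transcript (
    transcript_id INTEGER PRIMARY KEY,
    gene_id INTEGER REFERENCES gene(gene_id));
CREATE TABLE translation (
    translation_id INTEGER PRIMARY KEY,
    transcript_id INTEGER REFERENCES transcript(transcript_id));
CREATE TABLE protein_feature (
    translation_id INTEGER REFERENCES translation(translation_id),
    domain TEXT);

INSERT INTO gene VALUES (1, 'geneA', '2'), (2, 'geneB', '2'), (3, 'geneC', '5');
INSERT INTO transcript VALUES (10, 1), (20, 2), (30, 3);
INSERT INTO translation VALUES (100, 10), (200, 20), (300, 30);
INSERT INTO protein_feature VALUES
    (100, 'IPR003006'), (200, 'IPR000001'), (300, 'IPR003006');
""")

# "Genes on chromosome 2 with a given protein domain" needs three joins,
# because the domain is attached to the translation, not to the gene.
rows = conn.execute(
    """
    SELECT DISTINCT g.name
    FROM gene g
    JOIN transcript t       ON t.gene_id = g.gene_id
    JOIN translation tl     ON tl.transcript_id = t.transcript_id
    JOIN protein_feature pf ON pf.translation_id = tl.translation_id
    WHERE g.chromosome = ? AND pf.domain = ?
    ORDER BY g.name
    """,
    ("2", "IPR003006"),
).fetchall()
print(rows)
```

Even this single-condition question walks four tables; adding SNP, homology or expression conditions would extend the join chain further, which is the cost the slide describes for data mining questions.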