Data Mining with BioMart
www.ensembl.org/biomart/martview
www.biomart.org/biomart/martview
What is BioMart?
• A data export tool
• A quick table generator
• A web interface to mine Ensembl data
BioMart - Data mining
• BioMart is a search engine that can look up several data types at once and return them as a single table.
• For example: mouse gene IDs, chromosome, and base-pair position
• No programming required!
General or Specific Data Tables
• All the genes for one species
• Or… only genes on one specific region of a chromosome
• Or… let BioMart select the genes (e.g. all transcripts that match a microarray probe set, GO term, or InterPro domain).
Results
Tables or sequences
The First Step: Choose the Dataset
Dataset: Current Ensembl, Human genes
The Second Step: Filters
Filters: Define a gene set
Attributes attach information
Attributes: Determine output columns
Query
For the human CFTR gene, export the Entrez Gene ID(s) and matching Affy HG U133-PLUS-2 probeset(s)
• In the query:
Filters: what we know
Attributes: what we want to know.
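For readers who would rather script this query than click through the web interface, the filter/attribute split can be sketched as a BioMart XML query. This is only a sketch under stated assumptions: the `martservice` endpoint, the dataset name `hsapiens_gene_ensembl`, and the internal filter and attribute names (`hgnc_symbol`, `entrezgene_id`, `affy_hg_u133_plus_2`) do not appear in the slides and can differ between Ensembl releases, so check them against the live mart before relying on them. The helper `build_query` is a hypothetical name.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

# Assumed endpoint; not taken from the slides.
MARTSERVICE = "http://www.ensembl.org/biomart/martservice"


def build_query(dataset, filters, attributes):
    """Build a BioMart XML query string.

    filters    -- what we know (name -> value), used to define the gene set
    attributes -- what we want to know, one output column per name
    """
    query = ET.Element("Query", {
        "virtualSchemaName": "default",
        "formatter": "TSV",   # tab-separated output
        "header": "1",        # include a header row
        "uniqueRows": "1",    # drop duplicate rows
    })
    ds = ET.SubElement(query, "Dataset", {"name": dataset, "interface": "default"})
    for name, value in filters.items():
        ET.SubElement(ds, "Filter", {"name": name, "value": value})
    for name in attributes:
        ET.SubElement(ds, "Attribute", {"name": name})
    return ET.tostring(query, encoding="unicode")


# The CFTR query from the slide, using assumed internal names.
xml_query = build_query(
    dataset="hsapiens_gene_ensembl",
    filters={"hgnc_symbol": "CFTR"},
    attributes=[
        "ensembl_gene_id",
        "ensembl_transcript_id",
        "entrezgene_id",
        "affy_hg_u133_plus_2",
    ],
)

# URL that could be fetched (for example with urllib.request) to run the query.
request_url = MARTSERVICE + "?" + urlencode({"query": xml_query})
print(xml_query)
```

The script only builds the query and the request URL; it does not contact the server, so it can be inspected safely before anything is sent.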
A Brief Example
Use the current Ensembl (archives are also available).
Select Homo sapiens genes.
Select the Genes with Filters
Click ‘Filters’.
Expand the ‘GENE’ panel to enter the gene ID(s).
Filters (and Count)
Change the selection to ‘HGNC curated name’ and enter “CFTR” in the box.
Click “Count” to see how many genes pass your filters.
Attributes (Output Options)
Click on ‘Attributes’
‘Attributes’ allows you to output information.
Select ‘EntrezGene ID’
Select the Affy platform ‘HG U133-PLUS-2’ in the ‘Microarray’ section
The Results Table - Preview
For the full result table, click “Go” or view “ALL” rows.
Full Result Table
Ensembl Gene ID for CFTR
Ensembl Transcript IDs
EntrezGene ID
Affy HG probeset
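Because each transcript/probeset combination comes back as its own row, the same gene and EntrezGene IDs repeat down the exported table. The sketch below shows one way to collapse such a table per gene in Python. The column headers and all values in `sample` are placeholders invented for illustration, not real BioMart output, and `collapse_by_gene` is a hypothetical helper name.

```python
import csv
import io
from collections import defaultdict


def collapse_by_gene(tsv_text, key="Gene stable ID"):
    """Group a tab-separated result table by one column,
    collecting the unique values found in every other column."""
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    grouped = defaultdict(lambda: defaultdict(set))
    for row in reader:
        for column, value in row.items():
            if column != key and value:  # skip the key column and empty cells
                grouped[row[key]][column].add(value)
    return grouped


# Placeholder rows (NOT real BioMart output): one gene, two transcripts, two probesets.
sample = (
    "Gene stable ID\tTranscript stable ID\tEntrezGene ID\tAffy probeset\n"
    "GENE_1\tTRANSCRIPT_1\tENTREZ_1\tPROBE_A\n"
    "GENE_1\tTRANSCRIPT_1\tENTREZ_1\tPROBE_B\n"
    "GENE_1\tTRANSCRIPT_2\tENTREZ_1\tPROBE_A\n"
)

summary = collapse_by_gene(sample)
for column, values in summary["GENE_1"].items():
    print(column, sorted(values))
```

In practice the text would come from a file saved with the export options rather than from an inline string.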
Other Export Options (Attributes)
Sequences: UTRs, flanking sequences, cDNA and peptides, etc.
Gene IDs from Ensembl and external sources (MGI, Entrez, etc.)
Microarray data
Protein functions/descriptions (InterPro, GO)
Orthologous gene sets
SNP/variation data
BioMart around the world…
BioMart started at Ensembl…
To where has it travelled?
Central Portal
www.biomart.org
WormBase
HapMap
Population frequencies
Inter-population comparisons
Gene annotation
DictyBase
GRAMENE
www.gramene.org
The Potato Center
How to Get There
http://www.biomart.org/biomart/martview
http://www.ensembl.org/biomart/martview
• Or click on ‘BioMart’ from Ensembl
Worked Example
• Follow the worked example on pg 26
• Then, do the exercises on pg 34 (answers on pg 37)
This module should do the following:
• Show you how to export multiple data types from Ensembl for gene IDs or chromosomal regions.
Ensembl Core Databases
Relational Database
• Normalised
• Each data point stored only once
Therefore:
• Quick updates
• Minimal storage requirements
But:
• Many tables
• Many joins for complicated queries
• Slow for data mining applications
Normalised Schema

gene_id gene.symbol
9970 SMAD1
1712 SMAD2
8240 SMAD3
1967 SMAD4
… …

gene_id transcript
9970 ENST00000302085
1712 ENST00000262160
1712 ENST00000356825
8240 ENST00000327367
1967 ENST00000342988
… …

gene_id stable_id
9970 ENSG00000170365
1712 ENSG00000175387
8240 ENSG00000166949
1967 ENSG00000141646
… …
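To make the "many joins" point concrete, here is a small sketch that loads the three normalised tables shown above into an in-memory SQLite database and answers one question: which stable gene ID and transcripts belong to SMAD2? The table names (`gene_symbol`, `transcript`, `gene_stable`) are hypothetical and are not the actual Ensembl core schema; only the rows are taken from the slide.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical table names; each fact is stored once, keyed by the internal gene_id.
conn.executescript("""
    CREATE TABLE gene_symbol (gene_id INTEGER, symbol TEXT);
    CREATE TABLE transcript  (gene_id INTEGER, transcript TEXT);
    CREATE TABLE gene_stable (gene_id INTEGER, stable_id TEXT);
""")
conn.executemany("INSERT INTO gene_symbol VALUES (?, ?)", [
    (9970, "SMAD1"), (1712, "SMAD2"), (8240, "SMAD3"), (1967, "SMAD4"),
])
conn.executemany("INSERT INTO transcript VALUES (?, ?)", [
    (9970, "ENST00000302085"),
    (1712, "ENST00000262160"),
    (1712, "ENST00000356825"),
    (8240, "ENST00000327367"),
    (1967, "ENST00000342988"),
])
conn.executemany("INSERT INTO gene_stable VALUES (?, ?)", [
    (9970, "ENSG00000170365"), (1712, "ENSG00000175387"),
    (8240, "ENSG00000166949"), (1967, "ENSG00000141646"),
])

# Even this simple question needs two joins across three tables.
rows = conn.execute("""
    SELECT s.stable_id, t.transcript, g.symbol
    FROM gene_symbol AS g
    JOIN gene_stable AS s ON s.gene_id = g.gene_id
    JOIN transcript  AS t ON t.gene_id = g.gene_id
    WHERE g.symbol = ?
    ORDER BY t.transcript
""", ("SMAD2",)).fetchall()

for row in rows:
    print(row)
```

Each additional data type in a query would add another table and another join, which is the cost the slide refers to.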
BioMart Database
Data warehouse
• De-normalised
• Query-optimised
Therefore:
• Fast and flexible
• Ideal for data mining
But:
• Tables with apparent “redundancy”
• Needs rebuilding from scratch for every release from normalised core databases
De-Normalised Schema
gene_id transcript_id gene.symbol
ENSG00000170365 ENST00000302085 SMAD1
ENSG00000175387 ENST00000262160 SMAD2
ENSG00000175387 ENST00000356825 SMAD2
ENSG00000166949 ENST00000327367 SMAD3
ENSG00000141646 ENST00000342988 SMAD4
… … …
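With one wide table, a BioMart-style query reduces to a single pass: filters decide which rows are kept, and attributes decide which columns are returned. The sketch below illustrates that idea in plain Python using the rows of the de-normalised table above. It is not how BioMart is implemented; the function name `query` and the shortened column name `symbol` (for `gene.symbol`) are assumptions made for the example.

```python
# Rows copied from the de-normalised table above.
COLUMNS = ("gene_id", "transcript_id", "symbol")
MART = [
    ("ENSG00000170365", "ENST00000302085", "SMAD1"),
    ("ENSG00000175387", "ENST00000262160", "SMAD2"),
    ("ENSG00000175387", "ENST00000356825", "SMAD2"),
    ("ENSG00000166949", "ENST00000327367", "SMAD3"),
    ("ENSG00000141646", "ENST00000342988", "SMAD4"),
]


def query(rows, filters, attributes):
    """Return the requested columns for rows matching every filter.

    filters    -- column name -> set of accepted values (what we know)
    attributes -- list of column names to output (what we want to know)
    Duplicate output rows are dropped, keeping first-seen order.
    """
    index = {name: i for i, name in enumerate(COLUMNS)}
    seen, result = set(), []
    for row in rows:
        if all(row[index[col]] in allowed for col, allowed in filters.items()):
            picked = tuple(row[index[col]] for col in attributes)
            if picked not in seen:
                seen.add(picked)
                result.append(picked)
    return result


# SMAD2 appears on two rows (the "redundancy"), but no join is needed.
gene_ids = query(MART, {"symbol": {"SMAD2"}}, ["gene_id"])
transcripts = query(MART, {"symbol": {"SMAD2", "SMAD4"}}, ["symbol", "transcript_id"])
print(gene_ids)
print(transcripts)
```

The repeated SMAD2 gene ID is the apparent redundancy mentioned earlier: it costs storage but removes the joins from every query.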
Information Flow
[Diagram with columns DATASET, FILTER, ATTRIBUTES. Labels shown: SPECIES, FOCUS; REGION, SNP, PROTEIN, HOMOLOGY, GENE, EXPRESSION (listed twice); REFSEQ, INTERPRO, GO, SWISSPROT, EMBL, AFFYMETRIX; FASTA, FILE, EXCEL, TEXT, GTF, HTML.]