Data Mining in Ensembl with BioMart Giulietta Spudich.


Page 1:

Data Mining in Ensembl with BioMart

Giulietta Spudich

Page 2:

Simple Text-based Search Engine

Page 3:

‘Mouse Gene’ Gives Us Results

Page 4:

A More Complex Query is Not as Useful

Page 5:

BioMart - Data mining

• BioMart is a search engine that can find multiple terms and put them into a table format.

• For example: mouse gene IDs, chromosome, and base pair position

• No programming required!

Page 6:

General or Specific Data-Tables

• All the genes for one species

• Or… only genes on one specific region of a chromosome

• Or… genes on one region of a chromosome associated with a disease

Page 7:

The First Step: Choose the Dataset

Page 8:

The Second Step: Filters

Filters define which genes we are looking at.

Page 9:

Attributes attach information

Determine output columns with Attributes.
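
The split between the two can be sketched in a few lines of Python. This is only an illustration on a made-up in-memory table, not BioMart's own code: the records, the field names, and the `run_query` helper are all hypothetical. Filters decide which rows (genes) survive; attributes decide which columns are shown.

```python
# Toy illustration: filters pick rows (genes), attributes pick columns.
# The records and field names below are invented placeholders.
GENES = [
    {"gene_id": "GENE_A", "chromosome": "10", "biotype": "protein_coding", "start": 1000},
    {"gene_id": "GENE_B", "chromosome": "10", "biotype": "pseudogene", "start": 2500},
    {"gene_id": "GENE_C", "chromosome": "X", "biotype": "protein_coding", "start": 400},
]


def run_query(rows, filters, attributes):
    """Keep rows matching every filter, then keep only the requested columns."""
    selected = [
        row for row in rows
        if all(row[name] == value for name, value in filters.items())
    ]
    return [[row[name] for name in attributes] for row in selected]


table = run_query(
    GENES,
    filters={"chromosome": "10", "biotype": "protein_coding"},
    attributes=["gene_id", "start"],
)
for line in table:
    print("\t".join(str(cell) for cell in line))
```

Adding a filter can only shrink the number of rows; adding an attribute only widens the table.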

Page 10:

Results

Tables or sequences

Page 11:

Query:

• For all mouse genes on chromosome 10 that are protein coding, I would like to know the IDs in both Ensembl and MGI. Are there Illumina probes and GO IDs for these genes?

• In the query:
  Filters: what we know
  Attributes: what we want to know

Page 14:

A Brief Example

Change dataset to mouse (Mus musculus)

Page 15:

Select the genes with Filters

We are looking for mouse genes on chromosome 10 that are protein coding.

Click ‘Filters’.

Expand the ‘REGION’ panel.

Page 16:

Filters (selecting the genes)

Change this to chromosome 10

Page 17:

Filters (selecting the genes)

Select ‘protein coding’ in the ‘GENE’ section.

Click on ‘Attributes’

Page 18:

Attributes (Output Options)

We would like GO terms and IDs in MGI (the Mouse Genome Informatics site).

Expand the ‘EXTERNAL’ panel for non-Ensembl IDs.

Page 19:

Attributes (Output)

Scroll down to add ‘Illumina v1’ probes that map to these genes.

Click ‘Results’

Page 20:

The Results Table - Preview

‘Results’ shows Gene IDs, GO terms, and Illumina probes for all protein coding mouse genes on chromosome 10.

For the full result table: click ‘Go’ or View ‘ALL’ rows.

Page 21:

Full Result Table

• Ensembl Gene and Transcript IDs

• GO terms

• MGI symbol

• Illumina probes
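
In a table like this, a gene with several GO terms or probes typically appears on several rows, one per combination. A minimal sketch of regrouping such an export by gene follows, assuming a tab-separated download with a header row. The column names and values are invented placeholders, not real BioMart output, and `group_by_gene` is a hypothetical helper.

```python
import csv
import io
from collections import defaultdict

# Placeholder TSV standing in for an exported result table.
EXPORT = (
    "Gene ID\tGO ID\tProbe\n"
    "GENE_A\tGO_1\tPROBE_1\n"
    "GENE_A\tGO_2\tPROBE_1\n"
    "GENE_B\tGO_1\t\n"
)


def group_by_gene(tsv_text):
    """Collect the distinct GO IDs and probes seen for each gene."""
    grouped = defaultdict(lambda: {"go": set(), "probes": set()})
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    for row in reader:
        entry = grouped[row["Gene ID"]]
        # Empty cells mean "no annotation", so skip them.
        if row["GO ID"]:
            entry["go"].add(row["GO ID"])
        if row["Probe"]:
            entry["probes"].add(row["Probe"])
    return dict(grouped)


summary = group_by_gene(EXPORT)
for gene, info in sorted(summary.items()):
    print(gene, sorted(info["go"]), sorted(info["probes"]))
```

Using sets keeps each GO ID or probe once per gene, however many rows repeat it.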

Page 22:

Original Query:

• For all mouse genes on chromosome 10 that are protein coding, I would like to know the IDs in both Ensembl and MGI. Are there Illumina probes and GO IDs for these genes?

• In the query:
  Filters: what we know
  Attributes: columns in the Result Table
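
For readers who later want to script the same search, BioMart also accepts queries written as XML. The sketch below only builds the XML text and sends nothing. The dataset, filter, and attribute names are assumptions based on common Ensembl mart naming and can change between releases; the Illumina attribute in particular is a placeholder, so check every name against the live interface.

```python
import xml.etree.ElementTree as ET

# Assumed names: verify against the mart you are using.
DATASET = "mmusculus_gene_ensembl"
FILTERS = {"chromosome_name": "10", "biotype": "protein_coding"}  # what we know
ATTRIBUTES = [                                                     # what we want
    "ensembl_gene_id",
    "mgi_symbol",
    "go_id",
    "illumina_probe_placeholder",  # hypothetical; the real name depends on the array
]


def build_query(dataset, filters, attributes):
    """Return a BioMart-style XML query as a string (nothing is sent)."""
    query = ET.Element("Query", {
        "virtualSchemaName": "default",
        "formatter": "TSV",
        "header": "1",
        "uniqueRows": "1",
    })
    ds = ET.SubElement(query, "Dataset", {"name": dataset, "interface": "default"})
    for name, value in filters.items():
        ET.SubElement(ds, "Filter", {"name": name, "value": value})
    for name in attributes:
        ET.SubElement(ds, "Attribute", {"name": name})
    return ET.tostring(query, encoding="unicode")


query_xml = build_query(DATASET, FILTERS, ATTRIBUTES)
print(query_xml)
```

The structure mirrors the three choices made in the web interface: one Dataset element, one Filter element per restriction, and one Attribute element per output column.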

Page 23:

Other Export Options (Attributes)

Sequences: UTRs, flanking sequences, cDNA and peptides, etc.

Gene IDs from Ensembl and external sources (MGI, Entrez, etc)

Microarray data

Protein functions/descriptions (InterPro, GO)

Orthologous gene sets

SNP/Variation Data

Page 24:

BioMart Data Sets

• Ensembl genes

• Vega genes

• SNPs

• Compara (homologues and alignments)

Page 25:

BioMart around the world…

BioMart started at Ensembl…

To where has it travelled?

Page 26:

Central Server

www.biomart.org

Page 27:

WormBase

Page 28:

HapMap

Population frequencies

Inter-population comparisons

Gene annotation

Page 29:

DictyBase

Page 30:

GRAMENE

Rice, Maize, Arabidopsis genomes…

Page 31:

How to Get There

http://www.biomart.org/biomart/martview

http://www.ensembl.org/biomart/martview

• Or click on ‘BioMart’ from Ensembl
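
The addresses above open the point-and-click interface. For scripted access, a rough sketch of composing a request URL is shown below. It assumes a `martservice` endpoint sitting next to `martview` that takes an XML query in a `query` parameter; that endpoint is not mentioned on the slides, the XML stub is a placeholder rather than a complete query, and nothing is actually requested.

```python
from urllib.parse import urlencode, urlsplit, parse_qs

MARTVIEW_URL = "http://www.ensembl.org/biomart/martview"  # from the slide
# Assumption: the scripted endpoint lives beside martview as "martservice".
MARTSERVICE_URL = MARTVIEW_URL.rsplit("/", 1)[0] + "/martservice"


def service_url(query_xml):
    """Attach a URL-encoded XML query to the assumed service address."""
    return MARTSERVICE_URL + "?" + urlencode({"query": query_xml})


# Placeholder stub; a real query also lists filters and attributes.
STUB = '<Query><Dataset name="mmusculus_gene_ensembl"/></Query>'
example = service_url(STUB)
print(example)
```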

Page 32:

The Flow

• Choose Dataset (All genes for a species)

• Choose Filters (narrows the gene set)

• Choose Attributes (output options)

Page 33:

BioMart team

• Arek Kasprzyk

• Syed Haider

• Richard Holland

• Damian Smedley