Introduction to the BioMart API. BioMart APIs ● Biomart_plib - Objected Oriented Perl interface.

Introduction to the BioMart API

BioMart APIs

● Biomart_plib - Object-Oriented Perl interface

Biomart_plib

Architecture

● Object-oriented, Perl-based API to BioMart datasets

● Uses XML configuration shared by all BioMart software

Query logic

[Figure: query logic diagram; image not available]

Configuration logic

[Figure: configuration logic diagram; image not available]

Initializing API script

use BioMart::Initializer;

my $confFile = "/home/user/martRegistryFile";
my $initializer = BioMart::Initializer->new('registryFile' => $confFile);
my $registry = $initializer->getRegistry;

Optional Initializer parameters:

• 'action' => 'clean' - replace the dataset configurations stored on the local file system with those from the database and build a new, clean registry object

• 'action' => 'update' - replace any file-system dataset configurations modified since the last retrieval with the database copies and build a new registry object

• Default behaviour with no action specified is to generate the registry object using the cached file-system configurations if they exist, otherwise retrieve them from the database.

Initializing API script

Optional Initializer parameters (cont)

• 'mode' => 'lazyload' - keep only a certain number of dataset configurations in memory at once, for low-memory machines and future scalability

• Default behaviour with no mode specified is to keep all configurations in memory.

Building Query

my $query = BioMart::Query->new('registry' => $registry,
                                'virtualSchemaName' => 'default');

$query->addAttribute('hsapiens_gene_ensembl', 'ensembl_gene_id');

or with optional virtualSchema and interface settings:

$query->addAttribute('hsapiens_gene_ensembl', 'ensembl_gene_id', 'default', 'default');

$query->addFilter('hsapiens_gene_ensembl', 'chromosome_name', ['1']);
$query->addFilter('hsapiens_gene_ensembl', 'hgnc_symbol', ['FGFR1', 'IL2', 'DERL3']);

Executing query and printing results

my $query_runner = BioMart::QueryRunner->new();

$query_runner->execute($query);

$query_runner->printResults;

Executing query and printing results

Print formatted header:

$query_runner->printHeader;

Print just first 20 results:

$query_runner->printResults(20);

Change the formatter from the tab-separated default before executing the query:

$query->formatter('FASTA');

The formatter must have a corresponding module in lib/BioMart/Formatter that implements the FormatterI.pm interface (e.g. CSV, TXT, GTF, XLS).

Multi dataset queries

my $query = BioMart::Query->new('registry'=>$registry, 'virtualSchemaName'=>'default');

$query->addAttribute('hsapiens_gene_ensembl', 'ensembl_gene_id');
$query->addAttribute('hsapiens_gene_ensembl', 'ensembl_transcript_id');
$query->addAttribute('mmusculus_gene_ensembl', 'ensembl_gene_id');
$query->addAttribute('mmusculus_gene_ensembl', 'ensembl_transcript_id');

This is the equivalent of picking human as the main dataset in the web interface and mouse as the optional second dataset, i.e. the human attributes appear first in the result table, followed by the mouse attributes.

Note that BioMart queries are currently restricted to a maximum of two datasets, for performance reasons and because of technical difficulties in query planning.

Web services type access

● Intended to support Grid projects such as Taverna, and other third-party users who want to federate mart data, without leaving a port on the database server openly accessible.

Web services type access

http://test.biomart.org/cgi-bin/martservice?query=

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE Query>
<Query virtualSchemaName = "defaultSchema">
  <Dataset name = "hsapiens_gene_ensembl">
    <Attribute name = "ensembl_gene_id" />
    <Attribute name = "chromosome_name" />
    <ValueFilter name = "chromosome_name" value = "1"/>
  </Dataset>
</Query>
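
The query document has to be passed to martservice as the value of the query parameter, so it needs to be percent-encoded first. The following is a minimal sketch of that step in plain Java, not part of MartJ or any BioMart library. The class and method names (MartServiceQueryUrl, queryXml, requestUrl) are hypothetical, the endpoint is the one shown on the slide and may no longer be live, and the sketch assumes dataset, attribute and filter names contain no characters that need XML escaping.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

// Sketch only: builds a martservice request URL from a query document.
// Class and method names are hypothetical, not part of any BioMart library.
final class MartServiceQueryUrl {

    // Endpoint copied from the slide; it may no longer be live.
    static final String ENDPOINT = "http://test.biomart.org/cgi-bin/martservice";

    // Assemble the query XML for one dataset, its attributes and one value filter.
    // Assumes the names and values need no XML escaping.
    static String queryXml(String dataset, String[] attributes,
                           String filterName, String filterValue) {
        StringBuilder xml = new StringBuilder();
        xml.append("<?xml version=\"1.0\" encoding=\"UTF-8\"?>");
        xml.append("<!DOCTYPE Query>");
        xml.append("<Query virtualSchemaName=\"defaultSchema\">");
        xml.append("<Dataset name=\"").append(dataset).append("\">");
        for (String attribute : attributes) {
            xml.append("<Attribute name=\"").append(attribute).append("\"/>");
        }
        xml.append("<ValueFilter name=\"").append(filterName)
           .append("\" value=\"").append(filterValue).append("\"/>");
        xml.append("</Dataset></Query>");
        return xml.toString();
    }

    // Percent-encode the XML so it can travel in the query parameter.
    static String requestUrl(String xml) {
        try {
            return ENDPOINT + "?query=" + URLEncoder.encode(xml, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            // UTF-8 support is mandatory on every Java platform.
            throw new IllegalStateException("UTF-8 not supported", e);
        }
    }

    public static void main(String[] args) {
        String xml = queryXml("hsapiens_gene_ensembl",
                new String[] {"ensembl_gene_id", "chromosome_name"},
                "chromosome_name", "1");
        System.out.println(requestUrl(xml));
    }
}
```

The printed URL can then be fetched with any HTTP client; the response format depends on the formatter the service applies.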

Web services type access

Change format from default tab-separated format:

<Query virtualSchemaName = "defaultSchema" formatter = "CSV">
  <Dataset name = "hsapiens_gene_ensembl">
    <Attribute name = "ensembl_gene_id" />
    <Attribute name = "chromosome_name" />
    <ValueFilter name = "chromosome_name" value = "1"/>
  </Dataset>
</Query>

Web services type access

Get count instead:

<Query virtualSchemaName = "defaultSchema" count = "1">
  <Dataset name = "hsapiens_gene_ensembl">
    <Attribute name = "ensembl_gene_id" />
    <Attribute name = "chromosome_name" />
    <ValueFilter name = "chromosome_name" value = "1"/>
  </Dataset>
</Query>
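
The formatter and count options differ from the basic query only in the attributes of the opening Query tag. As a rough illustration, the sketch below renders that tag with either option switched on. The class and method names (QueryTag, open) are hypothetical, and whether the service accepts both attributes together is not stated in the slides, so the sketch simply emits whichever ones are requested.

```java
// Sketch only: renders the opening <Query> tag with optional attributes.
// QueryTag and open() are hypothetical names, not part of any BioMart library.
final class QueryTag {

    // formatter may be null, in which case the service default
    // (tab-separated, per the slides) applies.
    // countOnly adds count = "1", as shown on the count slide.
    static String open(String virtualSchemaName, String formatter, boolean countOnly) {
        StringBuilder tag = new StringBuilder("<Query virtualSchemaName = \"")
                .append(virtualSchemaName).append('"');
        if (formatter != null) {
            tag.append(" formatter = \"").append(formatter).append('"');
        }
        if (countOnly) {
            tag.append(" count = \"1\"");
        }
        return tag.append('>').toString();
    }

    public static void main(String[] args) {
        System.out.println(open("defaultSchema", null, false)); // default output
        System.out.println(open("defaultSchema", "CSV", false)); // CSV output
        System.out.println(open("defaultSchema", null, true));   // count only
    }
}
```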

Web services type access

Multi-dataset query:

<Query virtualSchemaName = "defaultSchema">
  <Dataset name = "mmusculus_gene_ensembl">
    <ValueFilter name = "chromosome_name" value = "1"/>
  </Dataset>
  <Dataset name = "hsapiens_gene_ensembl">
    <Attribute name = "ensembl_gene_id" />
    <Attribute name = "chromosome_name" />
    <ValueFilter name = "chromosome_name" value = "1"/>
  </Dataset>
</Query>

Web services type access

(1) Recover the registry file:

http://test.biomart.org/cgi-bin/martservice?type=registry

(2) Recover the datasets available for a mart:

http://test.biomart.org/cgi-bin/martservice?type=datasets&virtualSchema=default&mart=ensembl

(3) Recover the filters available for a dataset:

http://test.biomart.org/cgi-bin/martservice?type=filters&virtualSchema=default&dataset=hsapiens_gene_ensembl

(4) Recover the attributes available for a dataset:

http://test.biomart.org/cgi-bin/martservice?type=attributes&virtualSchema=default&dataset=hsapiens_gene_ensembl
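
The four metadata requests above follow one pattern: a type parameter plus, where needed, virtualSchema and either mart or dataset. A small sketch of composing them in plain Java follows. The class and method names (MartServiceMetadata, registry, datasets, filters, attributes) are hypothetical, the endpoint is the one from the slides and may no longer be live, and the sketch assumes the parameter values are simple identifiers that need no URL encoding.

```java
// Sketch only: composes the martservice metadata request URLs.
// All names here are hypothetical, not part of any BioMart library.
final class MartServiceMetadata {

    // Endpoint copied from the slides; it may no longer be live.
    static final String ENDPOINT = "http://test.biomart.org/cgi-bin/martservice";

    // (1) The registry file.
    static String registry() {
        return ENDPOINT + "?type=registry";
    }

    // (2) The datasets available for a mart.
    static String datasets(String virtualSchema, String mart) {
        return ENDPOINT + "?type=datasets&virtualSchema=" + virtualSchema
                + "&mart=" + mart;
    }

    // (3) The filters available for a dataset.
    static String filters(String virtualSchema, String dataset) {
        return forDataset("filters", virtualSchema, dataset);
    }

    // (4) The attributes available for a dataset.
    static String attributes(String virtualSchema, String dataset) {
        return forDataset("attributes", virtualSchema, dataset);
    }

    // Filters and attributes requests differ only in the type parameter.
    private static String forDataset(String type, String virtualSchema, String dataset) {
        return ENDPOINT + "?type=" + type + "&virtualSchema=" + virtualSchema
                + "&dataset=" + dataset;
    }

    public static void main(String[] args) {
        System.out.println(registry());
        System.out.println(datasets("default", "ensembl"));
        System.out.println(filters("default", "hsapiens_gene_ensembl"));
        System.out.println(attributes("default", "hsapiens_gene_ensembl"));
    }
}
```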

MartJ

● Java interface to BioMart datasets

● Uses XML configuration shared by all BioMart software

RegistryDSConfigAdaptor

import java.net.MalformedURLException;
import java.net.URL;
import org.ensembl.mart.lib.config.RegistryDSConfigAdaptor;

URL confURL = null;
try {
    confURL = InputSourceUtil.getURLForString("data/defaultMartRegistry.xml");
} catch (MalformedURLException e) {
    throw new ConfigurationException("Warning, could not load "
        + "data/defaultMartRegistry.xml"
        + " file\n");
}

RegistryDSConfigAdaptor adaptor =
    new RegistryDSConfigAdaptor(confURL, false, false, false);

DatasetConfig

import org.ensembl.mart.lib.config.DatasetConfig;

DatasetConfig config =
    adaptor.getDatasetConfigByDatasetInternalName(
        "hsapiens_gene_ensembl",
        "default"
    );

Query

import org.ensembl.mart.lib.Query;

Query query = new Query();

//query needs some information from the DatasetConfig

query.setDataSource(config.getAdaptor().getDataSource());

query.setMainTables(config.getStarBases());

query.setPrimaryKeys(config.getPrimaryKeys());

FieldAttribute/AttributeDescription

import org.ensembl.mart.lib.config.AttributeDescription;
import org.ensembl.mart.lib.FieldAttribute;

AttributeDescription adesc =
    config.getAttributeDescriptionByInternalName("gene_stable_id");

query.addAttribute(new FieldAttribute(adesc.getField(),
                                      adesc.getTableConstraint(),
                                      adesc.getKey()));

Filter/FilterDescription

There are three types of Filter that can be added to the query; all are created using the attributes of a FilterDescription:

A. BasicFilter

B. BooleanFilter (but watch for the two boolean 'flavors')

C. IDListFilter

FilterDescription

import org.ensembl.mart.lib.config.FilterDescription;

FilterDescription fdesc =
    config.getFilterDescriptionByInternalName("chr_name");

BasicFilter

import org.ensembl.mart.lib.BasicFilter;

//The config system actually masks a lot of complexity
//with regard to filters by requiring the internalName
//again when calling the getXXX methods
String name = "chr_name"; //the internalName used to look up fdesc

query.addFilter(new BasicFilter(fdesc.getField(name),
                                fdesc.getTableConstraint(name),
                                fdesc.getKey(name),
                                "=",
                                "22"));

BooleanFilter

import org.ensembl.mart.lib.BooleanFilter;

//note there are different types of BooleanFilter
//"boolean" and "boolean_num"
if (fdesc.getType(name).equals("boolean")) {
    query.addFilter(new BooleanFilter(fdesc.getField(name),
                                      fdesc.getTableConstraint(name),
                                      fdesc.getKey(name),
                                      BooleanFilter.isNULL));
} else { //"boolean_num"
    query.addFilter(new BooleanFilter(fdesc.getField(name),
                                      fdesc.getTableConstraint(name),
                                      fdesc.getKey(name),
                                      BooleanFilter.isNotNULL_NUM));
}

IDListFilter

import org.ensembl.mart.lib.IDListFilter;

String[] ids = new String[] { "ENSG00000146556.4",
                              "ENSG00000197194.1",
                              "ENSG00000197490.1",
                              "ENSG00000177693.1"
                            };

query.addFilter(new IDListFilter(fdesc.getField(name),
                                 fdesc.getTableConstraint(name),
                                 fdesc.getKey(name),
                                 ids));

Engine

import org.ensembl.mart.lib.Engine;

import org.ensembl.mart.lib.FormatSpec;

Engine engine = new Engine();

engine.execute(
    query,
    new FormatSpec(FormatSpec.TABULATED, "\t"),
    System.out
);

Future of MartJ

In the future, MartJ will be refactored to use the more flexible architecture that we developed for the Perl-based software.