Post on 22-Dec-2015


Using ArrayExpress

ArrayExpress is an international public repository for well-annotated microarray data, including gene expression, comparative genomic hybridization (CGH) and chromatin-immunoprecipitation (ChIP) experiments.

ArrayExpress http://www.ebi.ac.uk/microarray-as/aer/index.html#ae-main[0]

ArrayExpress has three major goals:

1. Serve the scientific community as a repository for data supporting publications.

2. Provide easy access to high-quality data in a standard format.

3. Facilitate the sharing of microarray designs and experimental protocols.

ArrayExpress has two major components:

1. ArrayExpress experiment repository – the main database containing complete data supporting publications.

2. ArrayExpress gene expression profile data warehouse – contains gene-indexed expression profiles from a curated subset of experiments from the repository.

Search for experiments by entering ArrayExpress experiment accession numbers or keywords (e.g. RNAi, breast cancer) in the query box on the left-hand panel.

Options for sorting and filtering your results.

ID - the unique ArrayExpress accession number of the experiment.

Experiment accession numbers are in the format of E-XXXX-n, where XXXX is a code for the source of the data.

Experiments and array designs in ArrayExpress are given unique accession numbers in the format E-XXXX-n for experiments and A-XXXX-n for array designs.

XXXX represents a four-letter code and n is a number, e.g. E-MEXP-568, A-UHNC-18.
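The accession format can be checked mechanically. The following is a minimal sketch, not part of ArrayExpress itself: the names `parse_accession` and `Accession` are hypothetical, and the pattern assumes that the four-letter code is always written in upper case, as it is in the two examples given here.

```python
import re
from typing import NamedTuple, Optional

# Assumption: the four-letter code uses upper-case letters only, as in the
# examples E-MEXP-568 and A-UHNC-18.
ACCESSION_RE = re.compile(r"^([EA])-([A-Z]{4})-(\d+)$")

# E- prefixes mark experiments, A- prefixes mark array designs.
KINDS = {"E": "experiment", "A": "array design"}


class Accession(NamedTuple):
    kind: str    # "experiment" or "array design"
    code: str    # four-letter source code, e.g. "MEXP"
    number: int  # numeric part, e.g. 568


def parse_accession(text: str) -> Optional[Accession]:
    """Split an accession number into its parts, or return None if it does not match."""
    match = ACCESSION_RE.match(text.strip())
    if match is None:
        return None
    prefix, code, number = match.groups()
    return Accession(KINDS[prefix], code, int(number))


print(parse_accession("E-MEXP-568"))
print(parse_accession("A-UHNC-18"))
print(parse_accession("MEXP-568"))  # no E-/A- prefix, so this is rejected
```

Returning `None` for a non-matching string, rather than raising an error, makes it easy to tell an accession number apart from a free-text keyword typed into the same query box.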

Title - the curated title for the experiment

Hybs - the total number of hybridizations in the experiment

Species - the species of the samples used (can be multiple)

Date - the date that the data were loaded into ArrayExpress

Processed – a direct link to the processed data as a zip file (a brown icon indicates that this exists).

Raw – a direct link to the raw data (a brown icon indicates that raw data exist, a grey icon that they do not). A wedge-shaped icon indicates Affymetrix .CEL files.

More – a link to the ArrayExpress advanced interface where you can get subsets of each data file by gene, hybridization and QuantitationTypes (columns in the data file).

Click anywhere on an experiment row and it will expand to let you see more details about the experiment and where the term you searched for appears.

Title - curated title of the experiment

MIAME score - a score indicating how close an experiment is to full MIAME compliance, with 5 being the highest. One point each is given for:

• sufficient annotation of the associated array design
• essential sample annotation, including at least one experimental factor and the species of all samples
• raw data files for each hybridization
• final processed (normalized) data for the hybridizations in the experiment
• essential laboratory and data processing protocols

Sample annotation – a link to the .2columns.xls file, which lists the samples, the experimental factor values associated with these samples and the corresponding data files.

Array – the ArrayExpress accession number(s) for the array design(s) used in the experiment. Clicking on the accession number opens a new browser window showing more information about the array design in the advanced query interface.

Downloads – links to the FTP server directory containing data files and sample and hybridization information for the experiment, and to the data retrieval page for the experiment in the advanced user interface

Experiment design – links to a diagram of the sample relationships in .png and .svg format.

Protocols – a link to a page listing all the protocols used in the experiment.

Citation - details about any publications that relate to the data, including links to the online article and to the PubMed entry where available

Detailed sample annotation - a link to the .sdrf.xls file, which contains information about the samples and the relationships between the samples, extracts, labeled extracts, hybridizations and data files.

Contact - the name of the experiment submitter

Design types - terms describing design types of the experiment. These can include biological, methodological and technology types e.g. disease state, strain or line, compound treatment, in-vivo, dye swap, co-expression, binding site identification.

Description - the description of the experiment as supplied by the submitter

Factor values - a list of the experimental factor values in the experiment

The four-letter code in the accession number generally indicates the source of the MAGE-ML file that was used to load the data into the ArrayExpress database. Sources include our own submission tools (MEXP for MIAMExpress and TABM for Tab2MAGE) as well as MAGE-ML submitted from other organizations or microarray data management tools. The four-letter code does not necessarily tell you which organization performed the experiment or manufactured the array design. Some experiments have also been extracted from the Gene Expression Omnibus (GEO) at the NCBI.
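As an illustration of how the four-letter code can be read, the sketch below maps a code to the submission tool named above. The function name `describe_source` and the wording of the returned messages are hypothetical. Only the two codes stated in the text (MEXP and TABM) are included; any other code is reported as unlisted rather than guessed.

```python
# Only the two codes named in the text are listed here. Codes used for other
# sources are deliberately left out because the text does not state them.
SUBMISSION_TOOLS = {
    "MEXP": "MIAMExpress",
    "TABM": "Tab2MAGE",
}


def describe_source(accession: str) -> str:
    """Describe where the MAGE-ML behind an accession number came from."""
    parts = accession.strip().split("-")
    # An accession has three dash-separated parts; the middle one is the code.
    if len(parts) != 3 or len(parts[1]) != 4:
        raise ValueError(f"not an accession number: {accession!r}")
    code = parts[1]
    tool = SUBMISSION_TOOLS.get(code)
    if tool is None:
        return f"{code}: source not listed in this sketch"
    return f"{code}: submitted with {tool}"


print(describe_source("E-MEXP-568"))
print(describe_source("A-UHNC-18"))
```

Note that the result describes only how the data were loaded. As stated above, the code does not necessarily identify the organization that performed the experiment or manufactured the array.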

MIAME describes the Minimum Information About a Microarray Experiment that is needed to interpret the results of the experiment unambiguously and, potentially, to reproduce the experiment.
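The five-point MIAME score described earlier awards one point per criterion, so it can be sketched as a simple count. This is an illustrative reconstruction under that assumption, not the code ArrayExpress uses; the class name `MiameChecklist`, its field names and the function `miame_score` are all hypothetical.

```python
from dataclasses import astuple, dataclass


@dataclass
class MiameChecklist:
    """One flag per scoring criterion; True means the criterion is met."""
    array_design_annotated: bool   # sufficient annotation of the array design
    sample_annotation: bool        # experimental factor(s) and species given
    raw_data: bool                 # raw data files for each hybridization
    processed_data: bool           # final processed (normalized) data
    protocols: bool                # laboratory and data processing protocols


def miame_score(checklist: MiameChecklist) -> int:
    """Count the criteria that are met: one point each, from 0 to 5."""
    return sum(1 for met in astuple(checklist) if met)


# Hypothetical experiment that lacks raw data files.
example = MiameChecklist(
    array_design_annotated=True,
    sample_annotation=True,
    raw_data=False,
    processed_data=True,
    protocols=True,
)
print(miame_score(example))
```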