ACE & RACE annotation of complex/combinatorial expressions.


Self-introduction

Andrey Zinovyev

M.Sc. in theoretical physics (1997)

Ph.D. in computer science (2001), "Method of elastic maps and applications in bioinformatics"

Programming, industrial information systems (C++, Delphi)

Web-services development (Java, JSP)

Senior postdoctoral fellow at IHES, France

http://www.ihes.fr/~zinovyev or type “zinovyev” in Google

Plan of the talk

ACE framework: introduction, what we have

What will be in RACE?

ACE software: C++ code, web application

Plans for ACE and RACE

Computational environment

Genome as database: everything is annotation

ATGCGTGCAAATGCTCTTTGTGTAACGTGTCGACGTACGTGTGTAACGTGCGACGTACGT

Genomes: human, chimp, mouse, rat

Gene annotation

Probability profiles (TF1, TF2): b.ace

RNA structures: r.ace

Microarrays: m.ace

common format for annotation files (binary p-files)

Genome preprocessing: compile once, run everywhere

ATGCGTGCAAATGCTCTTTGTGTAACGTGTCGACGTACGTGTGTAACGTGCGACGTACGT

b.ace: potential TF binding sites

r.ace: potential RNA structures, splicing sites

m.ace: gene expression data

c.ace: chromatin structure and dynamics

ace.annotate, ace.RNAtools, ace.annotate

ace.map, arc

ace.enhance, ace.cluster, ace.display, ace.dyCr, ace.stat

Structure space: the truth is out there

set of annotations

Structure space

Multidimensional combinatorial space of all possible structures appearing in a scanning window

ace.enhance: be more abstract

Accessing and masking structure space

ace.enhance expression (heuristic mask)

Method_01

Method_02

Method_11…

ace.enhance annotation

view in genome browser (ace.display)

compare with experiment (cross-annotation) (ace.dyCr)

construct more abstract space and apply ace.enhance further

b.ace

TF1

TF2

Transfac release

Genome release

b.ace ~1.2 Tbyte

ace.annotate

ace.enhance

Enhance methods:
1. Fixed spacing of sites
2. Fixed order of sites
3. Fixed strand orientation of sites
4. Multiple copies of site
5. Minimal spacing of sites
6. Maximal spacing of sites
7. Variable, defined spacing between sites
8. Minimal p-value for weight matrix
9. Maximal p-value for weight matrix
10. Bias weight-matrix

M1&&M2||M3||M4||M5

… + ace.cluster: simplified version of enhance for detecting clusters of repetitions of one motif
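A mask such as M1&&M2||M3 can be read as a boolean query evaluated in each scanning window. The following is a minimal Python sketch of that idea, not the ace.enhance implementation: the function names, the hit positions and the choice to test only windows that start at a hit are all assumptions made for illustration.

```python
from bisect import bisect_left

def present(positions, start, width):
    """True if the sorted list `positions` has a hit in [start, start + width)."""
    i = bisect_left(positions, start)
    return i < len(positions) and positions[i] < start + width

def scan(hits, mask, width):
    """Return window start positions where `mask` holds.

    hits  : dict mapping motif name -> sorted list of hit positions
    mask  : function taking a {name: bool} presence dict and returning bool
    width : scanning window width in bp
    Only windows starting at a hit position are tested (a simplification
    that is safe for masks without negation).
    """
    starts = sorted({p for ps in hits.values() for p in ps})
    out = []
    for s in starts:
        seen = {name: present(ps, s, width) for name, ps in hits.items()}
        if mask(seen):
            out.append(s)
    return out

# Hypothetical hit lists for three motifs (toy data, not from the talk).
hits = {"TF1": [100, 900], "TF2": [120, 500], "TF3": [910]}
mask = lambda m: m["TF1"] and (m["TF2"] or m["TF3"])
print(scan(hits, mask, 50))  # [100, 900]
```

Constraints such as fixed order, spacing or strand would need the hit positions themselves rather than presence flags; the sketch covers only the boolean part of the query.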

Example 1: 4 transcription factors, chr14 of UCSC_HG15

rarHS: 659,631 hits

cMyb: 1,647,505 hits

CEBP: 1,189,196 hits

PU.1: 472,383 hits

ace.annotate =>

ace.enhance expression, window 50 bp: PU.1 && rarHS — rarHS || rarHS — rarHS && CEBP < cMyb

11**

8**

Result: 102 hits

[Figure: three example hits, labelled 14.1, 14.2 and 14.3, each drawn on a 5’ to 3’ strand]

Example 2: clusters of motifs, chr14

jfl_im = TAGAGA, TAGAGT, TAGGGA, TAGGGT

ace.annotate => 183,389 hits

ace.enhance expression, window 300 bp: jfl_im 10 copies

Result: 51 hits in 5 groups
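The "N copies within a window" query of this example can be sketched with a two-pointer sweep over sorted hit positions. This is an illustrative Python sketch under assumptions (function name, merging rule and toy positions are hypothetical), not the ace.cluster code.

```python
def clusters(positions, copies, width):
    """Return merged (start, end) groups in which at least `copies`
    hits fall inside some window of `width` bp."""
    positions = sorted(positions)
    groups = []
    lo = 0  # index of the leftmost hit still inside the current window
    for hi, p in enumerate(positions):
        # Shrink the window from the left until it spans less than `width` bp.
        while p - positions[lo] >= width:
            lo += 1
        if hi - lo + 1 >= copies:
            start, end = positions[lo], p
            if groups and start <= groups[-1][1]:
                # Overlaps the previous qualifying window: extend that group.
                groups[-1] = (groups[-1][0], end)
            else:
                groups.append((start, end))
    return groups

# Toy positions (not from the talk): one dense run and some isolated hits.
toy = [10, 20, 30, 500, 510, 1000]
print(clusters(toy, copies=3, width=50))  # [(10, 30)]
```

With the parameters of the example one would call `clusters(hit_positions, copies=10, width=300)`; overlapping qualifying windows are merged into one group, which is one possible reading of "hits in groups".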

ACE C++ tools

aceLib, wraps system-dependent code

generic programming for code reusability

ace.annotate – probability-based annotations and motif search

ace.enhance – accessing (masking) structure space: combinatorial query language

ace.cluster – extracting clusters of repetitions: simplified version of enhance

ace.dyCr – first step in structure space analysis: dynamic cross-annotation

ace.stat – statistical significance analysis
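To illustrate what a probability-based annotation with a weight matrix can look like, here is a minimal Python sketch of a log-odds scan. The matrix values, the uniform background, the threshold and the function name are assumptions for illustration; the scoring actually used by ace.annotate is not described in the talk.

```python
import math

def pwm_scan(seq, pwm, threshold, background=0.25):
    """Slide a position weight matrix along `seq` and return
    (position, log-odds score) for windows scoring >= threshold.

    pwm is a list of per-column dicts {base: probability}.
    Sequences are assumed to contain only A, C, G, T.
    """
    width = len(pwm)
    hits = []
    for i in range(len(seq) - width + 1):
        score = sum(math.log2(col[base] / background)
                    for col, base in zip(pwm, seq[i:i + width]))
        if score >= threshold:
            hits.append((i, score))
    return hits

# Toy 3-column matrix favouring "ATG" (hypothetical values).
pwm = [
    {"A": 0.7, "C": 0.1, "G": 0.1, "T": 0.1},
    {"A": 0.1, "C": 0.1, "G": 0.1, "T": 0.7},
    {"A": 0.1, "C": 0.1, "G": 0.7, "T": 0.1},
]
found = pwm_scan("CCATGCC", pwm, threshold=3.0)
print([pos for pos, _ in found])  # [2]
```

A real scan would also cover the reverse strand and convert scores to p-values, which is where the minimal and maximal p-value methods of ace.enhance would apply.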

ACE web-application (JSP)

ace.uit

database layout: .ace

modules layout: ace.rte/ace.annotate

modules layout: ace.rte/ace.enhance

data layout: my.ace

documentation layout: ace.doc

Plans with ACE: principal problem

false-positive rate

ace.stat: statistical model of random noise, maximum entropy principle, significance analysis
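As a rough illustration of a significance estimate against random noise, the Python sketch below uses the simplest background: independent bases with fixed frequencies, which is the maximum entropy model when only base composition is constrained. The Poisson approximation and all names are assumptions; the model planned for ace.stat is not specified in the talk.

```python
import math

def expected_hits(word, length, freqs):
    """Expected number of exact occurrences of `word` in a random
    sequence of `length` bases drawn independently with `freqs`."""
    p = math.prod(freqs[b] for b in word)
    return (length - len(word) + 1) * p

def poisson_tail(n, lam):
    """P(X >= n) for X ~ Poisson(lam); ignores correlations
    between overlapping occurrences (an approximation)."""
    below = sum(math.exp(-lam) * lam**k / math.factorial(k) for k in range(n))
    return max(0.0, 1.0 - below)

uniform = {"A": 0.25, "C": 0.25, "G": 0.25, "T": 0.25}
lam = expected_hits("TAGAGA", 4101, uniform)  # 4096 windows * 0.25**6
print(lam, poisson_tail(1, lam))
```

Comparing an observed hit count with `expected_hits` and `poisson_tail` gives a first indication of how many hits could be false positives under this simple noise model.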

Plans with ACE: visualizing structure space

creating 2D maps of structure space

data visualization, dimension reduction

ace.eva, ace.net

Plans with ACE: integrating m.ace

m.ace

ace.map

Plans with ACE: model of chromatin structure and dynamics

c.ace

immunoprecipitation experiments

chromatin state profiles

silencing structures in space

arc

Plans with ACE: comparative genomics

genome1 genome2

Installation of b.ace in Lille

http://ace.ibl.fr

1.2 Tbyte PowerVault storage, PowerEdge Dell server

Installation of RACE in Sherbrooke (golf)

UCSC

local UCSC browser

Gbrowser

LISA DB ace

r.ace DB

r.ace DB

Distributed environment: database synchronization protocol

b.ace: Lille, France

public dbs

new genome release

where?

r.ace: Sherbrooke, Canada

m.ace: INSERM, Paris

c.ace: IHES, Paris

LISA: Sherbrooke, Canada

RACE platform for integration

ace.annotate: find simple motifs (loops, hairpins)

ace.RNAtools: pluggable algorithms

p-files (r.ace database)

ace.enhance: pluggable methods

ace.display

ace.stat

ace.dyCr

ACE team

aceLib, ace C++: Thomas Bücher, Inst.Neur.

arc : Graham Smith, IHES

ace.map : Sebastian Noth, INSERM

ace team leader : Arndt Benecke, IHES

ace.stat : Richard Madden, UdSh

ace.uit, ace C++: Andrey Zinovyev, IHES