C-DEM: A Multi-Modal Query System for Drosophila Embryo Databases

Fan Guo, Lei Li, Eric Xing, Christos Faloutsos
Carnegie Mellon University
{fanguo, leili, epxing, [email protected]}
http://www.db.cs.cmu.edu:8080/cdem/demo.html


Transcript of C-DEM: A Multi-Modal Query System for Drosophila Embryo Databases

Page 1: C-DEM: A Multi-Modal Query System for Drosophila Embryo Databases

C-DEM: A Multi-Modal Query System for Drosophila Embryo Databases

Fan Guo, Lei Li, Eric Xing, Christos Faloutsos

Carnegie Mellon University

{fanguo, leili, epxing, [email protected]}

http://www.db.cs.cmu.edu:8080/cdem/demo.html

Page 2: C-DEM: A Multi-Modal Query System for Drosophila Embryo Databases

Background

• Fruit-fly development in genetic study:
  – Genes controlling the body plan and organ patterning are similar to those of higher animals, including humans.

• Objective: a framework for applying data mining techniques to assist biological research.


Page 3: C-DEM: A Multi-Modal Query System for Drosophila Embryo Databases

The Graph Representation

[Figure: graph with three node layers (Images, Genes, Keywords); one node is labeled "embryonic hindgut"]

• Image-layer edges: nearest neighbors in feature space
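A minimal sketch of how image-layer edges could be built from nearest neighbors in feature space. The slide does not name the distance metric, the value of k, or whether edges are directed; Euclidean distance, undirected edges, the function name `knn_edges`, and the toy feature vectors are all assumptions made for illustration.

```python
import math

def knn_edges(features, k):
    """Link each image to its k nearest neighbors (assumed Euclidean, undirected)."""
    edges = set()
    for i, fi in enumerate(features):
        # Distances from image i to every other image, closest first.
        dists = sorted(
            (math.dist(fi, fj), j) for j, fj in enumerate(features) if j != i
        )
        for _, j in dists[:k]:
            # Store each undirected edge once, as (smaller index, larger index).
            edges.add((min(i, j), max(i, j)))
    return sorted(edges)

# Hypothetical 2-D feature vectors for four images.
features = [(0.0, 0.0), (0.1, 0.0), (5.0, 5.0), (5.0, 5.2)]
edges = knn_edges(features, k=1)
```

With these toy vectors the two tight pairs of images end up linked to each other and nothing connects the pairs. The brute-force search is quadratic in the number of images, so a real collection would likely need an index structure instead.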

Page 4: C-DEM: A Multi-Modal Query System for Drosophila Embryo Databases

Proximity Measure

• Random Walk with Restart
  – Starting from a node s;
  – Randomly walk to a neighbor, with probability 1-c;
  – Restart at s, with probability c;
  – Compute the steady-state probability vector.
  – Complexity: O(E), but faster methods exist (Tong et al., ICDM’06)
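The walk described above can be approximated by simple power iteration. The sketch below is one possible reading, not the system's actual code: it assumes an unweighted, undirected graph with no isolated nodes, and the function name, the default c = 0.15, the tolerance, and the toy graph are illustrative choices.

```python
def random_walk_with_restart(adj, s, c=0.15, tol=1e-10, max_iter=1000):
    """Approximate steady-state RWR probabilities for start node s.

    adj maps each node to a list of its neighbors; c is the restart probability.
    """
    nodes = list(adj)
    p = {v: 0.0 for v in nodes}
    p[s] = 1.0  # the walker starts at s
    for _ in range(max_iter):
        nxt = {v: 0.0 for v in nodes}
        for u in nodes:
            # With probability 1-c, move to a uniformly chosen neighbor of u.
            share = (1.0 - c) * p[u] / len(adj[u])
            for v in adj[u]:
                nxt[v] += share
        # With probability c, jump back to s.
        nxt[s] += c
        delta = sum(abs(nxt[v] - p[v]) for v in nodes)
        p = nxt
        if delta < tol:
            break
    return p

# Hypothetical toy graph: two images and one keyword attached to one gene.
adj = {
    "img1": ["geneA"],
    "img2": ["geneA"],
    "geneA": ["img1", "img2", "kw1"],
    "kw1": ["geneA"],
}
scores = random_walk_with_restart(adj, "img1")
```

Each pass visits every edge a constant number of times, which is presumably the sense in which the slide quotes O(E); the number of passes needed depends on c and the tolerance.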


Page 5: C-DEM: A Multi-Modal Query System for Drosophila Embryo Databases

Proximity Measure

• Random Walk with Restart
  – Starting from a node s
  – Randomly walk to a neighbor, with probability 1-c
  – Restart at s, with probability c

Page 6: C-DEM: A Multi-Modal Query System for Drosophila Embryo Databases

Proximity Measure

• Computing the Steady-State Probability
  [Equation; labeled terms: desired probability vector, adjacency matrix, vector w/ non-zero entry for restart nodes]
• Complexity: O(E), but faster methods exist (Tong et al., ICDM’06)
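One way to write the steady-state condition, consistent with the walk defined on the earlier slides (move with probability 1-c, restart with probability c). The symbols are assumptions rather than the slide's own notation: p is the desired probability vector, Ã is the adjacency matrix normalized so that each column sums to 1, and e_s is the vector whose only non-zero entry is at the restart node s.

```latex
\mathbf{p} = (1-c)\,\tilde{A}\,\mathbf{p} + c\,\mathbf{e}_s
\quad\Longrightarrow\quad
\mathbf{p} = c\,\bigl(I - (1-c)\,\tilde{A}\bigr)^{-1}\mathbf{e}_s
```

The closed form on the right follows by moving the (1-c) Ã p term to the left-hand side; in practice the fixed point is usually reached by iterating the left-hand equation instead of inverting the matrix.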

Page 7: C-DEM: A Multi-Modal Query System for Drosophila Embryo Databases

Multi-Modal Query Results

[Figure: query result panels for 2D Expression Images, Genes, and Annotation Terms]
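A sketch of how a single proximity-score vector could be split into the three result panels. It assumes every node carries a type label and that each panel shows the k highest-scoring nodes of its type; the function name, the type names, and the scores are hypothetical and are not taken from the system.

```python
def top_k_by_type(scores, node_type, k, exclude=()):
    """Group nodes by type and keep the k highest-scoring nodes of each type."""
    grouped = {}
    for node, score in scores.items():
        if node in exclude:
            continue  # typically the query node itself
        grouped.setdefault(node_type[node], []).append((score, node))
    return {
        # Sort by descending score, breaking ties by node name.
        t: [node for _, node in sorted(items, key=lambda x: (-x[0], x[1]))[:k]]
        for t, items in grouped.items()
    }

# Hypothetical proximity scores for a query on image "img1".
scores = {"img1": 0.40, "img2": 0.10, "img3": 0.05,
          "geneA": 0.30, "geneB": 0.02, "hindgut": 0.13}
node_type = {"img1": "image", "img2": "image", "img3": "image",
             "geneA": "gene", "geneB": "gene", "hindgut": "keyword"}
results = top_k_by_type(scores, node_type, k=2, exclude={"img1"})
```

Because all three node types live in one graph, a query on any one type (an image, a gene, or a keyword) can return ranked lists for all of them from the same score vector.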

Page 8: C-DEM: A Multi-Modal Query System for Drosophila Embryo Databases

More Mining Tasks

• Image Auto-Caption
• Gene function identification


Page 9: C-DEM: A Multi-Modal Query System for Drosophila Embryo Databases

Related Work

• Berkeley Drosophila Genome Project (www.fruitfly.org)
• FlyExpress (www.flyexpress.net)
• Berkeley Drosophila Transcription Network Project (bdtnp.lbl.gov)


Page 10: C-DEM: A Multi-Modal Query System for Drosophila Embryo Databases

System Architecture

[Figure: three components: Browser-based UI; Tomcat Web Server (JSP Application); Computing Engine. Arrows labeled "Queries" and "Result Pages" over HTTP; arrows labeled "Remote Function Calls" and "Results" over RMI.]