PhyQL: A Phylogenetic Visual Query Engine
Shahriyar Hossain, Munirul Islam, Jesmin, Hasan M Jamil
Integration Informatics Laboratory, Computer Science, Wayne State University
Department of Genetic Engineering and Biotechnology, University of Dhaka, Bangladesh
BIBM 2008
Integration Informatics Research Group
What is a Phylogenetic Tree?
Queries: Least Common Ancestor
<root>
  <node>rayfinned fish</node>
  <inode>
    <node>lungfish</node>
    <inode>
      <inode>
        <node>salamanders</node>
        <node>frogs</node>
      </inode>
      . . .
    </inode>
  </inode>
</root>
for $root in doc("tree.xml")//root
return <span> <h1> { $root/node/text() } </h1> </span>
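The same lookup can be sketched in Python. This is an illustration only: it assumes the tree is held in a string rather than in tree.xml, and the part of the tree elided on the slide is simply left out, so TREE_XML is a shortened, hypothetical variant.

```python
import xml.etree.ElementTree as ET

# Shortened variant of the slide's tree; the elided portion is omitted (assumption).
TREE_XML = """
<root>
  <node>rayfinned fish</node>
  <inode>
    <node>lungfish</node>
    <inode>
      <inode>
        <node>salamanders</node>
        <node>frogs</node>
      </inode>
    </inode>
  </inode>
</root>
"""

root = ET.fromstring(TREE_XML)

# Counterpart of $root/node/text(): leaf names that are direct children of the root.
direct_leaves = [n.text for n in root.findall("node")]

# All leaf names anywhere in the tree, in document order.
all_leaves = [n.text for n in root.iter("node")]

print(direct_leaves)
print(all_leaves)
```

Only the root's direct node children are selected by the path expression; the nested leaves are reachable only through a descendant search.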
Phylogenetic Query Language:
Select: select the subset of trees that match given criteria
Join: join two trees based on a pair of nodes
Subset: subset queries retrieve part of a given tree
Using Path Operators
SubTree Projection
Tree Join
PhyQL:
[Architecture diagram. Components shown: User, Visual Query Interface (SELECT, JOIN, SUBTREE), Translator, XSB, DB, Wrappers; XML/NEXUS from user / interoperable databases.]
Why XSB?
Eliminates the left-recursion problem, e.g. path(X,Z) :- path(X,Y), edge(Y,Z).
Stores intermediate results (tabling)
Model-based (the order in which rules are written doesn't matter):
    path(X,Y) :- edge(X,Y).
    path(X,Z) :- path(X,Y), edge(Y,Z).
Its in-memory database queries are an order of magnitude faster than methods such as tuProlog.
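Why storing intermediate results makes a left-recursive rule safe can be sketched in Python. This is not XSB's actual evaluation strategy; it is a bottom-up fixpoint iteration that, like tabling, records every derived path fact once and stops when nothing new appears, so it terminates even on cyclic data. The function name and the sample edges are hypothetical.

```python
def reachable(edges):
    """Compute all path(X, Z) facts by iterating to a fixpoint."""
    path = set(edges)    # path(X,Y) :- edge(X,Y).
    delta = set(edges)   # facts derived in the previous round
    while delta:
        # path(X,Z) :- path(X,Y), edge(Y,Z).  Only new facts are extended.
        new = {(x, z) for (x, y) in delta for (y2, z) in edges if y == y2}
        delta = new - path   # keep only facts not seen before
        path |= delta
    return path

# Hypothetical graph with a cycle a -> b -> c -> a and a tail c -> d.
edges = {("a", "b"), ("b", "c"), ("c", "a"), ("c", "d")}
paths = reachable(edges)
print(sorted(paths))
```

A plain depth-first Prolog evaluation of the same left-recursive rule would call path(X,Y) again before making any progress; remembering derived facts is what breaks that loop.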
:- odbc_import(conn, 'tbl_treeinfo'('rootId', 'author'), tree).
:- odbc_import(conn, 'tbl_nodeinfo'('nodeId', 'nodename'), node).
:- odbc_import(conn, 'tbl_edge'('parentId', 'childId'), edge).
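The effect of these imports, exposing each relational table as a set of facts, can be sketched in Python. The sketch assumes an in-memory SQLite database as a stand-in for the ODBC connection; the table and column names come from the slide, while the sample rows and the import_table helper are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE tbl_treeinfo (rootId INTEGER, author TEXT);
    CREATE TABLE tbl_nodeinfo (nodeId INTEGER, nodename TEXT);
    CREATE TABLE tbl_edge     (parentId INTEGER, childId INTEGER);
    -- Hypothetical sample rows
    INSERT INTO tbl_treeinfo VALUES (0, 'stern');
    INSERT INTO tbl_nodeinfo VALUES (1, 'Stanhopea_shuttleworthii'),
                                    (2, 'Stanhopea_gibbosa'),
                                    (3, 'Stanhopea_vasquezii');
    INSERT INTO tbl_edge VALUES (0, 1), (0, 4), (4, 2), (4, 3);
""")

def import_table(conn, table, columns):
    """Return the selected columns of `table` as a set of tuples, one per fact."""
    cols = ", ".join(columns)
    return set(conn.execute(f"SELECT {cols} FROM {table}"))

# One fact set per predicate named on the slide: tree, node, edge.
tree = import_table(conn, "tbl_treeinfo", ["rootId", "author"])
node = import_table(conn, "tbl_nodeinfo", ["nodeId", "nodename"])
edge = import_table(conn, "tbl_edge", ["parentId", "childId"])
print(sorted(edge))
```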
<tree author="stern">
  <node type="*">
    <node type="?">
      <node> Stanhopea_gibbosa </node>
      <node> Stanhopea_vasquezii </node>
    </node>
    <node> Stanhopea_shuttleworthii </node>
  </node>
</tree>
node(Y1, 'Stanhopea_shuttleworthii'),
node(Y2, 'Stanhopea_gibbosa'),
node(Y3, 'Stanhopea_vasquezii'),
edge(Y4,Y2), edge(Y4,Y3),
lca(Y0,Y4,Y1),
edge(Y0,Y1)
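How such a conjunction of goals might be evaluated can be sketched in Python. The sketch assumes that lca(Y0,Y4,Y1) means "Y0 is the least common ancestor of Y4 and Y1", and it runs against a small hypothetical set of node and edge facts; neither the fact base nor the helper functions come from the original system.

```python
# Hypothetical facts: node(Id, Name) as a dict, edge(Parent, Child) as a set.
node = {1: "Stanhopea_shuttleworthii", 2: "Stanhopea_gibbosa", 3: "Stanhopea_vasquezii"}
edge = {(0, 1), (0, 4), (4, 2), (4, 3)}
parent = {c: p for p, c in edge}

def ancestors(n):
    """Return n followed by its ancestors, nearest first."""
    chain = [n]
    while chain[-1] in parent:
        chain.append(parent[chain[-1]])
    return chain

def lca(a, b):
    """Least common ancestor of a and b (a node counts as its own ancestor)."""
    seen = set(ancestors(a))
    return next(n for n in ancestors(b) if n in seen)

def match():
    """Enumerate bindings of Y0..Y4 that satisfy the conjunction."""
    ids = {name: i for i, name in node.items()}
    y1 = ids["Stanhopea_shuttleworthii"]
    y2 = ids["Stanhopea_gibbosa"]
    y3 = ids["Stanhopea_vasquezii"]
    results = []
    for (y4, child) in edge:
        if child != y2 or (y4, y3) not in edge:   # edge(Y4,Y2), edge(Y4,Y3)
            continue
        y0 = lca(y4, y1)                          # lca(Y0,Y4,Y1)
        if (y0, y1) in edge:                      # edge(Y0,Y1)
            results.append({"Y0": y0, "Y1": y1, "Y2": y2, "Y3": y3, "Y4": y4})
    return results

bindings = match()
print(bindings)
```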
Summary
PhyQL offers a simple web-based visual query interface
Logic-based tree query operations
Modifying the query tools requires only changes to the logic rules
The proposed architecture can also be applied to protein-protein interaction networks, metabolic pathways, etc.
Future Work:
Database Interoperability: allow retrieving integrated phylogenetic data during query submission
ReQuery: query on the result set
Tree Similarity Estimation
Thank You!
me: http://homopan.wayne.edu/PhD Students/Munirul Islam/index.htm
Uses of Phylogenetic Trees:
1. Date events of divergence of species
2. What is the most recent common ancestor of all living species?
3. Identify geographic origins of new disease outbreaks
Crimson
Uses nested subtrees to avoid long strings
Zheng, Y., S. Fisher, S. Cohen, S. Guo, J. Kim, and S. B. Davidson. 2006. Crimson: A Data Management System to Support Evaluating Phylogenetic Tree Reconstruction Algorithms. 32nd International Conference on Very Large Data Bases, ACM, pp. 1231-1234.
Dewey system:
[Figure: tree with leaves A, B, C, D, E; nodes labeled with the Dewey paths 0, 0.1, 0.1.1, 0.1.2, 0.2, 0.2.1, 0.2.1.1, 0.2.1.2, 0.2.2]
Label  Path
Root   0
NULL   0.1
A      0.1.1
B      0.1.2
NULL   0.2
NULL   0.2.1
C      0.2.1.1
D      0.2.1.2
E      0.2.2
Find clade for: Z = (C, D)
Find common pattern starting from left
SELECT * FROM nodes
WHERE path LIKE '0.2.1%';
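The prefix lookup can be tried end to end with Python's sqlite3 module. This is a sketch under stated assumptions: the Label/Path table is loaded into a hypothetical nodes(label, path) table, the clade sought is assumed to be that of leaves C and D, and the helper names are invented for illustration. Unlike the slide's query, the sketch matches either the prefix itself or the prefix followed by a dot, so that 0.2.1 cannot accidentally match a sibling such as 0.2.10 in a larger tree.

```python
import sqlite3

# Rows of the Label/Path table; NULL labels become None.
rows = [("Root", "0"), (None, "0.1"), ("A", "0.1.1"), ("B", "0.1.2"),
        (None, "0.2"), (None, "0.2.1"), ("C", "0.2.1.1"), ("D", "0.2.1.2"),
        ("E", "0.2.2")]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE nodes (label TEXT, path TEXT)")
conn.executemany("INSERT INTO nodes VALUES (?, ?)", rows)

def common_prefix(paths):
    """Longest shared Dewey prefix, compared component by component."""
    prefix = []
    for parts in zip(*(p.split(".") for p in paths)):
        if len(set(parts)) != 1:
            break
        prefix.append(parts[0])
    return ".".join(prefix)

def clade(labels):
    """Return the common prefix of the given leaves and all nodes beneath it."""
    marks = ", ".join("?" for _ in labels)
    found = conn.execute(f"SELECT path FROM nodes WHERE label IN ({marks})", labels)
    prefix = common_prefix([r[0] for r in found])
    cur = conn.execute(
        "SELECT label, path FROM nodes WHERE path = ? OR path LIKE ? ORDER BY path",
        (prefix, prefix + ".%"),
    )
    return prefix, cur.fetchall()

prefix, members = clade(["C", "D"])
print(prefix, members)
```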
[Figure: tree with leaves A, B, C, D, E; each node annotated with a left and a right ID, numbered 1 to 18]
Depth-first traversal scoring each node with a left and right ID
Label  Left  Right
Root   1     18
NULL   2     7
A      3     4
B      5     6
NULL   8     17
NULL   9     14
C      10    11
D      12    13
E      15    16
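The left and right IDs in the table can be reproduced with a short depth-first traversal. The sketch below assumes the tree shape implied by the Dewey paths, ((A, B), ((C, D), E)), encoded as nested Python tuples; the encoding and the function name are illustrative choices, not part of the original system.

```python
# Internal nodes are tuples of children, leaves are strings (assumed encoding).
TREE = (("A", "B"), (("C", "D"), "E"))

def number(tree):
    """Assign (label, left, right) to every node in depth-first order."""
    rows = []
    counter = 0

    def visit(n):
        nonlocal counter
        counter += 1
        left = counter             # ID taken on the way down
        slot = len(rows)
        rows.append(None)          # reserve the row so output stays in preorder
        if not isinstance(n, str):
            for child in n:
                visit(child)
        counter += 1               # ID taken on the way back up
        label = n if isinstance(n, str) else None
        rows[slot] = (label, left, counter)

    visit(tree)
    return rows

intervals = number(TREE)
for row in intervals:
    print(row)
```

A node's interval encloses the intervals of all of its descendants, which is what makes the range query on the next slide possible.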
SELECT * FROM nodes
INNER JOIN nodes AS include
ON (nodes.left_id BETWEEN include.left_id AND include.right_id)
WHERE include.node_id = 5;
Minimum Spanning Clade of Node 5
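The range query can be exercised with Python's sqlite3 module. The slides do not say how node_id values are assigned, so this sketch assumes they follow depth-first order (1 for the root through 9 for E); under that assumption node 5 is the internal node with interval 8 to 17. The column names left_id, right_id and node_id come from the slide; the label column and the clade_of helper are hypothetical.

```python
import sqlite3

# (node_id, label, left_id, right_id); node_id numbering is an assumption.
rows = [
    (1, None, 1, 18), (2, None, 2, 7), (3, "A", 3, 4), (4, "B", 5, 6),
    (5, None, 8, 17), (6, None, 9, 14), (7, "C", 10, 11), (8, "D", 12, 13),
    (9, "E", 15, 16),
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE nodes (node_id INTEGER PRIMARY KEY, label TEXT, "
    "left_id INTEGER, right_id INTEGER)"
)
conn.executemany("INSERT INTO nodes VALUES (?, ?, ?, ?)", rows)

def clade_of(node_id):
    """Return (node_id, label) for every node inside the given node's interval."""
    cur = conn.execute(
        """
        SELECT nodes.node_id, nodes.label
        FROM nodes
        INNER JOIN nodes AS include
          ON nodes.left_id BETWEEN include.left_id AND include.right_id
        WHERE include.node_id = ?
        ORDER BY nodes.left_id
        """,
        (node_id,),
    )
    return cur.fetchall()

clade = clade_of(5)
print(clade)
```

Because BETWEEN is inclusive, the result contains node 5 itself along with its descendants.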