Experimental Study of Context-Free Path Query Evaluation Methods
Jochem Kuijpers
Fifth openCypher Implementers Meeting, Berlin 2019
Introduction
● MSc student CS & Eng. at TU/e
● Academic internship at Neo4j
● Supervised by:
○ George Fletcher and Nikolay Yakovets (TU/e Database Group)
○ Tobias Lindaaker (Neo4j)
● We implemented and evaluated four methods for computing context-free path query results
Context-Free Grammars
Example: the language of even-length palindromes of {a, b}* = { ε, a a, b b, a a a a, a b b a, b a a b, … }
A grammar that accepts this language:
S ⇒ a S a
S ⇒ b S b
S ⇒ ε
Example derivation of the string a b b a:
S ⇒ a S a ⇒ a b S b a ⇒ a b b a
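The palindrome grammar can be checked mechanically. The following is a minimal sketch, not part of the presented work; the function names `derives` and `derivation` are made up for illustration. It applies the three productions directly: a string is derivable if it is empty, or if its first and last symbols match and the inner part is derivable.

```python
def derives(s: str) -> bool:
    """Return True if S derives s under S => a S a | b S b | epsilon."""
    if s == "":
        return True  # S => epsilon
    # S => a S a or S => b S b: the outer symbols must be equal and in {a, b}.
    if len(s) >= 2 and s[0] == s[-1] and s[0] in "ab":
        return derives(s[1:-1])
    return False


def derivation(s: str) -> list[str]:
    """List the sentential forms deriving s; assumes derives(s) is True."""
    steps = ["S"]
    for i in range(len(s) // 2):
        prefix = s[: i + 1]
        suffix = s[len(s) - i - 1 :]
        steps.append(" ".join(prefix) + " S " + " ".join(suffix))
    # The last step applies S => epsilon.
    steps.append(" ".join(s) if s else "ε")
    return steps


if __name__ == "__main__":
    print(" ⇒ ".join(derivation("abba")))
```

Strings of odd length, or strings containing symbols other than a and b, are rejected because no production matches them.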
Context-Free Path Query
● A query is a context-free grammar
● The terminals of the grammar are edge labels
● Find paths whose sequence of edge labels is accepted by the grammar
● Why?
● Increased expressiveness w.r.t. regular expressions (regular path queries)
● Use cases in
○ biological data analysis
○ static code analysis
○ …
Our work
● We implemented four context-free path query evaluation methods
● Used Neo4j components
○ Graph store (vertices and edges)
○ PageCache
● Query evaluation is implemented separately on top of these components
○ (not integrated into Cypher)
The evaluated methods
1. Annotating the context-free grammar
Hellings, Jelle. "Path results for context-free grammar queries on graphs." arXiv preprint arXiv:1502.02242 (2015).
2. Matrix multiplication (GPGPU)
Azimov, Rustam, and Semyon Grigorev. "Context-free path querying by matrix multiplication." Proceedings of the 1st ACM SIGMOD Joint International Workshop on Graph Data Management Experiences & Systems (GRADES) and Network Data Analytics (NDA). ACM, 2018.
3. Adapted GLR (Tomita) parser
Santos, Fred C., Umberto S. Costa, and Martin A. Musicante. "A Bottom-Up Algorithm for Answering Context-Free Path Queries in Graph Databases." International Conference on Web Engineering. Springer, Cham, 2018.
4. Adapted Earley parser
Sevon, Petteri, and Lauri Eronen. "Subgraph queries by context-free grammars." Journal of Integrative Bioinformatics 5.2 (2008): 157-172.
1. Annotating the grammar
Grammar in Chomsky Normal Form:
S ⇒ A B
A ⇒ a
B ⇒ b
Annotate the grammar:
A[u,v] ⇔ there exists an A-path from u to v
A[1,4], A[2,1], A[3,4]
B[2,3], B[4,2]
S[1,2], S[3,2] ⇒ (1,2) and (3,2) are vertex pairs matching the grammar
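The annotation step can be sketched as a naive fixpoint computation. This is an illustrative sketch only, not the algorithm as published by Hellings nor the implementation evaluated here. The edge list is an assumption inferred from the annotations A[1,4], A[2,1], A[3,4], B[2,3], B[4,2] together with A ⇒ a and B ⇒ b, and the names `EDGES`, `TERMINAL_RULES`, `BINARY_RULES` and `annotate` are hypothetical.

```python
# Assumed example graph, inferred from the annotations on the slide:
# a-edges 1->4, 2->1, 3->4 and b-edges 2->3, 4->2.
EDGES = [(1, "a", 4), (2, "a", 1), (3, "a", 4), (2, "b", 3), (4, "b", 2)]
TERMINAL_RULES = {"A": "a", "B": "b"}  # rules of the form X => t
BINARY_RULES = [("S", "A", "B")]       # rules of the form X => Y Z


def annotate(edges, terminal_rules, binary_rules):
    """Return all facts (X, u, v) meaning "there exists an X-path from u to v"."""
    facts = set()
    # Base case: every edge labelled t yields X[u,v] for each rule X => t.
    for head, label in terminal_rules.items():
        for u, edge_label, v in edges:
            if edge_label == label:
                facts.add((head, u, v))
    # Fixpoint: combine Y[u,v] and Z[v,w] into X[u,w] until nothing new appears.
    changed = True
    while changed:
        changed = False
        for head, left, right in binary_rules:
            new = {
                (head, u, w)
                for (n1, u, v) in facts if n1 == left
                for (n2, v2, w) in facts if n2 == right and v2 == v
            }
            if not new <= facts:
                facts |= new
                changed = True
    return facts


if __name__ == "__main__":
    result = annotate(EDGES, TERMINAL_RULES, BINARY_RULES)
    print(sorted((u, v) for (nt, u, v) in result if nt == "S"))
```

On the assumed graph this yields exactly the S-annotations listed above. The loop rescans all facts on every pass, which keeps the sketch short but is far from efficient.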
2. Matrix Multiplication
● Relation matrix representation of the annotated grammar method
● Each grammar non-terminal is stored in the matrix
● The step of combining X ⇒ Y Z is implemented as a “multiplication”
● Can be implemented on GPU
Initial matrix (rows: source vertex, columns: target vertex):
    1  2  3  4
1   .  .  .  A
2   A  .  B  .
3   .  .  .  A
4   .  B  .  .
Matrix after combining S ⇒ A B (rows: source vertex, columns: target vertex):
    1  2  3  4
1   .  S  .  A
2   A  .  B  .
3   .  S  .  A
4   .  B  .  .
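The “multiplication” step can be illustrated on the CPU with one Boolean matrix per non-terminal. This is a sketch under assumptions, not the GPU implementation that was evaluated: the names `bool_product` and `evaluate` are hypothetical, and the matrix entries come from the annotations A[1,4], A[2,1], A[3,4], B[2,3], B[4,2].

```python
N = 4  # vertices 1..4 are stored at indices 0..3


def empty():
    """Return an N x N matrix filled with False."""
    return [[False] * N for _ in range(N)]


def bool_product(y, z):
    """Boolean matrix product: out[i][j] holds if y[i][k] and z[k][j] for some k."""
    return [
        [any(y[i][k] and z[k][j] for k in range(N)) for j in range(N)]
        for i in range(N)
    ]


def evaluate(initial, binary_rules):
    """Apply every rule X => Y Z as X |= Y * Z until no matrix changes."""
    m = {nt: [row[:] for row in mat] for nt, mat in initial.items()}
    changed = True
    while changed:
        changed = False
        for head, left, right in binary_rules:
            prod = bool_product(m[left], m[right])
            for i in range(N):
                for j in range(N):
                    if prod[i][j] and not m[head][i][j]:
                        m[head][i][j] = True
                        changed = True
    return m


A, B, S = empty(), empty(), empty()
for u, v in [(1, 4), (2, 1), (3, 4)]:
    A[u - 1][v - 1] = True
for u, v in [(2, 3), (4, 2)]:
    B[u - 1][v - 1] = True

result = evaluate({"S": S, "A": A, "B": B}, [("S", "A", "B")])
s_pairs = sorted(
    (i + 1, j + 1) for i in range(N) for j in range(N) if result["S"][i][j]
)
print(s_pairs)
```

Keeping one matrix per non-terminal, rather than a set of non-terminals per cell as in the matrices shown above, is a design choice made here only to keep the product a plain Boolean one.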
3. Adapted GLR (Tomita) parser
● GLR is a generalization of LR parsers
● Uses context-free grammars to parse input strings
● Whenever the parser has multiple options, the parse state is duplicated and both options are tested separately
● If at least one of these options leads to acceptance, the input is accepted
● Has a data structure that reduces duplicate work
Adaptations for graph parsing instead of string parsing
● A separate parse state is initialized for each vertex
● Consumes edges instead of string symbols
● Accepting states at vertex w are traced back to the vertex v where parsing started
○ Emits result (v,w)
● The data structure helps keep duplicate work low
● There are some conditions where this algorithm terminates too early
○ Failing to produce some results
4. Subgraph Parsing
● Similar to the previous method, this is a string parser (Earley parser) adapted for graph input
● Upon acceptance at vertex v, backtracking is used to find all paths that accept at v; these paths are added to a new graph
● The query result is the induced subgraph of accepted paths!
● Termination problem
○ This algorithm depends on a maximum length parameter to stop
○ This makes it unsuitable for matching paths of arbitrary length
○ Further, there exist conditions under which it misses results or returns no results at all
Results
Grammar 1:
S ⇒ A B C
A ⇒ a a
A ⇒ a⁻¹ a⁻¹
B ⇒ b B
B ⇒ b
C ⇒ c C c⁻¹
C ⇒ D
D ⇒ d
Grammar 2:
S ⇒ a X a⁻¹
X ⇒ b X b⁻¹
X ⇒ c X c⁻¹
X ⇒ d
Highly ambiguous grammar:
S ⇒ X
X ⇒ X X
X ⇒ a
X ⇒ b
Tested on a small (a,b)-labeled graph of just 50 vertices
Method                    Time (s)   Memory (MB)
GLR (list)                 2,798.6          3.15
GLR (matrix)                 372.0          2.36
Ann. Gram (relational)         0.7          0.31
Ann. Gram (arbitrary)          0.7          0.48
Ann. Gram (shortest)           3.7          1.55
Ann. Gram (all-path)           2.8          9.09
Matrix Multiplication          0.1        < 0.01
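To give a feeling for how ambiguous the grammar S ⇒ X, X ⇒ X X, X ⇒ a, X ⇒ b is, the sketch below counts the parse trees of a single string of n symbols. It is an illustration on strings only, not a measurement from the experiments, and the function name `parse_trees` is hypothetical. The rule X ⇒ X X may split the string at any position, so the count follows the recurrence T(1) = 1, T(n) = Σ T(k)·T(n−k) for k = 1 … n−1.

```python
from functools import lru_cache


@lru_cache(maxsize=None)
def parse_trees(n: int) -> int:
    """Number of parse trees of X for one fixed string of n symbols over {a, b}."""
    if n == 1:
        return 1  # X => a or X => b, determined by the symbol itself
    # X => X X: split into a non-empty left part of length k and a right part.
    return sum(parse_trees(k) * parse_trees(n - k) for k in range(1, n))


if __name__ == "__main__":
    for n in range(1, 11):
        print(n, parse_trees(n))
```

The counts are the Catalan numbers shifted by one, so they grow exponentially with the string length. A method that explores derivations separately can face a comparable blow-up on paths in the graph, whereas a method that records only whether X[u,v] holds stores each vertex pair once.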
Conclusions
● CFPQ evaluation is not real-time
○ For a graph of 15,000 vertices, run time typically exceeds 1 hour
● Requires large amounts of memory
○ Grammar 2 at 5,000 vertices required multiple gigabytes of memory for most methods
● Annotating the grammar seems most promising
○ Robust, can handle ambiguous grammars well
○ Many possible query semantics
○ Running time: arbitrary path ≈ all-path
Future work
● Specialized methods for more restrictive grammars could be much faster
● The annotated grammar and the matrix representation could serve as a path index or reachability index respectively
○ Related to path index work being done at Neo4j