Experimental Study of Context-Free Path Query Evaluation Methods
Jochem Kuijpers
Fifth openCypher Implementers Meeting, Berlin 2019




Introduction
● MSc student CS & Eng. at TU/e
● Academic internship at Neo4j
● Supervised by:
○ George Fletcher and Nikolay Yakovets (TU/e Database Group)
○ Tobias Lindaaker (Neo4j)

● We implemented and evaluated four methods for computing context-free path query results


Context-Free Grammars
Example: the language of even-length palindromes over {a, b} = { ε, a a, b b, a a a a, a b b a, b a a b, … }

A grammar that generates this language:

S ⇒ a S a
S ⇒ b S b
S ⇒ ε


Example derivation of the string a b b a:

S ⇒ a S a ⇒ a b S b a ⇒ a b b a
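The derivation can also be produced mechanically. The following is a minimal Python sketch, not part of the talk; the function name `derive_even_palindrome` is hypothetical. It applies S ⇒ a S a or S ⇒ b S b from the outside in and finishes with S ⇒ ε.

```python
def derive_even_palindrome(word):
    """Return the derivation steps of `word` under
    S => a S a | b S b | ε, or None if `word` is not in the language."""
    n = len(word)
    if n % 2 != 0:
        return None
    steps = ["S"]
    for i in range(n // 2):
        c = word[i]
        # The i-th symbol from the left must mirror the i-th from the right.
        if c not in "ab" or word[n - 1 - i] != c:
            return None
        # Apply S => c S c to the single S in the sentential form.
        left = word[: i + 1]
        right = left[::-1]
        steps.append(" ".join(left) + " S " + " ".join(right))
    # Finish with S => ε, which simply erases the remaining S.
    steps.append(" ".join(word) if word else "ε")
    return steps


if __name__ == "__main__":
    print(" => ".join(derive_even_palindrome("abba")))
    # S => a S a => a b S b a => a b b a
```

Strings outside the language, such as `"ab"`, yield `None`.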


Context-Free Path Query
● A query is a context-free grammar
● The terminals of the grammar are edge labels
● Find paths whose sequence of edge labels is accepted by the grammar
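To make the definition concrete, here is a deliberately naive Python sketch that is not one of the evaluated methods. It enumerates every path up to a length bound and keeps the vertex pairs whose label sequence is accepted. The graph, the names `accepts` and `brute_force_cfpq`, and the `max_len` bound are assumptions made for illustration; the query is the even-length palindrome grammar from the earlier slide.

```python
def accepts(word):
    # Membership test for S => a S a | b S b | ε (even-length palindromes).
    return len(word) % 2 == 0 and word == word[::-1]


def brute_force_cfpq(edges, accepts, max_len):
    """Return all (start, end) pairs connected by a path of at most
    `max_len` edges whose label word is accepted.
    `edges` is a list of (source, label, target) triples."""
    vertices = {u for u, _, _ in edges} | {v for _, _, v in edges}
    out = {}
    for u, label, v in edges:
        out.setdefault(u, []).append((label, v))

    results = set()
    for start in vertices:
        frontier = [(start, "")]  # (current vertex, labels read so far)
        for _ in range(max_len + 1):
            next_frontier = []
            for vertex, word in frontier:
                if accepts(word):
                    results.add((start, vertex))
                for label, target in out.get(vertex, []):
                    next_frontier.append((target, word + label))
            frontier = next_frontier
    return results


if __name__ == "__main__":
    # Hypothetical example graph: 1 -a-> 2 -b-> 3 -b-> 4 -a-> 5
    edges = [(1, "a", 2), (2, "b", 3), (3, "b", 4), (4, "a", 5)]
    print(sorted(brute_force_cfpq(edges, accepts, max_len=4)))
```

Every vertex is paired with itself because the empty path spells ε. The enumeration is exponential in the path length, which is why the methods in this talk avoid listing paths explicitly.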


Context-Free Path Query
● Why?
● More expressive than regular expressions (regular path queries)
● Use cases in
○ biological data analysis
○ static code analysis
○ …


Our work
● We implemented four context-free path query evaluation methods
● Used Neo4j components
○ Graph store (vertices and edges)
○ PageCache
● Query evaluation is implemented separately on top of these components
○ (not integrated into Cypher)


The evaluated methods
1. Annotating the context-free grammar
Hellings, Jelle. "Path results for context-free grammar queries on graphs." arXiv preprint arXiv:1502.02242 (2015).

2. Matrix multiplication (GPGPU)
Azimov, Rustam, and Semyon Grigorev. "Context-free path querying by matrix multiplication." Proceedings of the 1st ACM SIGMOD Joint International Workshop on Graph Data Management Experiences & Systems (GRADES) and Network Data Analytics (NDA). ACM, 2018.

3. Adapted GLR (Tomita) parser
Santos, Fred C., Umberto S. Costa, and Martin A. Musicante. "A Bottom-Up Algorithm for Answering Context-Free Path Queries in Graph Databases." International Conference on Web Engineering. Springer, Cham, 2018.

4. Adapted Earley parser
Sevon, Petteri, and Lauri Eronen. "Subgraph queries by context-free grammars." Journal of Integrative Bioinformatics 5.2 (2008): 157-172.


1. Annotating the grammar
Grammar in Chomsky Normal Form:

S ⇒ A B
A ⇒ a
B ⇒ b

Annotate the grammar:

A[u,v] ⇔ there exists an A-path from u to v


A[1,4], A[2,1], A[3,4]

B[2,3], B[4,2]

S[1,2], S[3,2], so (1,2) and (3,2) are the vertex pairs matching the grammar
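The annotations above can be reproduced with a small fixpoint computation. The following Python sketch is a simple worklist formulation under stated assumptions, not necessarily the algorithm as given by Hellings. The edge list is inferred from the A and B annotations (the graph figure itself is not in this transcript), and the name `annotate` is hypothetical.

```python
from collections import deque


def annotate(edges, terminal_rules, binary_rules):
    """Compute all facts X[u,v] for a grammar in Chomsky Normal Form.
    edges:          list of (u, label, v)
    terminal_rules: list of (X, a) for rules X => a
    binary_rules:   list of (X, Y, Z) for rules X => Y Z
    Returns a set of (X, u, v) triples."""
    facts = set()
    worklist = deque()

    def add(fact):
        if fact not in facts:
            facts.add(fact)
            worklist.append(fact)

    # Base case: an edge labeled a gives X[u,v] for every rule X => a.
    for u, label, v in edges:
        for head, terminal in terminal_rules:
            if terminal == label:
                add((head, u, v))

    # Combine facts Y[u,v] and Z[v,w] into X[u,w] until nothing changes.
    while worklist:
        sym, u, v = worklist.popleft()
        for head, left, right in binary_rules:
            if left == sym:
                for other, v2, w in list(facts):
                    if other == right and v2 == v:
                        add((head, u, w))
            if right == sym:
                for other, t, u2 in list(facts):
                    if other == left and u2 == u:
                        add((head, t, v))
    return facts


if __name__ == "__main__":
    # Edges assumed from the annotations A[1,4], A[2,1], A[3,4], B[2,3], B[4,2].
    edges = [(1, "a", 4), (2, "a", 1), (3, "a", 4), (2, "b", 3), (4, "b", 2)]
    facts = annotate(edges, [("A", "a"), ("B", "b")], [("S", "A", "B")])
    print(sorted((u, v) for x, u, v in facts if x == "S"))
    # [(1, 2), (3, 2)]
```

Because each fact enters the worklist at most once, the loop terminates even on cyclic graphs.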



2. Matrix Multiplication
● Relation matrix representation of the annotated grammar method
● Each grammar non-terminal is stored in the matrix
● The step of combining X ⇒ Y Z is implemented as a “multiplication”
● Can be implemented on GPU

Initial matrix (row = source vertex, column = target vertex):

     1  2  3  4
1    .  .  .  A
2    A  .  B  .
3    .  .  .  A
4    .  B  .  .


Matrix after combining S ⇒ A B (row = source vertex, column = target vertex):

     1  2  3  4
1    .  S  .  A
2    A  .  B  .
3    .  S  .  A
4    .  B  .  .
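The “multiplication” step can be sketched in plain Python. This is an illustrative CPU version under assumptions, not the GPU implementation evaluated in the talk. It keeps one Boolean matrix per non-terminal, which holds the same information as a single matrix whose cells contain sets of non-terminals. The names `bool_matmul` and `cfpq_matrix` are hypothetical, and the edges are inferred from the annotations A[1,4], A[2,1], A[3,4], B[2,3], B[4,2].

```python
def bool_matmul(left, right):
    """Boolean matrix product: result[i][j] is True when some k has
    left[i][k] and right[k][j]."""
    n = len(left)
    return [
        [any(left[i][k] and right[k][j] for k in range(n)) for j in range(n)]
        for i in range(n)
    ]


def cfpq_matrix(n, edges, terminal_rules, binary_rules):
    """Vertices are numbered 1..n. Returns {non-terminal: n x n Boolean matrix}."""
    nonterminals = {head for head, _ in terminal_rules}
    for rule in binary_rules:
        nonterminals.update(rule)
    matrices = {x: [[False] * n for _ in range(n)] for x in nonterminals}

    # Initialise from the edges using rules X => a.
    for u, label, v in edges:
        for head, terminal in terminal_rules:
            if terminal == label:
                matrices[head][u - 1][v - 1] = True

    # Repeat X |= Y * Z for every rule X => Y Z until a fixpoint is reached.
    changed = True
    while changed:
        changed = False
        for head, left, right in binary_rules:
            product = bool_matmul(matrices[left], matrices[right])
            for i in range(n):
                for j in range(n):
                    if product[i][j] and not matrices[head][i][j]:
                        matrices[head][i][j] = True
                        changed = True
    return matrices


if __name__ == "__main__":
    edges = [(1, "a", 4), (2, "a", 1), (3, "a", 4), (2, "b", 3), (4, "b", 2)]
    m = cfpq_matrix(4, edges, [("A", "a"), ("B", "b")], [("S", "A", "B")])
    print([(i + 1, j + 1) for i in range(4) for j in range(4) if m["S"][i][j]])
    # [(1, 2), (3, 2)]
```

The inner product is the part that maps naturally onto GPU matrix kernels; the surrounding fixpoint loop stays the same.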


3. Adapted GLR (Tomita) parser
● GLR is a generalization of LR parsers
● Uses context-free grammars to parse input strings
● Whenever the parser has multiple options, the parse state is duplicated and each option is tested separately
● If at least one of these options leads to acceptance, the input is accepted
● Has a data structure that reduces duplicate work


Adaptations for graph parsing instead of string parsing
● A separate parse state is initialized for each vertex
● Consumes edges instead of string symbols
● Accepting states in w are backtraced to vertex v where parsing started
○ Emits result (v,w)
● The data structure helps keep duplicate work low
● There are some conditions where this algorithm terminates too early
○ Failing to produce some results


4. Subgraph Parsing
● Like the previous method, this is a string parser (Earley parser) adapted for graph input
● Upon acceptance at vertex v, backtracking is used to find all paths that are accepted at v; these paths are added to a new graph
● Query result is the induced subgraph of accepted paths!
● Termination problem
○ This algorithm depends on a maximum length parameter to stop
○ This makes it unsuitable for matching paths of arbitrary length
○ Further: there exist conditions where it misses results or returns no results at all


Results
Grammar 1:

S ⇒ A B C
A ⇒ a a
A ⇒ a⁻¹ a⁻¹
B ⇒ b B
B ⇒ b
C ⇒ c C c⁻¹
C ⇒ D
D ⇒ d


Results
Grammar 2:

S ⇒ a X a⁻¹
X ⇒ b X b⁻¹
X ⇒ c X c⁻¹
X ⇒ d


Results
Highly ambiguous grammar:

S ⇒ X
X ⇒ X X
X ⇒ a
X ⇒ b

Tested on a small (a,b)-labeled graph of just 50 vertices

Method                    Time (s)   Memory (MB)
GLR (list)                 2,798.6          3.15
GLR (matrix)                 372.0          2.36
Ann. Gram (relational)         0.7          0.31
Ann. Gram (arbitrary)          0.7          0.48
Ann. Gram (shortest)           3.7          1.55
Ann. Gram (all-path)           2.8          9.09
Matrix Multiplication          0.1        < 0.01
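One way to see why this grammar is highly ambiguous is to count the parse trees it allows for a single label sequence. The sketch below is an illustration added here, not a measurement from the talk; the function name `parse_trees` is hypothetical. With X ⇒ X X, a sequence of n labels can be split at any of its n-1 positions, so the count follows the Catalan numbers.

```python
from functools import lru_cache


@lru_cache(maxsize=None)
def parse_trees(n):
    """Number of parse trees for one fixed string of n >= 1 symbols
    under S => X, X => X X | a | b."""
    if n == 1:
        return 1  # a single X => a or X => b
    # X => X X: the left part covers k symbols, the right part n - k.
    return sum(parse_trees(k) * parse_trees(n - k) for k in range(1, n))


if __name__ == "__main__":
    print([parse_trees(n) for n in range(1, 11)])
    # [1, 1, 2, 5, 14, 42, 132, 429, 1430, 4862]
```

A parser that explores alternatives separately has to cope with this growth for every path, whereas a method that only records facts such as X[u,v] stores each fact once.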


Conclusions
● CFPQ evaluation is not real-time
○ For a graph of 15,000 vertices, run time typically exceeds 1 hour
● Requires large amounts of memory
○ Grammar 2 at 5,000 vertices required multiple gigabytes of memory for most methods
● Annotating the grammar seems most promising
○ Robust, can handle ambiguous grammars well
○ Many possible query semantics
○ Running time: arbitrary path ≈ all-path


Future work
● Specialized methods for more restrictive grammars could be much faster
● The annotated grammar and the matrix representation could serve as a path index and a reachability index, respectively
○ Related to path index work being done at Neo4j