Flow graph parsing and its application in process modeling ... · Flow graph parsing and its...

© 2010 IBM Corporation

Flow graph parsing and its application in (business) process management and SOA

IBM Research – Zurich

Process Management Technologies

Hagen Völzer Joint work with Jussi Vanhatalo and Artem Polyvyanyy




2 Hagen Völzer




3 Hagen Völzer

Flow Graph Parsing

Unique source, unique sink

Every node is on a path from

the source to the sink

D

B

C

A

s a2

a1 p1 p2

x3 x2

a4

x1 x4 e a3

Flow graph

a3

C

D

A

B

a2

a1 a4

Parse tree

Decomposition into hierarchy

of Single-Entry-Single-Exit

(SESE)-fragments

A fragment has the same

properties as a flow graph




4 Hagen Völzer

Outline

What? – Flow graph parsing

Why? – Use cases for flow graph parsing

How? – Requirements

– Approach

– Algorithms




5 Hagen Völzer 5 Zurich Research Laboratory

A Business Process Model




6 Hagen Völzer

Workflow graphs

Nodes are either gateways (x1, p1, …) or tasks (a1, a2, …)

Control state is represented by tokens on the edges

s a2

a1 p1 p2

x3 x2

a4

x1 x4 e a3

Decision

Fork Join (synchronization)

Merge (no synchronization)




7 Hagen Völzer

Use Case 1: BPMN to BPEL Translation

<process ...> ... <if> <condition>...</condition> <flow> <invoke name="a1" ... /> <sequence> <invoke name="a2" ... /> <invoke name="a3" ... /> </sequence> </flow> <else> <repeatUntil> <invoke name="a4" ... /> <condition>...</condition> </repeatUntil> </else> </if> </process>

Sequence

If

a2

Repeat-Until a4

a3

a1 Flow

Repeat-Until

Flow

Sequence

If

s a2

a1 p1 p2

x3 x2

a4

x1 x4 e a3

Business Process Modeling

Notation (BPMN)

Business Process Execution Language (BPEL)

a3

Sequence

Repeat-Until

If

Flow

a2

a1 a4

Parse tree




8 Hagen Völzer

Control flow errors

s a2

a1 p1 p2

x3 x2

a4

x1 x4 e a3

Decision

Fork Join (synchronization)

Merge (no synchronization)

a2

a1

a2

a1

a4 a4

Deadlock (A token is stuck) Lack of synchronization (Two tokens or more)




9 Hagen Völzer

Use Case 2: Detection of Control-Flow Errors

Divide-and-conquer technique – Speeds up a base technique

• E.g. a state-space exploration

1. Decomposes a workflow graph into its fragments

2. Detects errors for each fragment in isolation (Absence of errors here is compositional w.r.t. fragments.)

May reduce the number of states significantly

Fast heuristics are known for many fragments occurring in practice

s e

a5 a6

a2

a1 p1 p2

a3

x1

x4

a4

x2 x3

B A

C

D G

o2 o1

a2

a1 p1 p2 D

e s

A

e s B

C

a3

x1

x4

a4

x2 x3

B D

s

e

a5 a6 C s e

GA GD

GB

GC

o1 o2




10 Hagen Völzer

Use Case 3: Subprocess Detection

s

a17

a6

a7

a8

a11

a12

a13

a14

a10

a15 e

a18

a16

a4

a3

a2

a1

a5

a9

P2

P3




11 Hagen Völzer

Subprocess Detection

s

a17 a10

a15 e

a18

a16

a1

a9

P2

P3

a6

a7

a8

a4

a3

a2

a5

P2

s e

a11

a12

a13

a14

P3

e s




12 Hagen Völzer

Use Case 4: Compare & Merge

Compound changes rather than elementary changes – less changes, abstraction of edge changes, categories of changes

Visualization according to the fragment structure of the process model – easier to handle for the business user

Resolution of changes – Easy to integrate with control-flow analysis

– Assure that a change does not introduce errors

Parsing




13 Hagen Völzer

Other Use Cases for Parsing

Process Model Refactoring

Process Similarity Search

Layout, printing large graphs

Understanding large process models

Data-flow analysis

Runtime Optimization

Other models: Control flow graphs, Petri nets, Circuits




14 Hagen Völzer

Outline



How? – Requirements for parsing

– Approach

– Algorithms




15 Hagen Völzer

Requirement 1: Uniqueness

Naïve parsing approach: – Find a fragment and parse the fragment until there is no more unparsed fragment – Finding is non-deterministic, parse tree is not unique

The parse tree should be unique – The same BPMN model is always translated to the same BPEL model – Important for structurally comparing two models

s e x4 x3 x2 x1

a4 a3 a2

a1 s e x4 x3 x2 x1

a4 a3 a2

a1 If Repeat-Until Sequence 2 Sequence 1

Sequence 3 If Repeat-Until

Sequence 1

a2

If

Repeat-Until

Sequence 1

Sequence 2

a1

a3 a4 Repeat-Until

Sequence 1

Sequence 3

a4

a3 a2 a1

If




16 Hagen Völzer

Requirement 2: Modularity

Motivation: A local change in BPMN translates into a local change in BPEL – Again important for comparing two models

Modular: – Replacing a fragment with another fragment changes only the respective subtree in the

parse tree

Naïve parsing is not modular

s e x4 x3 x2 x1

a4 a3 a2


s e x2 x1

a5 a3 a2

a1 Seq. 3 If Sequence 1

Sequence 1

Sequence 3

a5 a3 a2 a1

If

a2

If

Repeat-Until

Sequence 1

Sequence 2

a1

a3 a4




17 Hagen Völzer

Canonical Fragments

Parse tree is a hierarchy of fragments in which any two fragments do not overlap Some fragments must be excluded from a parse tree

Exclude fragments that overlap with some other fragment – Such a fragment is called non-canonical – The non-maximal sequences are the non-canonical fragments here

s e x4 x3 x2 x1

a4 a3 a2


a2

If Repeat-Until

Sequence 1

a1

a3

a4

Sequence 3

a2

If

Repeat-Until

Sequence 1

Sequence 2

a1

a3 a4

Repeat-Until

Sequence 1

Sequence 3

a4

a3 a2 a1

If




18 Hagen Völzer

The Normal Process Structure Tree (NPST)

The Tree of canonical fragments

The NPST is unique and modular

Always search for sequences last, do not consider non-maximal sequences

Extends work by Johnson et al. (1994) – Can be computed in linear time (using DFS techniques)

s e x4 x3 x2 x1

a4 a3 a2

a1 If Repeat-Until

Sequence 1

s e x2 x1

a5 a3 a2

a1 If Sequence 1

Sequence 1

a5 a3

a2 a1

If

a2

If Repeat-Until

Sequence 1

a1

a3

a4 a1

a3




19 Hagen Völzer

Requirement 3: Granularity (Finer decomposition)

Motivation: Translate more BPMN models into BPEL – Also helpful in control-flow analysis and other use cases

Our contribution is the Refined Process Structure Tree – Extends work by Tarjan and Valdes (1980)

– More fine-grained than any previous technique

– Proved uniqueness and modularity

– Can be computed in linear time

x1 x2 s e x3

a2

a1

a3

X

x1 x2 s e x3

a2

a1

a3

If Repeat-Until

Sequence




20 Hagen Völzer

Outline



How? – Our contribution: The Refined Process Structure Tree

– Approach to computation: Using the triconnected components

– An even more elegant algorithm

– Lifting some assumptions • Dealing with separation points

• Dealing with multiple sources/sinks




21 Hagen Völzer

Relaxed Notion of a Fragment

The commonly used notion:

A fragment is a connected

subgraph that has – exactly one entry edge, and

– exactly one exit edge.

Relaxed notion:

A fragment is a connected

subgraph that has – exactly one entry node, and

– exactly one exit node.

u v

a2

a1

F

u v

a2

a1

F




22 Hagen Völzer

More Precisely [Tarjan and Valdes, 1980]:

A fragment F is a connected subgraph that has – exactly two boundary nodes, – one entry, and one exit

A boundary node is an entry of F if – all incoming edges are outside F, or – all outgoing edges are inside F

A boundary node is an exit of F if – all incoming edges are inside F, or – all outgoing edges are outside F

(Entry cannot be used to get out, exit cannot be used to get in)

Each edge is a (trivial) fragment

These boundary nodes are

neither entries nor exits

Not a fragment!

entry exit

fragment

u v F

u v

a2

a1

entry exit

fragment

u v F

a1

entry exit

fragment

F

a3

u v

a4

F




23 Hagen Völzer

Non-Canonical and Canonical Fragments

Non-canonical fragments overlap with some fragment

Canonical fragments do not overlap and thus they form a hierarchy

a

d

b u v c

N2

N1

u v

a

c

b

N2

N1

non-maximal sequences

t s u v b c a

N1 N2

maximal sequence

t s u v b c a

F1

P1

F2 F3 u v

a

c

b

B1

F1

F2

F3

non-canonical bond fragments

canonical bond fragments

a

d

b u v

c R1

B1




24 Hagen Völzer

The Refined Process Structure Tree (RPST)

As the canonical fragments do not overlap, they form a hierarchy.

The Refined Process Structure Tree is the tree of canonical fragments

G P1 T1 B1

S1

s

v2

v1

v3 v4 v7 v5 v6 t

a

b

c

d j

n

e f

g

h

i

k l

m

o

P1

T1 B1

S1

j k l m

c d f g e a b h i n

o




25 Hagen Völzer

Granularity

The RPST is: – More fine-grained than

• the NPST

• the parse tree by Tarjan and

Valdes

x1 x2 s e

a2

a1 If

Repeat-Until

X

x1 x2 s e

a2

a1

Fragments in the NPST

Fragments in the RPST

x1 x2 s e x3

a2

a1

a3

If Repeat-Until

Sequence




26 Hagen Völzer

Outline











27 Hagen Völzer

G P1 T1

T2

B2

B1

P2 s

v2

v1

v3 v4 v7 v5 v6 t

a

b

c

d j

n

e f

g

h

i

k l

m

o

G P1 T1 B1

S1 s

v2

v1

v3 v4 v7 v5 v6 t

a

b

c

d j

n

e f

g

h

i

k l

m

o s

v2

v1

v3 v4 v7 v5 v6 t

a

b

c

d j

n

e f

g

h

i

k l

m

o

P1 T1

T2

B2

B1

P2

G

s

v2

v1

v3 v4 v7 v5 v6 t

a

b

c

d j

n

e f

g

h

i

k l

m

o

G

Step 1: Detect

the triconnected

components.

Step 2: Analyze

the triconnected

components.

Step 3:

Restructure the tree

into the RPST.

A Linear Time Algorithm for Computing the RPST (Overview)

j k

P2 T2

T1 B1

B2

P1

a b h i m n

l

o

c d f g e

u w

y

z

x

P1

T1 B1

S1

j k l m

c d f g e a b h i n

o




28 Hagen Völzer

Connectivity

Undirected graph has a unique

decomposition into connected

components




29 Hagen Völzer

Biconnectivity

A separation point splits the graph into

two or more components

Splitting at all separation points yields

a unique decomposition into

biconnected components

Separation point




30 Hagen Völzer

Triconnectivity

A separation pair splits the graph into two or more subgraphs (called split graphs)

Entering/ exiting a split graph is possible only via the nodes of the separation pair

Separation pair




31 Hagen Völzer

Fragments and Triconnectivity

Add edge between sink and source (“return edge”) and ignore directions – Resulting graph is w.l.o.g. biconnected

– Each entry-exit pair (e.g. (u,v)) of a non-trivial fragment is a separation pair

– If (u,v) is a separation pair, then the subgraph in between u and v is a fragment if u is an

entry and v is an exit of that subgraph

Idea: Find fragments through separation pairs and their split graphs




32 Hagen Völzer

Triconnected Decomposition

Always split into two split graphs

Add a fresh edge e between the separation

pair to each split graph (called virtual edge)

Recursive splitting of split graphs yields split

components

Split components are not necessarily unique




33 Hagen Völzer

Non-determinism in splitting

Decomposition along separation pairs is unique if polygons and bonds are not decomposed further defines unique “triconnected components” (TCC)

Triconnected components can be computed in linear time (Hopcroft and Tarjan 1973)

Polygon Bond




34 Hagen Völzer

Step 1: Compute the Triconnected Components

s

v2

v1

v3 v4 v7 v5 v6 t

a

b

c

d j

n

e f

g

h

i

k l

m

o

A two-terminal graph

G

The undirected version

U(G) r

s

v2

v1

v3 v4 v7 v5 v6 t

a

b

c

d j

n

e f

g

h

i

k l

m

o

s v5 v7 t u w

r

o

s

v2

v1

v5

a

b

h

i

u

x

v2

v1

v3 v4

c

d

e f

g x

v5 v7

n

m

w

y

v5 v6 j

k

z

v5 v6 v7 l

y

z

The triconnected components

P1

T1

P2 T2

B2

B1

s

v2

v1

v3 v4 v7 v5 v6 t

a

b

c

d j

n

e f

g

h

i

k l

m

o

Component subgraphs

P1 T1

T2

B2

B1

P2

G

j k The tree of the triconnected components

P2 T2

T1 B1

B2

P1

a b h i m n

l

o

c d f g e

u w

y

z

x




35 Hagen Völzer

Step 2: Analyze the Triconnected Components

s

v2

v1

v3 v4 v7 v5 v6 t

a

b

c

d j

n

e f

g

h

i

k l

m

o

P1 T1

T2

B2

B1

P2

G

G P1 T1

T2

B2

B1

P2 s

v2

v1

v3 v4 v7 v5 v6 t

a

b

c

d j

n

e f

g

h

i

k l

m

o

j k

P2 T2

T1 B1

B2

P1

a b h i m n

l

o

c d f g e

u w

y

z

x

j k

P2 T2

T1 B1

B2

P1

a b h i m n

l

o

c d f g e

u w

y

z

x

Step 2: Analyze the triconnected components.




36 Hagen Völzer

Step 3: Restructure the Tree into the RPST G P1

T1

T2

B2

B1

P2 s

v2

v1

v3 v4 v7 v5 v6 t

a

b

c

d j

n

e f

g

h

i

k l

m

o

j k

P2 T2

T1 B1

B2

P1

a b h i m n

l

o

c d f g e

u w

y

z

x

Step 3: Restructure the tree into the RPST.

G P1 T1 B1

S1

s

v2

v1

v3 v4 v7 v5 v6 t

a

b

c

d j

n

e f

g

h

i

k l

m

o

P1

T1 B1

S1

j k l m

c d f g e a b h i n

o

Step 3 can be done in linear time




37 Hagen Völzer

Examples of The Restructuring Step

P1 v3

v4 v1

v2

s t

B P

s t

v3

v4 v1

v2

v0

P

B1

v3 v1 v2 v4 b

f

c d

h

e t s

a g

T

P

R

S

D

P1

P2

B1

B2

R1

S

R2 M

P1 B1

P2 B2

B B P

a b

v3

v4 v1

v2

s t

v3 v1 v2 v4 b

f

c d

h

e t s

a g

T

s t

v3

v4 v1

v2

v0

P

a b

B3




38 Hagen Völzer

Outline











39 Hagen Völzer

A more elegant computation (1/3)

Graph is normalized if each node has

either at most one incoming or at most

one one outgoing edge

Theorem: For a normalized graph,

RPST and Tree of TCC coincide

a g

c d e fb

P1

T1

1. Compute the

Tree of the TCCs

s u

v

w

xa

b

c

d

e

f

gt

P1T1




40 Hagen Völzer

A more elegant computation (2/3) -- Idea

Normalize graph by splitting nodes

Compute Tree of TCC = RPST of normalized graph

Project RPST back onto original graph




41 Hagen Völzer

A more elegant computation (3/3)

1. Compute

the Tree of the

TCCs 2. Analyze the

TCCs

3. Restructure the

tree to the RPST

g k

j

h i

B2

P1

B1

2. Compute

the Tree of the

TCCs

g k

j

l m

P2

P1

B1

h i

B2

1. Split nodes

g k

jP2

P1

B1

h i

B2 3. Remove

added edges

4. Remove

redundant

fragments

sg

h

j

ki

B2P1

l

P2 B1

tm

*y *z *z*yz tsg

h

j

ky

i

B1 P1

B2




42 Hagen Völzer

Outline











43 Hagen Völzer

Dealing with Separation Points

Use the same algorithm: – Normalize graph separation points disappear

– Compute RPST

– Project RPST onto original graph

Obtain fragments where entry and exit is the same node (e.g. P3)




44 Hagen Völzer

Dealing with Separation Points

Our decomposition creates natural hierarchy of fragments (a)-(c)

Alternative: – Compute biconnected components

– Decompose each biconnected component separately

– Yields different, less natural decomposition, e.g. (d) vs. (c)




45 Hagen Völzer

Dealing with Multiple Sinks

s

e1

a3

a2

a1 v

e2

e3

u

w

G0

s

e1

a3

a2

a1 v

u

w

e2

e3

e x

?

A B G1

G3

s

e1

a3

a2

a1 v

u

w

e2

e3

e x2 x1

A* B

Introduce dummy end node, then compute RPST

Allows us to complete and refactor workflow graphs

s

e1

a3

a2

a1 v

u

w

e2

e3

e

G2 A* B x1

? x2

?




46 Hagen Völzer

Conclusion

Flow Graph Parsing has – Many interesting use cases

– Requirements: Uniqueness, modularity, granularity

A new technique: Refined Process Structure Tree – Has simple characterization in terms of canonical fragments

– Improves existing techniques by providing a more fine-grained decomposition

– Unique, modular, can be computed in linear time

Can be applied to other directed graphs (Petri nets, circuits)

Implemented in IBM WebSphere Software




47 Hagen Völzer

Backup Slides




48 Hagen Völzer

Case Study: Soundness Checking

State-space exploration: – Cannot analyze a process model

having 100000 states State space explosion

• Considered intractable

Divide-and-conquer technique: – Can check soundness of more

process models

– Can decrease the analysis time

significantly if the state-spaces are

large • E.g., by factor of 115

– Can increase the analysis time if the

state-spaces are small • E.g., by factor of 2.5

State-space

exploration

Divide-and-

conquer

Library 1: 363 process models

Intractable 6 0

Visited states 680559 2647

Total time 220 s 1.9 s

Library 1: Excluding intractable


Total time 11.3 s 1.9 s

Library 2: 282 process models


Total time 0.6 s 1.5 s

Intractable 0 0




49 Hagen Völzer

Step 1: Detect

the triconnected

components.

Step 2: Check whether

each triconnected

component is a fragment.

Step 3: Restructure

the tree

into the RPST.

A Linear Time Algorithm for Computing the RPST

P1 T1

T2

B2

B1

P2

P1 T1 B1

S1

P1 T1

T2

B2

B1

P2

G

s

v2

v1

v3 v4 v7 v5 v6 t

a

b

c

d j

n

e f

g

h

i

k l

m

o

G

s

v2

v1

v3 v4 v7 v5 v6 t

a

b

c

d j

n

e f

g

h

i

k l

m

o

G

s

v2

v1

v3 v4 v7 v5 v6 t

a

b

c

d j

n

e f

g

h

i

k l

m

o

G

s

v2

v1

v3 v4 v7 v5 v6 t

a

b

c

d j

n

e f

g

h

i

k l

m

o




50 Hagen Völzer

References

[HT73] J. Hopcroft and R. E. Tarjan. Dividing a graph into triconnected components. SIAM J. Comput., 2:135–158, 1973.

[Val78] Jacobo Valdes Ayesta. Parsing flowcharts and series-parallel graphs. PhD thesis, Stanford, CA, USA, 1978.

[TV80] Robert E. Tarjan and Jacobo Valdes. Prime subprogram parsing of a program. In POPL ’80: Proceedings of the 7th ACM SIGPLAN-SIGACT symposium on Principles of programming languages, pages 95–105, New York, NY, USA, 1980. ACM.

[JJP94] Richard Johnson, David Pearson, and Keshav Pingali. The program structure tree: Computing control regions in linear time. In Proceedings of the ACM SIGPLAN’94 Conference on Programming Language Design and Implementation (PLDI), pages 171–185, 1994.

[Joh95] Richard Craig Johnson. Ecient program analysis using dependence flow graphs. PhD thesis, Ithaca, NY, USA, 1995.

[GM00] Carsten Gutwenger and Petra Mutzel. A linear time implementation of SPQR-trees. In Joe Marks, editor, Graph Drawing, volume 1984 of Lecture Notes in Computer Science, pages 77–90. Springer, 2000.

[VVL07] Jussi Vanhatalo, Hagen Völzer, and Frank Leymann. Faster and more focused control-flow analysis for business process models though SESE decomposition. In 5th International Conference on Service-Oriented Computing (ICSOC), volume 4749 of Lecture Notes in Computer Science, pages 43–55. Springer-Verlag Berlin Heidelberg, September 2007.




51 Hagen Völzer

Reduction increases as the graph size increases

Upper boundary for the largest fragment size seems to be independent

of graph size

0

10

20

30

40

50

60

0 50 100 150 200 250

Number of edges in the workflow graph

Num

ber

of

edges

in t

he

larg

est

frag

men

t of

the

work

flow

gra

ph

reduction factor = 1

reduction

factor = 9




52 Hagen Völzer

Triconnectivity

A separation pair splits the graph into

two or more subgraphs (called split

graphs)

A fresh edge e between the

separation pair is added to each split

graph (called virtual edge)

Recursive splitting of split graphs

yields split components




53 Hagen Völzer

Example

Flow graph parsing and its application in process modeling ... · Flow graph parsing and its...

Documents

Transcript of Flow graph parsing and its application in process modeling ... · Flow graph parsing and its...