Process Mining: Discovering processes from event logs All truths are easy to understand once they...

Post on 29-Dec-2015

218 views 0 download

Tags:

Transcript of Process Mining: Discovering processes from event logs All truths are easy to understand once they...

Process Mining:Discovering processes from event

logs

All truths are easy to understand once they are discovered; the point is to discover them. Galileo Galilei (1564 - 1642)

Prof.dr.ir. Wil van der AalstEindhoven University of Technology

Department of Information and TechnologyP.O. Box 513, 5600 MB Eindhoven

The Netherlandsw.m.p.v.d.aalst@tm.tue.nl

Outline • Process Mining

– overview– alpha algorithm– genetic mining

• ProM– Architecture– Convertors (e-mail, Staffware, InConcert, SAP, etc.) – Process mining plug-ins

• Alpha-algorithm• Multi-phase mining• Genetic mining

– Analysis plug-ins– Conformance testing plug-in– LTL checker plug-in– Social network plug-in

• Conclusion

Process Mining

processdesign

implementation/configuration

processenactment

diagnosis

Motivation: Reversing the process

• Process mining can be used for:– Process discovery (What is the process?)

– Delta analysis (Are we doing what was specified?)

– Performance analysis (How can we improve?)

process mining

Registerorder

Prepareshipment

Shipgoods

Receivepayment

(Re)sendbill

Contactcustomer

Archiveorder

Overview

1) basic performance metrics

2) process modelStart

Register order

Prepareshipment

Ship goods

(Re)send bill

Receive paymentContact

customer

Archive order

End

3) organizational model 4) social network

5) performance characteristics

If …then …

6) auditing/security

www.processmining.org

Let us focus on mining process models …

1) basic performance metrics

2) process modelStart

Register order

Prepareshipment

Ship goods

(Re)send bill

Receive paymentContact

customer

Archive order

End

3) organizational model 4) social network

5) performance characteristics

If …then …

6) auditing/security

... and a very simple approach: The alpha algorithm

Alpha algorithm

α

Process log• Minimal information in

log: case id’s and task id’s.

• Additional information: event type, time, resources, and data.

• In this log there are three possible sequences:– ABCD– ACBD– EF

case 1 : task A case 2 : task A case 3 : task A case 3 : task B case 1 : task B case 1 : task C case 2 : task C case 4 : task A case 2 : task B case 2 : task D case 5 : task E case 4 : task C case 1 : task D case 3 : task C case 3 : task D case 4 : task B case 5 : task F case 4 : task D

>,,||,# relations

• Direct succession: x>y iff for some case x is directly followed by y.

• Causality: xy iff x>y and not y>x.

• Parallel: x||y iff x>y and y>x

• Choice: x#y iff not x>y and not y>x.

case 1 : task A case 2 : task A case 3 : task A case 3 : task B case 1 : task B case 1 : task C case 2 : task C case 4 : task A case 2 : task B case 2 : task D case 5 : task E case 4 : task C case 1 : task D case 3 : task C case 3 : task D case 4 : task B case 5 : task F case 4 : task D

A>BA>CB>CB>DC>BC>DE>F

AB

AC

BD

CD

EF

B||CC||B

Basic idea (1)

x y

xy

Basic idea (2)

xy, xz, and y||z

x

z

y

Basic idea (3)

xy, xz, and y#z

x

z

y

Basic idea (4)

xz, yz, and x||y

x

y

z

Basic idea (5)

xz, yz, and x#y

x

y

z

It is not that simple: Basic alpha algorithm

Let W be a workflow log over T. (W) is defined as follows.

1. TW = { t T     W t },

2. TI = { t T     W t = first() },

3. TO = { t T     W t = last() },

4. XW = { (A,B)   A TW   B TW    a Ab B a W b     a1,a2 A a1#W a2

   b1,b2 B b1#W b2 },

5. YW = { (A,B) X    (A,B) XA A B B (A,B) = (A,B) },

6. PW = { p(A,B)    (A,B) YW } {iW,oW},

7. FW = { (a,p(A,B))    (A,B) YW   a A }   { (p(A,B),b)    (A,B) YW   b B

}  { (iW,t)    t TI}  { (t,oW)   t TO}, and

8. (W) = (PW,TW,FW). The alpha algorithm has been proven to be correct for a large class of free-choice nets.

Examplecase 1 : task A case 2 : task A case 3 : task A case 3 : task B case 1 : task B case 1 : task C case 2 : task C case 4 : task A case 2 : task B case 2 : task D case 5 : task E case 4 : task C case 1 : task D case 3 : task C case 3 : task D case 4 : task B case 5 : task F case 4 : task D

A

B

C

D

E F

(W)

W

DEMOAlpha algorithm

A

E

G

invitereviewers

D

get review 2

time-out 2

collectreviews

H

decide

I

accept

J

reject

inviteadditionalreviewer

K

M

L

get review X

time-out X

C

B

get review 1

time-out 1

G

F

get review 3

time-out 3

48 cases16 performers

Challenges• Refining existing algorithm for (control-flow/process

perspective)– Hidden tasks– Duplicate tasks– Non-free-choice constructs– Loops– Detecting concurrency (implicit or explicit)– Mining and exploiting time– Dealing with noise– Dealing with incompleteness

• Mining other perspectives (data, resources, roles, …) • Gathering data from heterogeneous sources• Visualization of results• Delta analysis

Genetic mining

Approach

Genetic mining: The two main questions

• How to represent an individual? (Petri net?)• How to define the genetic operators? (e.g.,

crossover)

How to represent an individual?

• Problems with Petri nets:– Places do not exist in log– difficulties defining mutation and crossover– problems describing subtle rules without adding transitions

A

B

C

D

Representation of the goal processtrue A A A D D E^F BvCvG

→ A B C D E F G H

A 0 1 1 1 0 0 0 0 BvCvD

B 0 0 0 0 0 0 0 1 H

C 0 0 0 0 0 0 0 1 H

D 0 0 0 0 1 1 0 0 E^F

E 0 0 0 0 0 0 1 0 G

F 0 0 0 0 0 0 1 0 G

G 0 0 0 0 0 0 0 1 H

H 0 0 0 0 0 0 0 0 true

A

B

D

E

C

F

G

H

A more compact representation

ACTIVITY INPUT OUTPUT

A {} {{B,C,D}}

B {{A}} {{H}}

C {{A}} {{H}}

D {{A}} {{E},{F}}

E {{D}} {{G}}

F {{D}} {{G}}

G {{E},{F}} {{H}}

H {{B,C,G}} {}

A

B

D

E

C

F

G

H

Any Petri net can be mapped onto a causal matrix:

ACTIVITY INPUT OUTPUT

A {...} {{C,D},...}

B {...} {{C,D},...}

C {{A,B},...} {...}

D {{A,B},...} {...}

A

B D

C

but ...

Mapping a causal matrix onto a Petri net?

ACTIVITY INPUT OUTPUT

A {{i11,i12,i13},{i21,i22,i23}} {{o11,o12,o13},{o21,o22,o23}}

A

i11i12i13

i21i22i23

o11o12o13

o21o22o23

Wiring based on input and output sets

A

i11i12i13

i21i22i23

B

o11o12o13

o21o22o23

A

B

?

Using place fusion or silent transitions.

Example: Event logcase id activity id originator timestamp

case 1 activity A John 9-3-2004:15.01

case 2 activity A John 9-3-2004:15.12

case 3 activity A Sue 9-3-2004:16.03

case 3 activity D Carol 9-3-2004:16.07

case 1 activity B Mike 9-3-2004:18.25

case 1 activity H John 10-3-2004:9.23

case 2 activity C Mike 10-3-2004:10.34

case 4 activity A Sue 10-3-2004:10.35

case 2 activity H John 10-3-2004:12.34

case 3 activity E Pete 10-3-2004:12.50

case 3 activity F Carol 11-3-2004:10.12

case 4 activity D Pete 11-3-2004:10.14

case 3 activity G Sue 11-3-2004:10.44

case 3 activity H Pete 11-3-2004:11.03

case 4 activity F Sue 11-3-2004:11.18

case 4 activity E Clare 11-3-2004:12.22

case 4 activity G Mike 11-3-2004:14.34

case 4 activity H Clare 11-3-2004:14.38

A

B

D

E

C

F

G

H

Goal

Example: Starting pointcase id activity id

case 1 activity A

case 2 activity A

case 3 activity A

case 3 activity D

case 1 activity B

case 1 activity H

case 2 activity C

case 4 activity A

case 2 activity H

case 3 activity E

case 3 activity F

case 4 activity D

case 3 activity G

case 3 activity H

case 4 activity F

case 4 activity E

case 4 activity G

case 4 activity H

+ 500 randomly generated initial individuals

Two individuals

ACTIVITY INPUT OUTPUT

A {} {{B,C,D}}

B {{A}} {{H}}

C {{A}} {{H}}

D {{A}} {{E}}

E {{D}} {{G}}

F {} {{G}}

G {{E},{F}} {{H}}

H {{C,B,G}} {}

ACTIVITY INPUT OUTPUT

A {} {{B,C,D}}

B {{A}} {{H}}

C {{A}} {{H}}

D {{A}} {{E,F}}

E {{D}} {{G}}

F {{D}} {{G}}

G {{E},{F}} {{H}}

H {{C},{B},{G}} {}

Crossover

ACTIVITY INPUT OUTPUT

A {} {{B,C,D}}

B {{A}} {{H}}

C {{A}} {{H}}

D {{A}} {{E, F}}

E {{D}} {{G}}

F {{D}} {{G}}

G {{E},{F}} {{H}}

H {{C,B,G}} {}

ACTIVITY INPUT OUTPUT

A {} {{B,C,D}}

B {{A}} {{H}}

C {{A}} {{H}}

D {{A}} {{E}}

E {{D}} {{G}}

F {} {{G}}

G {{E},{F}} {{H}}

H {{C},{B},{G}} {}

Resulting CM with fitness 1.0

true A A A D D E^F BvCvG

→ A B C D E F G H

A 0 1 1 1 0 0 0 0 BvCvD

B 0 0 0 0 0 0 0 1 H

C 0 0 0 0 0 0 0 1 H

D 0 0 0 0 1 1 0 0 E^F

E 0 0 0 0 0 0 1 0 G

F 0 0 0 0 0 0 1 0 G

G 0 0 0 0 0 0 0 1 H

H 0 0 0 0 0 0 0 0 true

Mapping

A

B

D

E

C

F

G

H

true00000000H

H10000000G

G01000000F

G01000000E

E^F00110000D

H10000000C

H10000000B

BvCvD00001110A

H G F E D C B A→

BvCvGE^F D D A A A true

true00000000H

H10000000G

G01000000F

G01000000E

E^F00110000D

H10000000C

H10000000B

BvCvD00001110A

H G F E D C B A→

BvCvGE^F D D A A A true

ProM framework

ProMStaffware

InConcert

MQ Series

workflow management systems

FLOWer

Vectus

Siebel

case handling / CRM systems

SAP R/3

BaaN

Peoplesoft

ERP systems

common XML format for storing/exchanging workflow logs

input/outputCore

Plugins

ProMframework

visualization analysis

alpha algorithmgenetic

algorithmTsinghua alpha

algorithmMulti phasealgorithms

social networkminer

case dataextraction

property verifier

ExternalTools

NetMiner Viscovery ......

...

Converter plug-in: EMailAnalyzer

XML format

ProM architecture

UserInterface

+User

Interaction

StaffwareFlowerSAPInConcert...

Heuristic NetAris Graph Format(Aris AML Format)PNMLTPN...

MiningPlugin

ImportPlugin

ExportPlugin

AnalysisPlugin

ConversionPlugin

Heuristic Net PNMLAris Graph format TPNNetMiner file Agna fileAris PPM Instances DOTComma Seperated Values …...

Log Filter

VisualisationEngine

XML Log

ResultFrame

Mining plug-in: Alpha algorithm

Mining plug-in: Genetic Miner

Mining plug-in: Multi-phase mining

Step 1: Get instances

A BD

CE A B

D

CF

A BD

CG

H

IB

D

CE

Step 2: Project

A BD

CG

1

H

I

E

1 2

2

2

1

11

1

11

11

1

1

1

1

ts 11

tf 11

A BD

CE1

1 1

1

1

1tf 1

1ts 11

A BD

CF1

1 1

1

1

1tf 1

1ts 11

Step 3: Aggregate

A BD

CG

3

H

I

E

3 4

4

4

2

11

1

22

11

1

1

1

1

ts 33

tf 32

F1

1

1

Step 4: Map onto EPCets

ts

B

C D

H I

G

etf

E F

eB

eH eI

eC eD

eG

eE eF

A

eA

Step 5: Map onto Petri net (or other language)

A B

C

D

E

F

G

H

I

ts

Mining plug-in: Social network miner

Cliques

SN based on hand-over of work metric

density of network is 0.225

SN based on working together (and ego network)

Analysis plug-in: LTL checker

Analysis plug-in: Conformance checker

Do they agree?

Fitness is not enough

Screenshot

(Also runs on Mac.)

Other analysis plug-ins

More demos?

Conclusion

• Process mining provides many interesting challenges for scientists, customers, users, managers, consultants, and tool developers.

• Involves multiple perspectives (process, data, resources, etc.)

• Get ProM-ed!• You can contribute by applying ProM and developing

plug-ins

processdesign

implementation/configuration

processenactment

diagnosis

More information

http://www.workflowcourse.com

http://www.workflowpatterns.com

http://www.processmining.org