Process Mining: Discovering processes from event logs All truths are easy to understand once they are discovered; the point is to discover them. Galileo Galilei (1564 - 1642) Prof.dr.ir. Wil van der Aalst Eindhoven University of Technology Department of Information and Technology P.O. Box 513, 5600 MB Eindhoven The Netherlands [email protected]


Page 1

Process Mining: Discovering processes from event logs

All truths are easy to understand once they are discovered; the point is to discover them. Galileo Galilei (1564 - 1642)

Prof.dr.ir. Wil van der Aalst
Eindhoven University of Technology
Department of Information and Technology
P.O. Box 513, 5600 MB Eindhoven
The Netherlands
[email protected]

Page 2

Outline

• Process Mining
  – overview
  – alpha algorithm
  – genetic mining
• ProM
  – Architecture
  – Converters (e-mail, Staffware, InConcert, SAP, etc.)
  – Process mining plug-ins
    • Alpha-algorithm
    • Multi-phase mining
    • Genetic mining
  – Analysis plug-ins
  – Conformance testing plug-in
  – LTL checker plug-in
  – Social network plug-in
• Conclusion

Page 3

Process Mining

[Figure labels: process design, implementation/configuration, process enactment, diagnosis]

Page 4

Motivation: Reversing the process

• Process mining can be used for:
  – Process discovery (What is the process?)
  – Delta analysis (Are we doing what was specified?)
  – Performance analysis (How can we improve?)

[Figure: process mining, illustrated with a process containing the tasks Register order, Prepare shipment, Ship goods, Receive payment, (Re)send bill, Contact customer, and Archive order]

Page 5

Overview

1) basic performance metrics
2) process model
3) organizational model
4) social network
5) performance characteristics (If … then …)
6) auditing/security

[Figure: process model running from Start to End with the tasks Register order, Prepare shipment, Ship goods, (Re)send bill, Receive payment, Contact customer, and Archive order]

www.processmining.org

Page 6

Let us focus on mining process models …

1) basic performance metrics
2) process model
3) organizational model
4) social network
5) performance characteristics (If … then …)
6) auditing/security

[Figure: process model running from Start to End with the tasks Register order, Prepare shipment, Ship goods, (Re)send bill, Receive payment, Contact customer, and Archive order]

... and a very simple approach: The alpha algorithm

Page 7

Alpha algorithm

α

Page 8

Process log

• Minimal information in log: case id's and task id's.
• Additional information: event type, time, resources, and data.
• In this log there are three possible sequences:
  – ABCD
  – ACBD
  – EF

case 1 : task A
case 2 : task A
case 3 : task A
case 3 : task B
case 1 : task B
case 1 : task C
case 2 : task C
case 4 : task A
case 2 : task B
case 2 : task D
case 5 : task E
case 4 : task C
case 1 : task D
case 3 : task C
case 3 : task D
case 4 : task B
case 5 : task F
case 4 : task D
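The three sequences can be recovered from the interleaved log by grouping events per case. The following is a minimal Python sketch of that step; the names `log`, `traces_by_case`, and `distinct_sequences` are illustrative assumptions and not part of ProM.

```python
from collections import defaultdict

# Events in the order they appear in the log on this page: (case id, task id).
log = [
    (1, "A"), (2, "A"), (3, "A"), (3, "B"), (1, "B"), (1, "C"),
    (2, "C"), (4, "A"), (2, "B"), (2, "D"), (5, "E"), (4, "C"),
    (1, "D"), (3, "C"), (3, "D"), (4, "B"), (5, "F"), (4, "D"),
]

def traces_by_case(events):
    """Group task ids per case, preserving the order of the log."""
    grouped = defaultdict(list)
    for case_id, task_id in events:
        grouped[case_id].append(task_id)
    return {case_id: "".join(tasks) for case_id, tasks in grouped.items()}

traces = traces_by_case(log)
distinct_sequences = sorted(set(traces.values()))
print(traces)
print(distinct_sequences)
```

Cases 1 and 3 follow ABCD, cases 2 and 4 follow ACBD, and case 5 follows EF.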

Page 9

>, →, ||, # relations

• Direct succession: x>y iff for some case x is directly followed by y.
• Causality: x→y iff x>y and not y>x.
• Parallel: x||y iff x>y and y>x.
• Choice: x#y iff not x>y and not y>x.

[Event log as on Page 8]

Direct succession: A>B, A>C, B>C, B>D, C>B, C>D, E>F
Causality: A→B, A→C, B→D, C→D, E→F
Parallel: B||C, C||B
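The four relations follow mechanically from the traces. Here is a small Python sketch, assuming the three distinct sequences ABCD, ACBD, and EF of the example log; the function name `relations` is a hypothetical helper introduced only for illustration.

```python
def relations(traces):
    """Derive the >, ->, || and # relations from a list of traces."""
    tasks = sorted({t for trace in traces for t in trace})
    # x > y: x is directly followed by y in at least one trace.
    succ = {(x, y) for trace in traces for x, y in zip(trace, trace[1:])}
    # x -> y: x > y and not y > x.
    causal = {(x, y) for (x, y) in succ if (y, x) not in succ}
    # x || y: x > y and y > x.
    parallel = {(x, y) for (x, y) in succ if (y, x) in succ}
    # x # y: neither x > y nor y > x.
    choice = {(x, y) for x in tasks for y in tasks
              if (x, y) not in succ and (y, x) not in succ}
    return succ, causal, parallel, choice

succ, causal, parallel, choice = relations(["ABCD", "ACBD", "EF"])
print(sorted(succ))
print(sorted(causal))
print(sorted(parallel))
```

Every pair of tasks ends up in exactly one of →, its reverse, ||, or #, which is why these relations can serve as the sole input of the alpha algorithm.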

Page 10

Basic idea (1)

x→y

[Figure: sequence, x followed by y]

Page 11

Basic idea (2)

x→y, x→z, and y||z

[Figure: AND-split, x followed by y and z in parallel]

Page 12

Basic idea (3)

x→y, x→z, and y#z

[Figure: XOR-split, x followed by either y or z]

Page 13

Basic idea (4)

x→z, y→z, and x||y

[Figure: AND-join, z preceded by x and y in parallel]

Page 14

Basic idea (5)

x→z, y→z, and x#y

[Figure: XOR-join, z preceded by either x or y]

Page 15

It is not that simple: Basic alpha algorithm

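Let $W$ be a workflow log over $T$. $\alpha(W)$ is defined as follows.

1. $T_W = \{ t \in T \mid \exists_{\sigma \in W}\; t \in \sigma \}$,
2. $T_I = \{ t \in T \mid \exists_{\sigma \in W}\; t = \mathit{first}(\sigma) \}$,
3. $T_O = \{ t \in T \mid \exists_{\sigma \in W}\; t = \mathit{last}(\sigma) \}$,
4. $X_W = \{ (A,B) \mid A \subseteq T_W \wedge B \subseteq T_W \wedge \forall_{a \in A} \forall_{b \in B}\; a \rightarrow_W b \wedge \forall_{a_1,a_2 \in A}\; a_1 \#_W a_2 \wedge \forall_{b_1,b_2 \in B}\; b_1 \#_W b_2 \}$,
5. $Y_W = \{ (A,B) \in X_W \mid \forall_{(A',B') \in X_W}\; A \subseteq A' \wedge B \subseteq B' \Rightarrow (A,B) = (A',B') \}$,
6. $P_W = \{ p_{(A,B)} \mid (A,B) \in Y_W \} \cup \{ i_W, o_W \}$,
7. $F_W = \{ (a, p_{(A,B)}) \mid (A,B) \in Y_W \wedge a \in A \} \cup \{ (p_{(A,B)}, b) \mid (A,B) \in Y_W \wedge b \in B \} \cup \{ (i_W, t) \mid t \in T_I \} \cup \{ (t, o_W) \mid t \in T_O \}$, and
8. $\alpha(W) = (P_W, T_W, F_W)$.

The alpha algorithm has been proven to be correct for a large class of free-choice nets.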
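The eight steps can be followed literally in code. The sketch below is a brute-force Python rendering, not the ProM implementation: it enumerates all subsets of tasks, so it is only suitable for tiny logs. It assumes that the sets A and B are non-empty, and the names `alpha`, `"i"`, and `"o"` (standing in for the source place and the sink place) are illustrative choices.

```python
from itertools import chain, combinations

def alpha(traces):
    """Basic alpha algorithm on a list of traces (sequences of task ids)."""
    tasks = sorted({t for trace in traces for t in trace})       # step 1
    first = {trace[0] for trace in traces}                       # step 2
    last = {trace[-1] for trace in traces}                       # step 3
    succ = {(a, b) for trace in traces for a, b in zip(trace, trace[1:])}
    causal = {(a, b) for (a, b) in succ if (b, a) not in succ}

    def unrelated(group):
        # True if all members of the group are pairwise in the '#' relation.
        return all((a, b) not in succ and (b, a) not in succ
                   for a in group for b in group)

    def nonempty_subsets(items):
        return chain.from_iterable(
            combinations(items, n) for n in range(1, len(items) + 1))

    candidates = [A for A in nonempty_subsets(tasks) if unrelated(A)]
    # Step 4: pairs (A, B) with every a in A causally followed by every b in B.
    x_w = [(set(A), set(B)) for A in candidates for B in candidates
           if all((a, b) in causal for a in A for b in B)]
    # Step 5: keep only the maximal pairs.
    y_w = [(A, B) for (A, B) in x_w
           if not any((A, B) != (A2, B2) and A <= A2 and B <= B2
                      for (A2, B2) in x_w)]

    def place(A, B):
        return (tuple(sorted(A)), tuple(sorted(B)))

    places = {place(A, B) for (A, B) in y_w} | {"i", "o"}        # step 6
    arcs = set()                                                 # step 7
    for A, B in y_w:
        arcs |= {(a, place(A, B)) for a in A}
        arcs |= {(place(A, B), b) for b in B}
    arcs |= {("i", t) for t in first} | {(t, "o") for t in last}
    return places, tasks, arcs                                   # step 8

places, transitions, arcs = alpha(["ABCD", "ACBD", "EF"])
print(sorted(p for p in places if isinstance(p, tuple)))
```

For the example log with sequences ABCD, ACBD, and EF this yields five internal places, one for each causal pair, plus the source and sink places.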

Page 16

Example

[Event log W as on Page 8]

[Figure: Petri net α(W) mined from W, with transitions A, B, C, D, E, and F]

Page 17

DEMO: Alpha algorithm

[Figure: reviewing process with lettered transitions: invite reviewers, get review 1, time-out 1, get review 2, time-out 2, get review 3, time-out 3, collect reviews, decide, accept, reject, invite additional reviewer, get review X, time-out X]

48 cases, 16 performers

Page 18

Challenges

• Refining the existing algorithm (control-flow/process perspective)
  – Hidden tasks
  – Duplicate tasks
  – Non-free-choice constructs
  – Loops
  – Detecting concurrency (implicit or explicit)
  – Mining and exploiting time
  – Dealing with noise
  – Dealing with incompleteness
• Mining other perspectives (data, resources, roles, …)
• Gathering data from heterogeneous sources
• Visualization of results
• Delta analysis

Page 19

Genetic mining

Page 20

Approach

Page 21

Genetic mining: The two main questions

• How to represent an individual? (Petri net?)
• How to define the genetic operators? (e.g., crossover)

Page 22

How to represent an individual?

• Problems with Petri nets:
  – Places do not exist in the log
  – Difficulties defining mutation and crossover
  – Problems describing subtle rules without adding transitions

[Figure: Petri net fragment with transitions A, B, C, and D]

Page 23

Representation of the goal process

INPUT   true  A  A  A  D  D  E^F  BvCvG
→       A     B  C  D  E  F  G    H      OUTPUT
A       0     1  1  1  0  0  0    0      BvCvD
B       0     0  0  0  0  0  0    1      H
C       0     0  0  0  0  0  0    1      H
D       0     0  0  0  1  1  0    0      E^F
E       0     0  0  0  0  0  1    0      G
F       0     0  0  0  0  0  1    0      G
G       0     0  0  0  0  0  0    1      H
H       0     0  0  0  0  0  0    0      true

[Figure: goal process model with activities A to H]

Page 24

A more compact representation

ACTIVITY INPUT OUTPUT

A {} {{B,C,D}}

B {{A}} {{H}}

C {{A}} {{H}}

D {{A}} {{E},{F}}

E {{D}} {{G}}

F {{D}} {{G}}

G {{E},{F}} {{H}}

H {{B,C,G}} {}
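Reading the compact table against the matrix on Page 23 suggests the convention that the subsets of an INPUT or OUTPUT entry are combined with AND (^) and the activities inside one subset with OR (v). The Python sketch below encodes the table and derives both the Boolean expressions and the 0/1 matrix rows under that assumption; `table`, `expression`, and `matrix_row` are hypothetical names.

```python
# activity -> (INPUT subsets, OUTPUT subsets), copied from the compact table.
table = {
    "A": ([], [{"B", "C", "D"}]),
    "B": ([{"A"}], [{"H"}]),
    "C": ([{"A"}], [{"H"}]),
    "D": ([{"A"}], [{"E"}, {"F"}]),
    "E": ([{"D"}], [{"G"}]),
    "F": ([{"D"}], [{"G"}]),
    "G": ([{"E"}, {"F"}], [{"H"}]),
    "H": ([{"B", "C", "G"}], []),
}

def expression(subsets):
    """Render a list of subsets as an AND of ORs, e.g. [{E}, {F}] -> 'E^F'."""
    if not subsets:
        return "true"
    parts = ["v".join(sorted(s)) for s in subsets]
    if len(parts) > 1:
        # Parenthesize OR-groups only when several subsets are ANDed together.
        parts = [f"({p})" if "v" in p else p for p in parts]
    return "^".join(parts)

def matrix_row(activity, activities):
    """0/1 row of the causal matrix: 1 where the activity has a successor."""
    successors = set().union(*table[activity][1])
    return [1 if other in successors else 0 for other in activities]

activities = sorted(table)
for a in activities:
    print(a, matrix_row(a, activities), expression(table[a][1]))
```

The rows printed for A and D reproduce the corresponding rows of the matrix representation of the goal process.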

[Figure: goal process model with activities A to H]

Page 25

Any Petri net can be mapped onto a causal matrix:

ACTIVITY INPUT OUTPUT

A {...} {{C,D},...}

B {...} {{C,D},...}

C {{A,B},...} {...}

D {{A,B},...} {...}

[Figure: Petri net fragment with transitions A, B, C, and D]

but ...

Page 26

Mapping a causal matrix onto a Petri net?

ACTIVITY INPUT OUTPUT

A {{i11,i12,i13},{i21,i22,i23}} {{o11,o12,o13},{o21,o22,o23}}

[Figure: activity A with its input sets {i11, i12, i13} and {i21, i22, i23} and its output sets {o11, o12, o13} and {o21, o22, o23}]

Page 27

Wiring based on input and output sets

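[Figure: activity A with input sets {i11, i12, i13} and {i21, i22, i23}, activity B with output sets {o11, o12, o13} and {o21, o22, o23}, and the open question (?) of how to wire A to B]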

Using place fusion or silent transitions.

Page 28

Example: Event log

case id activity id originator timestamp

case 1 activity A John 9-3-2004:15.01

case 2 activity A John 9-3-2004:15.12

case 3 activity A Sue 9-3-2004:16.03

case 3 activity D Carol 9-3-2004:16.07

case 1 activity B Mike 9-3-2004:18.25

case 1 activity H John 10-3-2004:9.23

case 2 activity C Mike 10-3-2004:10.34

case 4 activity A Sue 10-3-2004:10.35

case 2 activity H John 10-3-2004:12.34

case 3 activity E Pete 10-3-2004:12.50

case 3 activity F Carol 11-3-2004:10.12

case 4 activity D Pete 11-3-2004:10.14

case 3 activity G Sue 11-3-2004:10.44

case 3 activity H Pete 11-3-2004:11.03

case 4 activity F Sue 11-3-2004:11.18

case 4 activity E Clare 11-3-2004:12.22

case 4 activity G Mike 11-3-2004:14.34

case 4 activity H Clare 11-3-2004:14.38

Page 29

Goal

[Figure: goal process model with activities A to H]

Page 30

Example: Starting point

case id activity id

case 1 activity A

case 2 activity A

case 3 activity A

case 3 activity D

case 1 activity B

case 1 activity H

case 2 activity C

case 4 activity A

case 2 activity H

case 3 activity E

case 3 activity F

case 4 activity D

case 3 activity G

case 3 activity H

case 4 activity F

case 4 activity E

case 4 activity G

case 4 activity H

+ 500 randomly generated initial individuals

Page 31

Two individuals

ACTIVITY INPUT OUTPUT

A {} {{B,C,D}}

B {{A}} {{H}}

C {{A}} {{H}}

D {{A}} {{E}}

E {{D}} {{G}}

F {} {{G}}

G {{E},{F}} {{H}}

H {{C,B,G}} {}

ACTIVITY INPUT OUTPUT

A {} {{B,C,D}}

B {{A}} {{H}}

C {{A}} {{H}}

D {{A}} {{E,F}}

E {{D}} {{G}}

F {{D}} {{G}}

G {{E},{F}} {{H}}

H {{C},{B},{G}} {}

Page 32

Crossover

ACTIVITY INPUT OUTPUT

A {} {{B,C,D}}

B {{A}} {{H}}

C {{A}} {{H}}

D {{A}} {{E, F}}

E {{D}} {{G}}

F {{D}} {{G}}

G {{E},{F}} {{H}}

H {{C,B,G}} {}

ACTIVITY INPUT OUTPUT

A {} {{B,C,D}}

B {{A}} {{H}}

C {{A}} {{H}}

D {{A}} {{E}}

E {{D}} {{G}}

F {} {{G}}

G {{E},{F}} {{H}}

H {{C},{B},{G}} {}
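Comparing the two tables above with the two individuals on Page 31 shows that the rows of D and F have changed places while all other rows are untouched. The Python sketch below reproduces exactly that exchange. It is a deliberately coarse illustration: swapping whole rows for a fixed set of activities is an assumption made here, and the crossover operator of the actual genetic miner may recombine individual subsets instead. All names (`base`, `parent1`, `parent2`, `swap_rows`) are hypothetical.

```python
import copy

# activity -> (INPUT subsets, OUTPUT subsets); rows shared by both parents.
base = {
    "A": ([], [{"B", "C", "D"}]),
    "B": ([{"A"}], [{"H"}]),
    "C": ([{"A"}], [{"H"}]),
    "E": ([{"D"}], [{"G"}]),
    "G": ([{"E"}, {"F"}], [{"H"}]),
}
# The two individuals from Page 31 differ only in rows D, F, and H.
parent1 = {**base, "D": ([{"A"}], [{"E"}]), "F": ([], [{"G"}]),
           "H": ([{"B", "C", "G"}], [])}
parent2 = {**base, "D": ([{"A"}], [{"E", "F"}]), "F": ([{"D"}], [{"G"}]),
           "H": ([{"B"}, {"C"}, {"G"}], [])}

def swap_rows(first, second, activities):
    """Return two children with the rows of the given activities exchanged."""
    child1, child2 = copy.deepcopy(first), copy.deepcopy(second)
    for activity in activities:
        child1[activity] = copy.deepcopy(second[activity])
        child2[activity] = copy.deepcopy(first[activity])
    return child1, child2

child1, child2 = swap_rows(parent1, parent2, ["D", "F"])
for name, row in sorted(child1.items()):
    print(name, row)
```

The parents are copied rather than modified in place, so both can take part in further crossovers within the same generation.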

Page 33

Resulting CM with fitness 1.0

INPUT   true  A  A  A  D  D  E^F  BvCvG
→       A     B  C  D  E  F  G    H      OUTPUT
A       0     1  1  1  0  0  0    0      BvCvD
B       0     0  0  0  0  0  0    1      H
C       0     0  0  0  0  0  0    1      H
D       0     0  0  0  1  1  0    0      E^F
E       0     0  0  0  0  0  1    0      G
F       0     0  0  0  0  0  1    0      G
G       0     0  0  0  0  0  0    1      H
H       0     0  0  0  0  0  0    0      true

Page 34

Mapping

[Figure: the causal matrix with fitness 1.0 (as on Page 33) mapped onto a Petri net with activities A to H]

Page 35

ProM framework

Page 36

ProM

[Figure: overview of the ProM framework]

• Workflow management systems: Staffware, InConcert, MQ Series
• Case handling / CRM systems: FLOWer, Vectus, Siebel
• ERP systems: SAP R/3, BaaN, Peoplesoft
• Common XML format for storing/exchanging workflow logs
• ProM framework: core, input/output, visualization, analysis
• Plugins: alpha algorithm, genetic algorithm, Tsinghua alpha algorithm, multi-phase algorithms, social network miner, case data extraction, property verifier, ...
• External tools: NetMiner, Viscovery, ...

Page 37

Converter plug-in: EMailAnalyzer

Page 38

XML format

Page 39

ProM architecture

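[Figure: ProM architecture, with the following labels]

• User Interface + User Interaction
• Log sources: Staffware, Flower, SAP, InConcert, ...
• XML Log, Log Filter
• Plug-in types: Mining Plugin, Import Plugin, Export Plugin, Analysis Plugin, Conversion Plugin
• Formats (first list): Heuristic Net, Aris Graph Format (Aris AML Format), PNML, TPN, ...
• Formats (second list): Heuristic Net, PNML, Aris Graph format, TPN, NetMiner file, Agna file, Aris PPM Instances, DOT, Comma Separated Values, ...
• Visualisation Engine, Result Frame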

Page 40

Mining plug-in: Alpha algorithm

Page 41

Mining plug-in: Genetic Miner

Page 42

Mining plug-in: Multi-phase mining

Page 43

Step 1: Get instances

[Figure: individual process instances built from activities A to I]

Page 44

Step 2: Project

[Figure: projected instance graphs over activities A to I, annotated with counts and including the nodes ts and tf]

Page 45

Step 3: Aggregate

[Figure: aggregated graph over activities A to I, annotated with counts and including the nodes ts and tf]

Page 46

Step 4: Map onto EPC

[Figure: EPC with the nodes ts and A to I and the events ets, etf, and eA to eI]

Page 47

Step 5: Map onto Petri net (or other language)

[Figure: Petri net with transitions A to I and ts]

Page 48

Mining plug-in: Social network miner

Page 49
Page 50

Cliques

Page 51

SN based on hand-over of work metric

density of network is 0.225
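A hand-over of work occurs when one performer completes an activity in a case and another performer executes the next activity of the same case. The Python sketch below counts such hand-overs on the small event log from Page 28 and computes a network density. The density formula used here (directed edges present divided by the number of possible directed edges) is an assumption, and the result applies to the Page 28 log only; it is not meant to reproduce the 0.225 reported for the demo log.

```python
from collections import Counter, defaultdict

# (case id, activity id, originator), in timestamp order as in the Page 28 log.
log = [
    (1, "A", "John"), (2, "A", "John"), (3, "A", "Sue"), (3, "D", "Carol"),
    (1, "B", "Mike"), (1, "H", "John"), (2, "C", "Mike"), (4, "A", "Sue"),
    (2, "H", "John"), (3, "E", "Pete"), (3, "F", "Carol"), (4, "D", "Pete"),
    (3, "G", "Sue"), (3, "H", "Pete"), (4, "F", "Sue"), (4, "E", "Clare"),
    (4, "G", "Mike"), (4, "H", "Clare"),
]

def handover_of_work(events):
    """Count direct hand-overs (giver, receiver) within each case."""
    per_case = defaultdict(list)
    for case_id, _activity, originator in events:
        per_case[case_id].append(originator)
    handovers = Counter()
    for performers in per_case.values():
        for giver, receiver in zip(performers, performers[1:]):
            if giver != receiver:
                handovers[(giver, receiver)] += 1
    return handovers

def density(handovers):
    """Fraction of possible directed edges that occur at least once."""
    people = {person for edge in handovers for person in edge}
    n = len(people)
    return len(handovers) / (n * (n - 1)) if n > 1 else 0.0

handovers = handover_of_work(log)
for (giver, receiver), count in sorted(handovers.items()):
    print(f"{giver} -> {receiver}: {count}")
print(round(density(handovers), 3))
```

The counts can be used directly as edge weights of a social network whose nodes are the performers.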

Page 52

SN based on working together (and ego network)

Page 53

Analysis plug-in: LTL checker

Page 54
Page 55

Analysis plug-in: Conformance checker

Do they agree?

Page 56
Page 57

Fitness is not enough

Page 58

Screenshot

(Also runs on Mac.)

Page 59

Other analysis plug-ins

Page 60

More demos?

Page 61

Conclusion

• Process mining provides many interesting challenges for scientists, customers, users, managers, consultants, and tool developers.

• Involves multiple perspectives (process, data, resources, etc.)

• Get ProM-ed!

• You can contribute by applying ProM and developing plug-ins

[Figure labels: process design, implementation/configuration, process enactment, diagnosis]

Page 62

More information

http://www.workflowcourse.com

http://www.workflowpatterns.com

http://www.processmining.org