Process Mining: Understanding and Improving Desire Lines in Big Data

of 103 /103
Process Mining Understanding and Improving Desire Lines in Big Data prof.dr.ir. Wil van der Aalst www.processmining.org

description

We are pleased to announce the lecture: “Process Mining: Understanding and Improving Desire Lines in Big Data” in honour of doctor honoris causa Wil van der Aalst. Wednesday May 30th - 10.00 a.m. - 12 a.m., Hasselt University, campus Diepenbeek (Agoralaan, building D) - auditorium H5 The Faculty of Business Economics of Hasselt University is pleased to invite you to the lecture “Process Mining: Understanding and Improving Desire Lines in Big Data”. This lecture is organised to honour prof. dr. Wil van der Aalst, on whom the degree of ‘doctor honoris causa’ will be conferred by Hasselt University, Faculty of Business Economics (promotor prof. Koen Vanhoof). Professor van der Aalst is a full professor of Information Systems at the Technische Universiteit Eindhoven (TU/e). Currently he is also an adjunct professor at Queensland University of Technology (QUT).His research interests include workflow management, process mining, Petri nets, business process management, process modeling, and process analysis. Many of his ideas have influenced researchers, software developers and standardization committees working on process support.

Transcript of Process Mining: Understanding and Improving Desire Lines in Big Data

Page 1: Process Mining: Understanding and Improving Desire Lines in Big Data

Process MiningUnderstanding and Improving Desire Lines in Big Data

prof.dr.ir. Wil van der Aalstwww.processmining.org

Page 2: Process Mining: Understanding and Improving Desire Lines in Big Data

PAGE 2

Let’s Play: Play-Out, Play-In, ReplayBig DataDesire LinesProcess MiningHow Good is My Model?

Process DiscoveryConformance Checking

Food for Thought: Lasagna and SpaghettiGoogle Maps and TomTom

How to Get Started?Conclusion

Page 3: Process Mining: Understanding and Improving Desire Lines in Big Data

PAGE 3

On the different roles of (process) models …

Page 4: Process Mining: Understanding and Improving Desire Lines in Big Data

Play-Out

PAGE 4

event logprocess model

Page 5: Process Mining: Understanding and Improving Desire Lines in Big Data

A

B

C

DE

p2

end

p4

p3p1

start

Play-Out (Classical use of models)

PAGE 5

A B C D

A C B DA B C D

A E D

A C B D

A C B D

A E D

A E D

Page 6: Process Mining: Understanding and Improving Desire Lines in Big Data

Play-In

PAGE 6

event log process model

Page 7: Process Mining: Understanding and Improving Desire Lines in Big Data

A

B

C

DE

p2

end

p4

p3p1

start

Play-In

PAGE 7

A C B DA B C D

A E D

A C B D

A C B D

A E D

A E DA B C D

Page 8: Process Mining: Understanding and Improving Desire Lines in Big Data

Example Process Discovery(Vestia, Dutch housing agency, 208 cases, 5987 events)

PAGE 8

Page 9: Process Mining: Understanding and Improving Desire Lines in Big Data

Example Process Discovery(ASML, test process lithography systems, 154966 events)

PAGE 9

Page 10: Process Mining: Understanding and Improving Desire Lines in Big Data

Example Process Discovery(AMC, 627 gynecological oncology patients, 24331 events)

PAGE 10

Page 11: Process Mining: Understanding and Improving Desire Lines in Big Data

Replay

PAGE 11

event log process model

· extended model showing times, frequencies, etc.

· diagnostics· predictions· recommendations

Page 12: Process Mining: Understanding and Improving Desire Lines in Big Data

A

B

C

DE

p2

end

p4

p3p1

start

Replay

PAGE 12

A B C D

Page 13: Process Mining: Understanding and Improving Desire Lines in Big Data

A

B

C

DE

p2

end

p4

p3p1

start

Replay

PAGE 13

A E D

Page 14: Process Mining: Understanding and Improving Desire Lines in Big Data

A

B

C

DE

p2

end

p4

p3p1

start

Replay can detect problems

PAGE 14

AC D

Problem!missing token

Problem!token left behind

Page 15: Process Mining: Understanding and Improving Desire Lines in Big Data

Conformance Checking (WOZ objections Dutch municipality, 745 objections, 9583 event, f= 0.988)

PAGE 15

Page 16: Process Mining: Understanding and Improving Desire Lines in Big Data

A

B

C

DE

p2

end

p4

p3p1

start

Replay can extract timing information

PAGE 16

A5B8 C9 D13

5

8

9

13

3

4

5

43

265

8

764

7

74

3

Page 17: Process Mining: Understanding and Improving Desire Lines in Big Data

PAGE 17

Performance Analysis Using Replay(WOZ objections Dutch municipality, 745 objections, 9583 event, f= 0.988)

Page 18: Process Mining: Understanding and Improving Desire Lines in Big Data

PAGE 18

Big Data

Page 19: Process Mining: Understanding and Improving Desire Lines in Big Data

Big Data

PAGE 19

Source: “Big Data: The Next Frontier for Innovation, Competition, and Productivity” McKinsey Global Institute, 2011.

“Enterprises globally stored more than 7 exabytesof new data on disk drives in 2010, while consumers stored morethan 6 exabytes of new data on devices such as PCs and notebooks.”

“All of the world's music can be stored

on a $600 disk drive.”

“Indeed, we are generating so much data today that it is

physically impossible to store it all. Health

care providers, for instance, discard 90

percent of the data that they generate.”

Page 20: Process Mining: Understanding and Improving Desire Lines in Big Data

PAGE 20

Hilbert and Lopez. The World's Technological Capacity to Store, Communicate, and Compute Information. Science, 332(6025):60-65, 2011.

Page 21: Process Mining: Understanding and Improving Desire Lines in Big Data

PAGE 21

www.olifantenpaadjes.nl

Page 24: Process Mining: Understanding and Improving Desire Lines in Big Data

PAGE 24

Page 25: Process Mining: Understanding and Improving Desire Lines in Big Data

Evidence-Based Business Process Management

PAGE 25

Page 26: Process Mining: Understanding and Improving Desire Lines in Big Data

PAGE 26

Page 27: Process Mining: Understanding and Improving Desire Lines in Big Data

PAGE 27

Process Mining

Page 28: Process Mining: Understanding and Improving Desire Lines in Big Data

PAGE 28

Event Data + Processes

Process Mining =

Data Mining + Process Analysis

Machine Learning + Formal Methods

Page 29: Process Mining: Understanding and Improving Desire Lines in Big Data

Process Mining

software system

(process)model

eventlogs

modelsanalyzes

discovery

records events, e.g., messages,

transactions, etc.

specifies configures implements

analyzes

supports/controls

enhancement

conformance

“world”

people machines

organizationscomponents

businessprocesses

Page 30: Process Mining: Understanding and Improving Desire Lines in Big Data

Starting point: event log

PAGE 30

XES, MXML, SA-MXML, CSV, etc.

Page 31: Process Mining: Understanding and Improving Desire Lines in Big Data

Simplified event log

PAGE 31

a = register request, b = examine thoroughly, c = examine casually, d = check ticket,e = decide, f = reinitiate request, g = pay compensation, and h = reject request

Page 32: Process Mining: Understanding and Improving Desire Lines in Big Data

Processdiscovery

PAGE 32

a

start register request

b

examine thoroughly

c

examine casually

d

check ticket

decide

pay compensation

reject request

reinitiate request

e

g

h

f

end

c1

c2

c3

c4

c5

Page 33: Process Mining: Understanding and Improving Desire Lines in Big Data

Conformance checking

PAGE 33

a

start register request

b

examine thoroughly

c

examine casually

d

check ticket

decide

pay compensation

reject request

reinitiate request

e

g

h

f

end

c1

c2

c3

c4

c5

case 7: e is executed without being

enabled

case 8: g or h is missing

case 10: e is missing in second

round

Page 34: Process Mining: Understanding and Improving Desire Lines in Big Data

Extension: Adding perspectives to model based on event log

PAGE 34

a

start register request

b

examine thoroughly

c

examine casually

d

check ticket

decide

pay compensation

reject request

reinitiate request

e

g

h

f

end

c1

c2

c3

c4

c5

Performance information (e.g., the average time between two subsequent activities) can be extracted from the event log and visualized on top of the model.

A

A

A

A

A

E

M

M

Pete

Mike

Ellen

Role A:Assistant

Sue

Sean

Role E:Expert

Sara

Role M:Manager

Decision rules (e.g., a decision tree based on data known at the time a particular choice was made) can be learned from the event log and used to annotated decisions.

The event log can be used to discover roles in the organization (e.g., groups of people with similar work patterns). These roles can be used to relate individuals and activities.

Page 35: Process Mining: Understanding and Improving Desire Lines in Big Data

PAGE 35

How good is my model?

Page 36: Process Mining: Understanding and Improving Desire Lines in Big Data

Four Competing Quality Criteria

PAGE 36

process discovery

fitness

precisiongeneralization

simplicity

“able to replay event log” “Occam’s razor”

“not overfitting the log” “not underfitting the log”

Page 37: Process Mining: Understanding and Improving Desire Lines in Big Data

Example: one log four models

PAGE 37

astart register

request

bexamine thoroughly

cexamine casually

d checkticket

decide

pay compensation

reject request

reinitiate requeste

g

hf

end

astart register

request

cexamine casually

dcheckticket

decide reject request

e hend

N3 : fitness = +, precision = -, generalization = +, simplicity = +

N2 : fitness = -, precision = +, generalization = -, simplicity = +

astart register

request

bexamine

thoroughly

cexamine casually

dcheck ticket

decide

pay compensation

reject request

reinitiate request

e

g

h

f

end

N1 : fitness = +, precision = +, generalization = +, simplicity = +

astart register

request

cexamine casually

dcheckticket

decide reject request

e hend

N4 : fitness = +, precision = +, generalization = -, simplicity = -

aregister request

dexamine casually

ccheckticket

decide reject request

e h

a cexamine casually

dcheckticket

decide

e g

a dexamine casually

ccheckticket

decide

e g

register request

register request

pay compensation

pay compensation

aregister request

b dcheckticket

decide reject request

e h

aregister request

d bcheckticket

decide reject request

e h

a b dcheckticket

decide

e gregister request

pay compensation

examine thoroughly

examine thoroughly

examine thoroughly

… (all 21 variants seen in the log)

acdeh

abdeg

adceh

abdeh

acdeg

adceg

adbeh

acdefdbeh

adbeg

acdefbdeh

acdefbdeg

acdefdbeg

adcefcdeh

adcefdbeh

adcefbdeg

acdefbdefdbeg

adcefdbeg

adcefbdefbdeg

adcefdbefbdeh

adbefbdefdbeg

adcefdbefcdefdbeg

455

191

177

144

111

82

56

47

38

33

14

11

9

8

5

3

2

2

1

1

1

# trace

1391

process discovery

fitness

precisiongeneralization

simplicity

“able to replay event log” “Occam’s razor”

“not overfitting the log” “not underfitting the log”

Page 38: Process Mining: Understanding and Improving Desire Lines in Big Data

Model N1

PAGE 38

acdeh

abdeg

adceh

abdeh

acdeg

adceg

adbeh

acdefdbeh

adbeg

acdefbdeh

acdefbdeg

acdefdbeg

adcefcdeh

adcefdbeh

adcefbdeg

acdefbdefdbeg

adcefdbeg

adcefbdefbdeg

adcefdbefbdeh

adbefbdefdbeg

adcefdbefcdefdbeg

455

191

177

144

111

82

56

47

38

33

14

11

9

8

5

3

2

2

1

1

1

# trace

1391

astart register

request

bexamine

thoroughly

cexamine casually

dcheck ticket

decide

pay compensation

reject request

reinitiate request

e

g

h

f

end

N1 : fitness = +, precision = +, generalization = +, simplicity = +

Page 39: Process Mining: Understanding and Improving Desire Lines in Big Data

Model N2

PAGE 39

acdeh

abdeg

adceh

abdeh

acdeg

adceg

adbeh

acdefdbeh

adbeg

acdefbdeh

acdefbdeg

acdefdbeg

adcefcdeh

adcefdbeh

adcefbdeg

acdefbdefdbeg

adcefdbeg

adcefbdefbdeg

adcefdbefbdeh

adbefbdefdbeg

adcefdbefcdefdbeg

455

191

177

144

111

82

56

47

38

33

14

11

9

8

5

3

2

2

1

1

1

# trace

1391

astart register

request

cexamine casually

dcheckticket

decide reject request

e hend

N2 : fitness = -, precision = +, generalization = -, simplicity = +

Page 40: Process Mining: Understanding and Improving Desire Lines in Big Data

Model N3

PAGE 40

acdeh

abdeg

adceh

abdeh

acdeg

adceg

adbeh

acdefdbeh

adbeg

acdefbdeh

acdefbdeg

acdefdbeg

adcefcdeh

adcefdbeh

adcefbdeg

acdefbdefdbeg

adcefdbeg

adcefbdefbdeg

adcefdbefbdeh

adbefbdefdbeg

adcefdbefcdefdbeg

455

191

177

144

111

82

56

47

38

33

14

11

9

8

5

3

2

2

1

1

1

# trace

1391

astart register

request

bexamine thoroughly

cexamine casually

d checkticket

decide

pay compensation

reject request

reinitiate requeste

g

hf

end

N3 : fitness = +, precision = -, generalization = +, simplicity = +

Page 41: Process Mining: Understanding and Improving Desire Lines in Big Data

Model N4

PAGE 41

acdeh

abdeg

adceh

abdeh

acdeg

adceg

adbeh

acdefdbeh

adbeg

acdefbdeh

acdefbdeg

acdefdbeg

adcefcdeh

adcefdbeh

adcefbdeg

acdefbdefdbeg

adcefdbeg

adcefbdefbdeg

adcefdbefbdeh

adbefbdefdbeg

adcefdbefcdefdbeg

455

191

177

144

111

82

56

47

38

33

14

11

9

8

5

3

2

2

1

1

1

# trace

1391

astart register

request

cexamine casually

dcheckticket

decide reject request

e hend

N4 : fitness = +, precision = +, generalization = -, simplicity = -

aregister request

dexamine casually

ccheckticket

decide reject request

e h

a cexamine casually

dcheckticket

decide

e g

a dexamine casually

ccheckticket

decide

e g

register request

register request

pay compensation

pay compensation

aregister request

b dcheckticket

decide reject request

e h

aregister request

d bcheckticket

decide reject request

e h

a b dcheckticket

decide

e gregister request

pay compensation

examine thoroughly

examine thoroughly

examine thoroughly

… (all 21 variants seen in the log)

Page 42: Process Mining: Understanding and Improving Desire Lines in Big Data

PAGE 42

Process Discovery

Page 43: Process Mining: Understanding and Improving Desire Lines in Big Data

Process Discovery (small selection)

PAGE 43

α algorithm

α++ algorithm

α# algorithm

language-based regions

state-based regionsgenetic mining

heuristic mining

hidden Markov models

neural networks

automata-based learning

stochastic task graphs

conformal process graph

mining block structures

multi-phase miningpartial-order based mining

fuzzy mining

LTL mining

ILP mining

distributed genetic mining

Page 44: Process Mining: Understanding and Improving Desire Lines in Big Data

Petri net view: Just discover the places …

PAGE 44

a1

...

a2

am

b1

b2

bn

p(A,B) ...

A={a1,a2, … am} B={b1,b2, … bn}

process discovery

fitness

precisiongeneralization

simplicity

“able to replay event log” “Occam’s razor”

“not overfitting the log” “not underfitting the log”

Adding a place limits behavior:•overfitting ≈ adding too many places •underfitting ≈ adding too few places

Page 45: Process Mining: Understanding and Improving Desire Lines in Big Data

Example: Process Discovery UsingState-Based Regions

PAGE 45

[ ]

a

d

b d

c

e

c

b

[a,b,c,d][a,b,c][a,c]

[ a,b][a,e] [a,d,e]

[a]

a

b

c

de

p2

end

p4

p3p1

start

event log

01011001101101001011111101101000110110011110111000001101101001001100

Page 46: Process Mining: Understanding and Improving Desire Lines in Big Data

Example of State-Based Region

PAGE 46

[ ]

a

d

b d

c

e

c

b

[a,b,c,d][a,b,c][a,c]

[ a,b][a,e] [a,d,e]

[a]

a

b

c

de

p2

end

p4

p3p1

start

enter: b,eleave: ddo-not-cross: a,c

Page 47: Process Mining: Understanding and Improving Desire Lines in Big Data

Example: Process Discovery UsingLanguage-Based Regions

PAGE 47

a1

a2

b1

b2

d

pR

e

c1

c

f

YX

A place is feasible if it can be added without disabling any of thetraces in the event log.

Page 48: Process Mining: Understanding and Improving Desire Lines in Big Data

Example of Language-Based Regions

PAGE 48

b

a

d

e

c

YX

1. accd

2. bd

3. bce

4. ace

5. acd

6. bcce

7. ade

↓accd : 0 + 0 - 0 ≥ 0

a↓ccd : 0 + 1 - 1 ≥ 0

ac↓cd : 0 + 2 - 2 ≥ 0

acc↓d : 0 + 3 - 3 ≥ 0

↓ade : 0 + 0 - 0 ≥ 0

a↓de : 0 + 1 - 1 ≥ 0

ad↓e : 0 + 1 - 2 < 0

Page 49: Process Mining: Understanding and Improving Desire Lines in Big Data

Example of a completely different process discovery technique: Genetic Mining

PAGE 49

Page 50: Process Mining: Understanding and Improving Desire Lines in Big Data

Genetic process mining: Overview

PAGE 50

next generationcomputefitness

elitism

parents

crossover

children

mutation

create initial population

“dead” individuals

tournament

select best individual

event log

termination

Page 51: Process Mining: Understanding and Improving Desire Lines in Big Data

Example: crossover

PAGE 51

a

start register request

b

examine thoroughly

c

examine casually

d

check ticket

decide

pay compensation

reject request

reinitiate request

e

g

h

f

end

a

start register request

b

examine thoroughly

c

examine casually

d

check ticket

decide

pay compensation

reject request

reinitiate request

e

g

h

f

end

a

start register request

b

examine thoroughly

c

examine casually

d

check ticket

decide

reinitiate request

e

f

a

start register request

b

examine thoroughly

c

examine casually

d

check ticket

decide

pay compensation

reject request

reinitiate request

e

g

h

f

end

pay compensation

reject request

g

h

end

Page 52: Process Mining: Understanding and Improving Desire Lines in Big Data

Example: mutation

PAGE 52

a

start register request

b

examine thoroughly

c

examine casually

d

check ticket

decide

pay compensation

reject request

reinitiate request

e

g

h

f

end

a

start register request

b

examine thoroughly

c

examine casually

d

check ticket

decide

pay compensation

reject request

reinitiate request

e

g

h

f

end

remove place

added arc

Page 53: Process Mining: Understanding and Improving Desire Lines in Big Data

PAGE 53

Conformance Checking

Page 54: Process Mining: Understanding and Improving Desire Lines in Big Data

astart register

request

bexamine

thoroughly

cexamine casually

dcheck ticket

decide

pay compensation

reject request

reinitiate request

e

g

h

f

end

Replaying trace “abeg”

PAGE 54

m=1

r=1

a b e g

6

11

6= 0.83333

Page 55: Process Mining: Understanding and Improving Desire Lines in Big Data

Can be lifted to log level

PAGE 55

astart register

request

bexamine

thoroughly

cexamine casually

dcheck ticket

decide

pay compensation

reject request

reinitiate request

e

g

h

f

end

p3p1

p2 p4

N1

p5

astart register

request

bexamine

thoroughly

cexamine casually

checkticket

decide

pay compensation

reject request

reinitiate request

e

g

hf

endp1 p2

N2

dp3

p4

astart register

request

cexamine casually

dcheckticket

decide reject request

e hend

p1

p2

p3

p4

p5

N3

astart register

request

bexamine thoroughly

cexamine casually

d checkticket

decide

pay compensation

reject request

reinitiate requeste

g

hf

end

N4

p1

acdeh

abdeg

adceh

abdeh

acdeg

adceg

adbeh

acdefdbeh

adbeg

acdefbdeh

acdefbdeg

acdefdbeg

adcefcdeh

adcefdbeh

adcefbdeg

acdefbdefdbeg

adcefdbeg

adcefbdefbdeg

adcefdbefbdeh

adbefbdefdbeg

adcefdbefcdefdbeg

455

191

177

144

111

82

56

47

38

33

14

11

9

8

5

3

2

2

1

1

1

# trace

1391

Page 56: Process Mining: Understanding and Improving Desire Lines in Big Data

From “playing the token game” to optimal alignments …

a b » e g

a b d e g

PAGE 56

astart register

request

bexamine

thoroughly

cexamine casually

dcheck ticket

decide

pay compensation

reject request

reinitiate request

e

g

h

f

end

observed trace: “abeg”

move in model only

Page 57: Process Mining: Understanding and Improving Desire Lines in Big Data

Another alignment

a b c d e g

a b » d e g

PAGE 57

astart register

request

bexamine

thoroughly

cexamine casually

dcheck ticket

decide

pay compensation

reject request

reinitiate request

e

g

h

f

end

observed trace: “abcdeg”

move in log only

Page 58: Process Mining: Understanding and Improving Desire Lines in Big Data

Moves in an alignment

PAGE 58

a b » d e g

a » c d e g

trace in event log

possible run of model

move in log

move in model move in both

Optimal alignment describes modeled behavior closest to observed behavior

Page 59: Process Mining: Understanding and Improving Desire Lines in Big Data

Moves have costs

• Standard cost function:

−c(x,») = 1

−c(»,y) = 1

−c(x,y) = 0, if x=y

−c(x,y) = ∞, if x≠yPAGE 59

… a …

… » …… » …

… a …

… a …

… a …

… a …

… b …

Page 60: Process Mining: Understanding and Improving Desire Lines in Big Data

Non-fitting trace: abefdeg

PAGE 60

astart register

request

bexamine

thoroughly

cexamine casually

dcheck ticket

decide

pay compensation

reject request

reinitiate request

e

g

h

f

end abefdeg

a b » e f d » e g

a b d e f d b e g2

2a b e f d e g

a b » » d e g

Page 61: Process Mining: Understanding and Improving Desire Lines in Big Data

Any cost structure is possible

PAGE 61

… send-letter(John,2 weeks, $400)

… send-email(Sue,3 weeks,$500)

• Similar activities (more similarity implies lower costs).• Resource conformance (done by someone that does

not have the specified role).• Data conformance (path is not possible for this

customer). • Time conformance (missed the legal deadline)

Page 62: Process Mining: Understanding and Improving Desire Lines in Big Data

Fitness

PAGE 62

astart register

request

bexamine thoroughly

cexamine casually

d checkticket

decide

pay compensation

reject request

reinitiate requeste

g

hf

end

astart register

request

cexamine casually

dcheckticket

decide reject request

e hend

N3 : fitness = +, precision = -, generalization = +, simplicity = +

N2 : fitness = -, precision = +, generalization = -, simplicity = +

astart register

request

bexamine

thoroughly

cexamine casually

dcheck ticket

decide

pay compensation

reject request

reinitiate request

e

g

h

f

end

N1 : fitness = +, precision = +, generalization = +, simplicity = +

astart register

request

cexamine casually

dcheckticket

decide reject request

e hend

N4 : fitness = +, precision = +, generalization = -, simplicity = -

aregister request

dexamine casually

ccheckticket

decide reject request

e h

a cexamine casually

dcheckticket

decide

e g

a dexamine casually

ccheckticket

decide

e g

register request

register request

pay compensation

pay compensation

aregister request

b dcheckticket

decide reject request

e h

aregister request

d bcheckticket

decide reject request

e h

a b dcheckticket

decide

e gregister request

pay compensation

examine thoroughly

examine thoroughly

examine thoroughly

… (all 21 variants seen in the log)

acdeh

abdeg

adceh

abdeh

acdeg

adceg

adbeh

acdefdbeh

adbeg

acdefbdeh

acdefbdeg

acdefdbeg

adcefcdeh

adcefdbeh

adcefbdeg

acdefbdefdbeg

adcefdbeg

adcefbdefbdeg

adcefdbefbdeh

adbefbdefdbeg

adcefdbefcdefdbeg

455

191

177

144

111

82

56

47

38

33

14

11

9

8

5

3

2

2

1

1

1

# trace

1391

1.0

0.8

1.0

1.0

Our A* algorithm exploits the Petri net marking equation and uses other “tricks” to prune the search space.

Aligned event log is starting point for other types of analysis.

Page 63: Process Mining: Understanding and Improving Desire Lines in Big Data

PAGE 63

Alignments are essential!•conformance checking to diagnose deviations•squeezing reality into the model to do model-based analysis

Page 64: Process Mining: Understanding and Improving Desire Lines in Big Data

PAGE 64

Food for Thought

Page 65: Process Mining: Understanding and Improving Desire Lines in Big Data

We applied ProM in >100 organizations

PAGE 65

• Municipalities (e.g., Alkmaar, Heusden, Harderwijk, etc.)• Government agencies (e.g., Rijkswaterstaat, Centraal

Justitieel Incasso Bureau, Justice department)• Insurance related agencies (e.g., UWV)• Banks (e.g., ING Bank)• Hospitals (e.g., AMC hospital, Catharina hospital)• Multinationals (e.g., DSM, Deloitte)• High-tech system manufacturers and their customers

(e.g., Philips Healthcare, ASML, Ricoh, Thales)• Media companies (e.g. Winkwaves)• ...

Page 66: Process Mining: Understanding and Improving Desire Lines in Big Data

How can process mining help?

PAGE 66

• Uncover bottlenecks• Detect deviations• Performance

measurement• Auditing/compliance• Business Process

Redesign (BPR)• Continuous improvement

(Six Sigma)• Operational support (e.g.,

recommendation and prediction)

• Provide new insights• Highlight important

problems• An organization’s

mirror (in two ways)• Helps to avoid ICT

failures• Avoid “management by

PowerPoint” • From “politics” to

“analytics”

Page 67: Process Mining: Understanding and Improving Desire Lines in Big Data

PAGE 67

Page 68: Process Mining: Understanding and Improving Desire Lines in Big Data

Example of a Lasagna process: WMO process of a Dutch municipality

PAGE 68

Each line corresponds to one of the 528 requests that were handled in the period from 4-1-2009 until 28-2-2010. In total there are 5498 events represented as dots. The mean time needed to handled a case is approximately 25 days.

Page 69: Process Mining: Understanding and Improving Desire Lines in Big Data

WMO process(Wet Maatschappelijke Ondersteuning)

• WMO refers to the social support act that came into force in The Netherlands on January 1st, 2007.

• The aim of this act is to assist people with disabilities and impairments. Under the act, local authorities are required to give support to those who need it, e.g., household help, providing wheelchairs and scootmobiles, and adaptations to homes.

• There are different processes for the different kinds of help. We focus on the process for handling requests for household help.

• In a period of about one year, 528 requests for household WMO support were received.

• These 528 requests generated 5498 events.PAGE 69

Page 70: Process Mining: Understanding and Improving Desire Lines in Big Data

C-net discovered using heuristic miner (1/3)

PAGE 70

Page 71: Process Mining: Understanding and Improving Desire Lines in Big Data

C-net discovered using heuristic miner (2/3)

PAGE 71

Page 72: Process Mining: Understanding and Improving Desire Lines in Big Data

C-net discovered using heuristic miner (3/3)

PAGE 72

Page 73: Process Mining: Understanding and Improving Desire Lines in Big Data

Conformance check WMO process (1/3)

PAGE 73

Page 74: Process Mining: Understanding and Improving Desire Lines in Big Data

Conformance check WMO process (2/3)

PAGE 74

Page 75: Process Mining: Understanding and Improving Desire Lines in Big Data

Conformance check WMO process (3/3)

PAGE 75

The fitness of the discovered process is 0.99521667. Of the 528 cases, 496 cases fit perfectly whereas for 32 cases there are missing or remaining tokens.

Page 76: Process Mining: Understanding and Improving Desire Lines in Big Data

Bottleneck analysis WMO process (1/3)

PAGE 76

Page 77: Process Mining: Understanding and Improving Desire Lines in Big Data

Bottleneck analysis WMO process (2/3)

PAGE 77

Page 78: Process Mining: Understanding and Improving Desire Lines in Big Data

Bottleneck analysis WMO process (3/3)

PAGE 78

flow time of approx. 25 days with a standard deviation of approx. 28

Page 79: Process Mining: Understanding and Improving Desire Lines in Big Data

Two additional Lasagna processes

PAGE 79

RWS (“Rijkswaterstaat”)

process

WOZ (“Waardering Onroerende Zaken”)

process

Page 80: Process Mining: Understanding and Improving Desire Lines in Big Data

RWS Process

PAGE 80

• The Dutch national public works department, called “Rijkswaterstaat” (RWS), has twelve provincial offices. We analyzed the handling of invoices in one of these offices.

• The office employs about 1,000 civil servants and is primarily responsible for the construction and maintenance of the road and water infrastructure in its province.

• To perform its functions, the RWS office subcontracts various parties such as road construction companies, cleaning companies, and environmental bureaus. Also, it purchases services and products to support its construction, maintenance, and administrative activities.

Page 81: Process Mining: Understanding and Improving Desire Lines in Big Data

C-net discovered using heuristic miner

PAGE 81

Page 82: Process Mining: Understanding and Improving Desire Lines in Big Data

Social network constructed based on handovers of work

PAGE 82

Each of the 271 nodes corresponds to a civil servant. Two civil servants areconnected if one executed an activity causally following an activity executed by the other civil servant

Page 83: Process Mining: Understanding and Improving Desire Lines in Big Data

Social network consisting of civil servants that executed more than 2000 activities in a 9 month period.

PAGE 83

The darker arcs indicate the strongest relationships in the social network. Nodes having the same color belong to the same clique.

Page 84: Process Mining: Understanding and Improving Desire Lines in Big Data

WOZ process

• Event log containing information about 745 objections against the so-called WOZ (“Waardering Onroerende Zaken”) valuation.

• Dutch municipalities need to estimate the value of houses and apartments. The WOZ value is used as a basis for determining the real-estate property tax.

• The higher the WOZ value, the more tax the owner needs to pay. Therefore, there are many objections (i.e., appeals) of citizens that assert that the WOZ value is too high.

• “WOZ process” discovered for another municipality (i.e., different from the one for which we analyzed the WMO process).

PAGE 84

Page 85: Process Mining: Understanding and Improving Desire Lines in Big Data

Discovered process model

PAGE 85

The log contains events related to 745 objections against the so-called WOZ valuation. These 745 objections generated 9583 events. There are 13 activities. For 12 of these activities both start and complete events are recorded. Hence, the WF-net has 25 transitions.

Page 86: Process Mining: Understanding and Improving Desire Lines in Big Data

Conformance checker:(fitness is 0.98876214)

PAGE 86

Page 87: Process Mining: Understanding and Improving Desire Lines in Big Data

Performance analysis

PAGE 87

bottleneck detection: places are colored based on average durations

information on total flow time

time required to move from one activity to another

Page 88: Process Mining: Understanding and Improving Desire Lines in Big Data

Resource-activity matrix(four groups discovered)

PAGE 88

clique 2

clique 1

clique 3

clique 4

Page 89: Process Mining: Understanding and Improving Desire Lines in Big Data

PAGE 89

Page 90: Process Mining: Understanding and Improving Desire Lines in Big Data

Example of a Spaghetti process

PAGE 90

Spaghetti process describing the diagnosis and treatment of 2765 patients in a Dutch hospital. The process model was constructed based on an event log containing 114,592 events. There are 619 different activities (taking event types into account) executed by 266 different individuals (doctors, nurses, etc.).

Page 91: Process Mining: Understanding and Improving Desire Lines in Big Data

Fragment18 activities of the 619 activities (2.9%)

PAGE 91

Page 92: Process Mining: Understanding and Improving Desire Lines in Big Data

Another example(event log of Dutch housing agency)

PAGE 92

The event log contains 208 cases that generated 5987 events. There are 74 different activities.

Page 93: Process Mining: Understanding and Improving Desire Lines in Big Data

PAGE 93

Page 94: Process Mining: Understanding and Improving Desire Lines in Big Data

PAGE 94

Google Maps and TomTom

Page 95: Process Mining: Understanding and Improving Desire Lines in Big Data

Process models should be treated as electronic maps

PAGE 95

Page 96: Process Mining: Understanding and Improving Desire Lines in Big Data

Business process movies

PAGE 96

Page 97: Process Mining: Understanding and Improving Desire Lines in Big Data

Business information systems should offer “TomTom” functionality

PAGE 97

Predict: When will I be home? At 11.26!

Recommend: How to get home ASAP? Take a left turn!

Detect: You drive too fast!

Page 98: Process Mining: Understanding and Improving Desire Lines in Big Data

PAGE 98

How to get started?

Page 99: Process Mining: Understanding and Improving Desire Lines in Big Data

Hundreds of plug-ins available covering the whole process mining spectrum

PAGE 99Download from: www.processmining.org

open-source (L-GPL)

Page 100: Process Mining: Understanding and Improving Desire Lines in Big Data

How to Get Started?

Collect event data• Minimal requirement:

events referring to an activity name and a process instance.

• Good to have: timestamps, resource information, additional data elements.

• Challenges: scoping and sometimes correlation.

Collect questions

• What kind problems would you like to address (cost, time, risk, compliance, service, etc.)?

• Related to discovery, conformance, enhancement?

• Iterative process: can be “curiosity driven” initially.

PAGE 100

Page 101: Process Mining: Understanding and Improving Desire Lines in Big Data

PAGE 101

Conclusion

Page 102: Process Mining: Understanding and Improving Desire Lines in Big Data

Conclusion

PAGE 102

Page 103: Process Mining: Understanding and Improving Desire Lines in Big Data

PAGE 103

www.processmining.org

www.win.tue.nl/ieeetfpm/