Reality is not always what it seems! Using process mining and conformance checking to find out what is really going on in your system
Prof.dr.ir. Wil van der Aalst, Eindhoven University of Technology, P.O. Box 513, 5600 MB Eindhoven, The Netherlands, w.m.p.v.d.aalst@tue.nl


Page 1: Eindhoven University of Technology, P.O. Box 513, 5600 MB ... · P.O. Box 513, 5600 MB Eindhoven, The Netherlands w.m.p.v.d.aalst@tue.nl. Outline ... • The SAP reference model contains

Reality is not always what it seems! Using process mining and conformance checking to find out what is really going on in your system

Prof.dr.ir. Wil van der Aalst
Eindhoven University of Technology, P.O. Box 513, 5600 MB Eindhoven, The Netherlands

w.m.p.v.d.aalst@tue.nl

Page 2:

Outline
• Process Analysis: From verification to process mining
• Process Mining: Running example
• Discovery
• Conformance checking
• Reality Check
• Conclusion

The work of many people!

Thanks to Ton Weijters, Boudewijn van Dongen, Ana Karla Alves de Medeiros, Anne Rozinat, Christian Günther, Eric Verbeek, Ronny Mans, Minseok Song, Laura Maruster, Huub de Beer, Peter van den Brand, Jan Mendling, Andriy Nikolov, Jianmin Wang, Lijie Wen, Irene Vanderfeesten, Mariska Netjes, Steffi Rinderle, Walid Gaaloul, Gianluigi Greco, Antonella Guzzo, etc.

Page 3:

Overview Process Analysis

Page 4:

Software systems are the mirror image of the “world”

Page 5:

Dual role of process models

[Figure: diagram of the dual role of process models. Labels: models, analyzes; specifies, configures, implements, analyzes; supports/controls; people, machines, organizations, components, business processes; verification.]

“analysis of models only makes sense if they are an adequate reflection of reality”

Page 6:

the disconnect …

Page 7:

Example: Verification of the SAP Reference model (Joint work with Jan Mendling)

• The SAP reference model contains more than 600 non-trivial process models expressed in terms of Event-driven Process Chains (EPCs).

Page 8:

Approach

[Figure: approach. Labels: 604 non-trivial process models; collect characteristics; model analysis; compare.]

Page 9:

Simplistic approach: YAWL + invariants

Analysis using transition invariants, i.e., only a lower bound! ProM allows for more precise analysis.

Page 10:

Simplistic approach: YAWL + Petri net invariants

Page 11:

5.6%

Page 12:

5.6% is a lower bound!
• Using more refined techniques more errors are found, e.g., using reduction rules and state-space analysis it can be shown that 20.9% of the SAP models are incorrect (126/604).
• Other large repositories of EPC models:
  – Collection of 381 non-trivial EPCs from a German process reengineering project in the service sector
  – Collection of 935 non-trivial EPCs from the Austrian financial industry
  – Collection of 83 non-trivial EPCs from three different consulting companies
• Total: 2003 non-trivial EPCs

Page 13:

Overview results

• Designers make errors (10.7%)!
• Errors can be predicted (95.2%)!
• Process verification is mature, but models are not!
• Disconnect between reference models and systems, cf. SAP

Page 14:

Limitations of using models as a starting point

[Figure: process-model diagram. Labels: models, analyzes; specifies, configures, implements, analyzes; supports/controls; people, machines, organizations, components, business processes; verification; real world; PowerPoint reality.]

Page 15:

Event logs are a reflection of reality

[Figure: event logs diagram. Labels: records events, e.g., messages, transactions, etc.; supports/controls; people, machines, organizations, components, business processes.]

“logs are everywhere and there will be more …”

Page 16:

Examples:

Page 17:

Process mining: Linking events to models

[Figure: process mining overview. Labels: models, analyzes; discovery; records events, e.g., messages, transactions, etc.; specifies, configures, implements, analyzes; supports/controls; conformance; people, machines, organizations, components, business processes; verification.]

Page 18:

Toy example to explain basic idea:

Reviewing of papers for journal ☺

Page 19:

Event log:
• processes
  – process instances
    • events

Per event:
• activity name
• (event type)
• (originator)
• (timestamp)
• (data)
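The log structure listed above can be sketched as nested records. This is only an illustration under stated assumptions: the class and field names (`Event`, `ProcessInstance`, `Process`, `EventLog`, `case_id`) and the sample activities are hypothetical. The slide itself only gives the hierarchy and marks every event attribute except the activity name as optional.

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import Optional


@dataclass
class Event:
    activity: str                         # activity name (the only required attribute)
    event_type: Optional[str] = None      # e.g. "start" or "complete" (optional)
    originator: Optional[str] = None      # who or what performed the event (optional)
    timestamp: Optional[datetime] = None  # when the event happened (optional)
    data: dict = field(default_factory=dict)  # further attributes (optional)


@dataclass
class ProcessInstance:
    case_id: str                          # hypothetical identifier for one case
    events: list = field(default_factory=list)


@dataclass
class Process:
    name: str
    instances: list = field(default_factory=list)


@dataclass
class EventLog:
    processes: list = field(default_factory=list)


# Hypothetical sample data, loosely inspired by the paper-reviewing toy example.
review = ProcessInstance("paper-1", [
    Event("invite reviewers", "start", "Editor", datetime(2007, 1, 1, 9, 0)),
    Event("invite reviewers", "complete", "Editor", datetime(2007, 1, 1, 9, 30)),
    Event("decide"),  # only the activity name is known for this event
])
log = EventLog([Process("review", [review])])

# The trace of a case is the sequence of its activity names.
trace = [e.activity for e in log.processes[0].instances[0].events]
print(trace)
```

Keeping the optional attributes as `None` by default mirrors the parentheses on the slide: a discovery technique that only needs activity names can still run on a log that lacks timestamps or originators.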

Page 20:

[Figure: annotated event log fragment. Annotations: start of process instance; start of activity; end of activity; attributes of an event.]

Page 21:

Discovery

[Figure: process mining overview. Labels: models, analyzes; discovery; records events, e.g., messages, transactions, etc.; specifies, configures, implements, analyzes; supports/controls; conformance; people, machines, organizations, components, business processes; verification.]

Page 22:
Page 23:

No transactional information

Page 24:

Corresponding EPC model (used by SAP, ARIS, etc.)

Page 25:

YAWL model (executable workflow model)

Page 26:

Conversions/exports/imports

• ARIS – ARIS PPM
• BPEL 1.1 (WebSphere/Oracle)
• YAWL
• CPN Tools
• Petrify
• Woflan
• Heuristics nets
• …

Page 27:

Theory of Regions !?

Differences with synthesis:
– aim (exact/executable model – insight)
– input (complete/perfect information – partial/noisy information)

Two steps:
1. log2transition-system
2. transition-system2Petri-net
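The first of the two steps can be sketched as follows. This is a minimal sketch, assuming that a state is the set of activities seen so far in a trace. That abstraction, the function name `build_transition_system`, and the example traces are illustrative assumptions and do not come from the slides. The second step, synthesizing a Petri net from the transition system, is not shown.

```python
def build_transition_system(traces):
    """Return (states, transitions, initial) for a list of traces.

    A state is the frozenset of activities observed so far (an assumed
    abstraction); a transition is (source_state, activity, target_state).
    """
    initial = frozenset()
    states = {initial}
    transitions = set()
    for trace in traces:
        current = initial
        for activity in trace:
            target = current | {activity}  # frozenset | set gives a frozenset
            states.add(target)
            transitions.add((current, activity, target))
            current = target
    return states, transitions, initial


# Hypothetical log: b and c occur in either order between a and d.
traces = [["a", "b", "c", "d"], ["a", "c", "b", "d"]]
states, transitions, initial = build_transition_system(traces)
print(len(states), len(transitions))
```

The choice of state abstraction controls how much the result generalizes. Using the full prefix as the state would reproduce the log exactly, while the set abstraction used here merges prefixes that contain the same activities in a different order.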

Page 28:

Building a transition system

Page 29:

Constructing a Petri net using Petrify

Petri net

overfitting? too detailed/precise?

Page 30:

Example "Eichstätt paper"

Page 31:

about 30 mining plug-ins!

Page 32:

Social network analysis

Page 33:

Decision point analysis

Page 34:

Performance analysis

Page 35:

Discovering patterns

Page 36:

Conformance Checking

[Figure: process mining overview. Labels: models, analyzes; discovery; records events, e.g., messages, transactions, etc.; specifies, configures, implements, analyzes; supports/controls; conformance; people, machines, organizations, components, business processes; verification.]

Page 37:

Comparing the discovered model with the log (f=1)

Page 38:

Different process model, same log (f=0.796)

The decision cannot be repeated according to the model, but it can be repeated in reality!
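The slides do not define the fitness value f. The sketch below assumes it is a token-based fitness obtained by replaying the log on a Petri net, f = 0.5 * (1 - m/c) + 0.5 * (1 - r/p), where p, c, m, and r count produced, consumed, missing, and remaining tokens. The toy net, its place and activity names, and the traces are hypothetical, and the resulting values are not meant to reproduce the numbers on the slides.

```python
def token_replay_fitness(net, initial, final, traces):
    """Token-based fitness f = 0.5*(1 - m/c) + 0.5*(1 - r/p).

    net maps each activity to (input_places, output_places). Every
    activity in the log is assumed to occur in the net.
    """
    produced = consumed = missing = remaining = 0
    for trace in traces:
        marking = {}
        for place in initial:              # the environment produces the initial tokens
            marking[place] = marking.get(place, 0) + 1
            produced += 1
        for activity in trace:
            inputs, outputs = net[activity]
            for place in inputs:
                if marking.get(place, 0) == 0:
                    missing += 1           # token had to be created artificially
                else:
                    marking[place] -= 1
                consumed += 1
            for place in outputs:
                marking[place] = marking.get(place, 0) + 1
                produced += 1
        for place in final:                # the environment consumes the final tokens
            if marking.get(place, 0) == 0:
                missing += 1
            else:
                marking[place] -= 1
            consumed += 1
        remaining += sum(marking.values())  # tokens left behind after the case
    return 0.5 * (1 - missing / consumed) + 0.5 * (1 - remaining / produced)


# Hypothetical sequential model in which "decide" can happen only once.
net = {
    "collect": (["start"], ["p1"]),
    "decide": (["p1"], ["p2"]),
    "notify": (["p2"], ["end"]),
}
fitting = ["collect", "decide", "notify"]
deviating = ["collect", "decide", "decide", "notify"]  # decision repeated in reality

f_fit = token_replay_fitness(net, ["start"], ["end"], [fitting])
f_dev = token_replay_fitness(net, ["start"], ["end"], [deviating])
print(f_fit, f_dev)
```

In the deviating trace the second "decide" finds no token in its input place (one missing token) and leaves an extra token behind (one remaining token), which is how a repeated decision lowers the fitness of a model that does not allow repetition.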

Page 39:

Adding deviations to the log (f=0.89)

Page 40:

LTL checker plug-in

Page 41:
Page 42:

Goal of ProM: Complete support

Staffware, FLOWer, Websphere, YAWL, ADEPT, ARIS PPM/SIM, Outlook, Caramba, SAP, PeopleSoft, InConcert, IBM MQSeries, CPN Tools, CVS, Oracle BPEL, UML SD, company specific systems, ...

EPC (ARIS, ARIS PPM, EPML, Visio), BPEL (Oracle BPEL, Websphere), YAWL, Petri nets (PNML, TPN, ...), CPN (CPN Tools), Protos, ..., Netminer, ...

CJIB, UWV, Rijkswaterstaat, ASML, AMC hospital, Catharina hospital, Eindhoven, Heusden, ING Bank, Philips medical systems, ...

Page 43:

Reality Check

Page 44:

Reality Check

• Process mining on structured/administrative workflow-like logs is relatively easy.
• However, let us look at two extreme logs:
  – A log from a hospital with information on treatments, complications, and diagnoses.
  – A log from a manufacturer of high-tech systems with information on system tests.

Page 45:

First example: Hospital data
• Information on treatment, complication, and diagnosis events.
• Data:
  – 2712 cases (all unique)
  – 29258 events
  – +/- 10.8 events per case
  – 264 different events (activities)

Page 46:

Frequency of activities

Model element            Event type   Occurrences (absolute)   Occurrences (relative)
B_Perifeer infuus        start        2837                     9.696%
B_Maagsonde              start        2430                     8.305%
B_Beademing              start        2187                     7.475%
B_Catheter a Demeure     start        2096                     7.164%
B_Basiszorg              start        2010                     6.87%
B_Arterie lijn op OK     start        2002                     6.843%
B_O2 masker/slang        start        1954                     6.679%
B_Thoraxdrain            start        1863                     6.367%

Page 47:

Model element                 Event type   Occurrences (absolute)   Occurrences (relative)
C_N Phrenicus Paralyse        start        1                        0.003%
C_TIA                         start        1                        0.003%
B_Horizontaal                 start        1                        0.003%
C_Cholecystitis, acalc        start        1                        0.003%
C_Decubitus hak st. 3a        start        1                        0.003%
C_Druk necrose elders         start        1                        0.003%
B_Decubitus zorg stadium 3b   start        1                        0.003%
C_Haemolyse                   start        1                        0.003%
B_Decubitus zorg stadium 4b   start        1                        0.003%
B_Isolatie Beschermend        start        1                        0.003%
B_Donor Weefsel               start        1                        0.003%
C_Polyurie (>40ml/kg/24u)     start        1                        0.003%
C_Decubitus overig st. 3a     start        1                        0.003%
C_Intra-peritoneaal Abces     start        1                        0.003%
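A frequency table of this kind can be computed directly from the events in a log. The following is a minimal sketch, assuming the log is available as a flat list of (activity, event type) pairs. The function name `activity_frequencies` and the miniature sample log are hypothetical. The sample reuses three activity names from the slides but not their real counts.

```python
from collections import Counter


def activity_frequencies(events):
    """Return rows (activity, event_type, absolute, relative_percent),
    sorted by descending absolute frequency."""
    counts = Counter(events)
    total = sum(counts.values())
    return [
        (activity, event_type, n, 100.0 * n / total)
        for (activity, event_type), n in counts.most_common()
    ]


# Hypothetical miniature log of (activity, event type) pairs.
events = (
    [("B_Basiszorg", "start")] * 3
    + [("B_Maagsonde", "start")] * 2
    + [("C_TIA", "start")]
)
rows = activity_frequencies(events)
for activity, event_type, n, pct in rows:
    print(f"{activity:<12} {event_type:<6} {n:>3} {pct:7.3f}%")
```

The same arithmetic matches the figures on the slides: 2837 occurrences out of 29258 events is about 9.696%, and a single occurrence is about 0.003%.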

Page 48:

Heuristics miner

Page 49:

Petri net

Page 50:

Selection: Care after heart surgery

• Data
  – 874 cases (all unique)
  – 10478 events
  – 181 different events (activities)

Page 51:

Second example: Test data from high-tech system manufacturer
• Information on testing process of high-tech systems.
• Data:
  – 24 comparable cases
  – 154966 events
  – +/- 6450 events per case
  – between 2820 and 16250 events per machine
  – 720 different events (start/complete activities)

Page 52:

Helicopter view

Page 53:

Average time spent in job-steps (aggregated events)

Page 54:

Mining just the complete events (# 360)…

Page 55:

Common activities (#70)

Page 56:

Job step level

reference model

discovered model

Page 57:

Conformance checker (reference model – job steps)

Page 58:

Discovered models fit better than reference model

Page 59:

Research challenge

Mining less structured processes: the more unstructured, the more important it is to know what is going on!

Page 60:

An analogy to this chaos: topologies!

Highlights more important paths

More significant nodes are emphasized

Page 61:

More to learn from maps...

Aggregation: Clustering of coherent, less significant structures

Abstraction: Removing isolated, less significant structures
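The abstraction idea can be illustrated on a directly-follows graph. This is only a rough sketch, assuming that significance is measured by how often one activity directly follows another. It is not the algorithm of any ProM plug-in. The function names, the `min_count` threshold, and the example traces are hypothetical. Aggregation (clustering) is not shown.

```python
from collections import Counter


def directly_follows(traces):
    """Count how often activity x is directly followed by activity y."""
    edges = Counter()
    for trace in traces:
        for x, y in zip(trace, trace[1:]):
            edges[(x, y)] += 1
    return edges


def abstract(edges, min_count):
    """Keep only edges seen at least min_count times and the nodes they touch."""
    kept = {edge: n for edge, n in edges.items() if n >= min_count}
    nodes = {node for edge in kept for node in edge}
    return nodes, kept


# Hypothetical log: the path through "x" is rare compared with "b" and "c".
traces = [["a", "b", "d"]] * 5 + [["a", "c", "d"]] * 4 + [["a", "x", "d"]]
nodes, kept = abstract(directly_follows(traces), min_count=2)
print(sorted(nodes))
```

With the threshold set to 2, the rarely used activity "x" disappears from the picture because both of its edges occur only once, while the frequent paths through "b" and "c" remain.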

Page 62:

ProM’s Frequency abstraction miner

Page 63:

Conclusion
• Reality is different from models!
• The existence of event data enables a wide variety of process mining techniques: discovery and conformance.
• ProM supports this (+150 plug-ins)
• Although quite successful for "structured processes", "spaghetti processes" remain a challenge (two examples were given).
• Research should aim to address this challenge.

[Figure: process mining overview. Labels: software system; process/system model; event logs; models, analyzes; discovery; records events, e.g., messages, transactions, etc.; specifies, configures, implements, analyzes; supports/controls; conformance; “world”: people, machines, organizations, components, business processes; verification.]

Page 64:

Relevant WWW sites

• http://www.processmining.org

• http://promimport.sourceforge.net

• http://prom.sourceforge.net

• http://www.workflowpatterns.com

• http://www.workflowcourse.com

• http://www.win.tue.nl/is/

• http://is.tm.tue.nl/staff/wvdaalst