Keynote on Process Mining at SSCI 2011 / CIDM 2011

Process Mining: Discovering and Improving Spaghetti and Lasagna Processes prof.dr.ir. Wil van der Aalst www.processmining.org
  • date post

    07-Nov-2014

description

Keynote IEEE Symposium Series on Computational Intelligence (SSCI 2011)/IEEE Symposium on Computational Intelligence and Data Mining (CIDM 2011), April 2011, Paris, France

Transcript of Keynote on Process Mining at SSCI 2011 / CIDM 2011

  • 1. Process Mining: Discovering and Improving Spaghetti and Lasagna Processes. prof.dr.ir. Wil van der Aalst, www.processmining.org
  • 2. Architecture of Information Systems @ TU/e [diagram of research areas: Process Mining (process discovery, conformance checking); PAIS Technology (BPM/WFM/SOA systems, workflow patterns); Process Modeling/Analysis (simulation, verification)]
  • 3. Data explosion
  • 4. "The World's Technological Capacity to Store, Communicate, and Compute Information" by Martin Hilbert and Priscila López (DOI 10.1126/science.1200970)
  • 5. Process Mining = [figure: Petri net with places c1 to c12 and activities register, check_A, check_B, check_C (each guarded by a "needed?" choice), assess risk, initial conditions, modify conditions, and decline, combined with a decision tree over attributes such as Smoker and Drinker]. 100 organizations: municipalities (e.g., Alkmaar, Heusden, Harderwijk, etc.); government agencies (e.g., Rijkswaterstaat, Centraal Justitieel Incasso Bureau, Justice department); insurance related agencies (e.g., UWV); banks (e.g., ING Bank); hospitals (e.g., AMC hospital, Catharina hospital); multinationals (e.g., DSM, Deloitte); high-tech system manufacturers and their customers (e.g., Philips Healthcare, ASML, Ricoh, Thales); media companies (e.g., Winkwaves); ...
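  • 8. Process Mining [figure: the "world" (business processes, people, machines, components, organizations) is supported and controlled by a software system; the software system records events (e.g., messages, transactions) in event logs; (process) models model and analyze the world and specify, configure, implement, and analyze the software system; event logs and models are connected through discovery, conformance, and enhancement]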
  • 9. Starting point: event log (XES, MXML, SA-MXML, CSV, etc.)
  • 10. Simplified event log: a = register request, b = examine thoroughly, c = examine casually, d = check ticket, e = decide, f = reinitiate request, g = pay compensation, and h = reject request
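As a minimal sketch of how a CSV event log could be reduced to the simplified traces used on the following slides: the activity codes come from the slide above, while the column names (`case_id`, `activity`, `timestamp`) and the sample rows are invented for illustration.

```python
import csv
import io
from collections import defaultdict

# Activity codes taken from the "Simplified event log" slide.
CODES = {
    "register request": "a",
    "examine thoroughly": "b",
    "examine casually": "c",
    "check ticket": "d",
    "decide": "e",
    "reinitiate request": "f",
    "pay compensation": "g",
    "reject request": "h",
}

# Hypothetical CSV export; column names and rows are assumptions.
CSV_TEXT = """case_id,activity,timestamp
1,register request,2010-12-30T11:02
2,register request,2010-12-30T11:32
1,examine thoroughly,2010-12-31T10:06
2,check ticket,2010-12-30T12:12
1,check ticket,2011-01-05T15:12
2,examine casually,2010-12-30T14:16
1,decide,2011-01-06T11:18
2,decide,2011-01-05T11:22
1,reject request,2011-01-07T14:24
2,pay compensation,2011-01-08T12:05
"""


def traces_from_csv(text):
    """Group events by case, order them by timestamp, and encode each case as a string."""
    rows = sorted(
        csv.DictReader(io.StringIO(text)),
        key=lambda r: (r["case_id"], r["timestamp"]),  # ISO timestamps sort as text
    )
    traces = defaultdict(str)
    for row in rows:
        traces[row["case_id"]] += CODES[row["activity"]]
    return dict(traces)


print(traces_from_csv(CSV_TEXT))
```

Each case becomes one string of activity codes, which is the representation the discovery and replay examples in this talk operate on.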
  • 11. Process discovery [figure: discovered Petri net with places start, c1 to c5, and end, and transitions a (register request), b (examine thoroughly), c (examine casually), d (check ticket), e (decide), f (reinitiate request), g (pay compensation), and h (reject request)]
  • 12. Conformance checking [figure: the same Petri net annotated with deviations] Case 7: e is executed without being enabled. Case 8: g or h is missing. Case 10: e is missing in second round.
  • 13. Extension: Adding perspectives to model based on event log. The event log can be used to discover roles in the organization (e.g., groups of people with similar work patterns). These roles can be used to relate individuals and activities. Performance information (e.g., the average time between two subsequent activities) can be extracted from the event log and visualized on top of the model. Decision rules (e.g., a decision tree based on data known at the time a particular choice was made) can be learned from the event log and used to annotate decisions. Roles: Role A: Assistant (Pete, Mike, Ellen); Role E: Expert (Sue, Sean); Role M: Manager (Sara). [figure: the Petri net from slide 11 with each activity labeled by role A, E, or M]
  • 14. Let us play
  • 15. Play-Out: from process model to event log
  • 16. Play-Out (classical use of models) [figure: Petri net with transitions A, B, C, D, E and places start, p1, p2, p3, p4, end] Generated traces: ABCD, AED, AED, ABCD, ACBD, ACBD, AED, ACBD
  • 17. Play-In: from event log to process model
  • 18. Play-In [figure: the same Petri net, now derived from the log] Observed traces: ABCD, AED, AED, ABCD, ACBD, ACBD, AED, ACBD
  • 19. Replay: event log and process model together yield an extended model showing times, frequencies, etc., as well as diagnostics, predictions, and recommendations
  • 20. Replay of trace ABCD [figure: Petri net with transitions A, B, C, D, E and places start, p1, p2, p3, p4, end]
  • 21. Replay can detect problems. Replaying trace ACD: Problem! Missing token. Problem! Token left behind. [figure: Petri net with transitions A, B, C, D, E and places start, p1, p2, p3, p4, end]
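The replay idea can be sketched in a few lines. The net structure below is an assumption reconstructed from the labels on the slide and from the traces ABCD, ACBD, and AED (A splits into p1 and p2; B and C run in parallel; E is the alternative to both; D joins p3 and p4). It is an illustration, not the tooling used in the talk.

```python
from collections import Counter

# Assumed net: transition -> (input places, output places).
NET = {
    "A": ({"start"}, {"p1", "p2"}),
    "B": ({"p1"}, {"p3"}),
    "C": ({"p2"}, {"p4"}),
    "E": ({"p1", "p2"}, {"p3", "p4"}),
    "D": ({"p3", "p4"}, {"end"}),
}


def replay(trace, net=NET):
    """Replay a trace token by token; return (missing, remaining) tokens per place."""
    marking = Counter({"start": 1})
    missing = Counter()
    for activity in trace:
        inputs, outputs = net[activity]
        for place in inputs:
            if marking[place] > 0:
                marking[place] -= 1   # consume an available token
            else:
                missing[place] += 1   # force the firing and record the missing token
        for place in outputs:
            marking[place] += 1       # produce tokens
    # The environment consumes the final token from "end".
    if marking["end"] > 0:
        marking["end"] -= 1
    else:
        missing["end"] += 1
    remaining = {p: n for p, n in marking.items() if n > 0}
    return dict(missing), remaining


print(replay("ABCD"))  # fits: nothing missing, nothing left behind
print(replay("ACD"))   # B was skipped: token missing in p3, token left in p1
```

Under this assumed net, the trace ACD reproduces both problems flagged on the slide: D fires although p3 is empty, and the token in p1 is never consumed.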
  • 22. Replay can extract timing information [figure: trace with timestamps A at 5, B at 8, C at 9, D at 13 replayed on the Petri net, annotating places and transitions with the observed times]
  • 23. Desire lines in process models
  • 24. An example algorithm
  • 25. Process Discovery: basic idea
  • 26. >, →, ||, # relations. Direct succession: x>y iff for some case x is directly followed by y. Causality: x→y iff x>y and not y>x. Parallel: x||y iff x>y and y>x. Choice: x#y iff not x>y and not y>x. Example log: abcd, acbd, aed. Direct succession: a>b, a>c, a>e, b>c, b>d, c>b, c>d, e>d. Causality: a→b, a→c, a→e, b→d, c→d, e→d. Parallel: b||c, c||b. Choice: b#e, e#b, c#e, a#d.
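The four relations can be derived mechanically from the log. The following is a minimal sketch using the example log from the slide; the function name `footprint` and the string labels used for the relations are choices made here for illustration.

```python
from itertools import product


def footprint(log):
    """Return the direct-succession pairs and the relation between every pair of activities."""
    activities = sorted({a for trace in log for a in trace})
    # x > y: x is directly followed by y in at least one trace.
    succ = {(x, y) for trace in log for x, y in zip(trace, trace[1:])}
    rel = {}
    for x, y in product(activities, repeat=2):
        if (x, y) in succ and (y, x) in succ:
            rel[x, y] = "||"   # parallel
        elif (x, y) in succ:
            rel[x, y] = "->"   # causality
        elif (y, x) in succ:
            rel[x, y] = "<-"   # reversed causality
        else:
            rel[x, y] = "#"    # choice (never directly follow each other)
    return succ, rel


succ, rel = footprint(["abcd", "acbd", "aed"])
print(sorted(succ))
print(rel["a", "b"], rel["b", "c"], rel["b", "e"], rel["a", "d"])
```

For the three example traces this yields the eight direct successions listed on the slide, with b and c parallel and b and e in choice.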
  • 27. Basic Idea Used by Algorithm (1): (a) sequence pattern: a→b
  • 28. Basic Idea Used by Algorithm (2): (b) XOR-split pattern: a→b, a→c, and b#c; (c) XOR-join pattern: a→c, b→c, and a#b
  • 29. Basic Idea Used by Algorithm (3): (d) AND-split pattern: a→b, a→c, and b||c; (e) AND-join pattern: a→c, b→c, and a||b
  • 30. Example Revisited. Direct succession: a>b, a>c, a>e, b>c, b>d, c>b, c>d, e>d. Causality: a→b, a→c, a→e, b→d, c→d, e→d. Parallel: b||c, c||b. Choice: b#e, e#b, c#e, a#d. [figure: result produced by algorithm: Petri net with places start, p1, p2, p3, p4, end and transitions a, b, c, d, e]
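The patterns above suggest how places can be derived: look for maximal pairs of activity sets (A, B) in which every activity in A causally precedes every activity in B, and the activities within A and within B are mutually in choice. The sketch below assumes this is the place-construction step of the α-algorithm mentioned later in the talk; it enumerates all subsets, so it is only suitable for small examples, and it omits the start and end places.

```python
from itertools import chain, combinations


def alpha_places(log):
    """Return the maximal (input set, output set) pairs, one per internal place."""
    acts = sorted({a for trace in log for a in trace})
    succ = {(x, y) for trace in log for x, y in zip(trace, trace[1:])}

    def causal(x, y):
        return (x, y) in succ and (y, x) not in succ

    def choice(x, y):
        return (x, y) not in succ and (y, x) not in succ

    def subsets(items):
        return chain.from_iterable(
            combinations(items, n) for n in range(1, len(items) + 1)
        )

    def unrelated(group):
        # All members (including each with itself) must be in choice.
        return all(choice(x, y) for x in group for y in group)

    candidates = [
        (frozenset(A), frozenset(B))
        for A in subsets(acts) if unrelated(A)
        for B in subsets(acts)
        if unrelated(B) and all(causal(x, y) for x in A for y in B)
    ]
    # Keep only pairs that are not contained in a larger candidate.
    return [
        (A, B) for A, B in candidates
        if not any((A, B) != (A2, B2) and A <= A2 and B <= B2
                   for A2, B2 in candidates)
    ]


for A, B in alpha_places(["abcd", "acbd", "aed"]):
    print(sorted(A), "->", sorted(B))
```

For the example log this produces four pairs, matching the four internal places p1 to p4 of the net on the slide: a feeds {b, e} and {c, e}, and {b, e} and {c, e} each feed d.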
  • 31.
  • 32. Challenge: four competing quality criteria for process discovery: fitness (able to replay event log), simplicity (Occam's razor), generalization (not overfitting the log), precision (not underfitting the log)
  • 33. Flower model [figure: flower-shaped Petri net over activities a to h between start and end]
  • 34. What is the best model? [figure: two candidate models over activities A, B, C, D, E] Trace frequencies: ACD 99, ACE 0, BCE 85, BCD 0
  • 35. What is the best model? [figure: the same two candidate models] Trace frequencies: ACD 99, ACE 88, BCE 85, BCD 78
  • 36. What is the best model? [figure: the same two candidate models] Trace frequencies: ACD 99, ACE 2, BCE 85, BCD 3
  • 37. Example: one log, four models. Event log (frequency and trace): 455 acdeh, 191 abdeg, 177 adceh, 144 abdeh, 111 acdeg, 82 adceg, 56 adbeh, 47 acdefdbeh, 38 adbeg, 33 acdefbdeh, 14 acdefbdeg, 11 acdefdbeg, 9 adcefcdeh, 8 adcefdbeh, 5 adcefbdeg, 3 acdefbdefdbeg, 2 adcefdbeg, 2 adcefbdefbdeg, 1 adcefdbefbdeh, 1 adbefbdefdbeg, 1 adcefdbefcdefdbeg (total 1391). [figure: four Petri nets over the activities a to h] N1: fitness = +, precision = +, generalization = +, simplicity = +. N2 (the single sequence a, c, d, e, h): fitness = -, precision = +, generalization = -, simplicity = +. N3: fitness = +, precision = -, generalization = +, simplicity = +. N4 (one path per variant; all 21 variants seen in the log): fitness = +, precision = +, generalization = -, simplicity = -.
  • 38. Model N1 [figure: Petri net with transitions a (register request), b (examine thoroughly), c (examine casually), d (check ticket), e (decide), f (reinitiate request), g (pay compensation), h (reject request)] N1: fitness = +, precision = +, generalization = +, simplicity = +. Event log (frequency and trace): 455 acdeh, 191 abdeg, 177 adceh, 144 abdeh, 111 acdeg, 82 adceg, 56 adbeh, 47 acdefdbeh, 38 adbeg, 33 acdefbdeh, 14 acdefbdeg, 11 acdefdbeg, 9 adcefcdeh, 8 adcefdbeh, 5 adcefbdeg, 3 acdefbdefdbeg, 2 adcefdbeg, 2 adcefbdefbdeg, 1 adcefdbefbdeh, 1 adbefbdefdbeg, 1 adcefdbefcdefdbeg (total 1391)
  • 39. Model N2 [figure: sequential Petri net: start, a (register request), c (examine casually), d (check ticket), e (decide), h (reject request), end] N2: fitness = -, precision = +, generalization = -, simplicity = +. Event log (frequency and trace): 455 acdeh, 191 abdeg, 177 adceh, 144 abdeh, 111 acdeg, 82 adceg, 56 adbeh, 47 acdefdbeh, 38 adbeg, 33 acdefbdeh, 14 acdefbdeg, 11 acdefdbeg, 9 adcefcdeh, 8 adcefdbeh, 5 adcefbdeg, 3 acdefbdefdbeg, 2 adcefdbeg, 2 adcefbdefbdeg, 1 adcefdbefbdeh, 1 adbefbdefdbeg, 1 adcefdbefcdefdbeg (total 1391)
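To see why N2 scores poorly on fitness, one simple proxy is the share of cases a model can replay from start to end. This is an illustrative measure chosen here, not necessarily the fitness notion behind the +/- ratings on the slides. The log is copied from the slide; the two `accepts_*` predicates are simplified stand-ins for N2 (only the sequence acdeh) and N4 (exactly the variants seen in the log).

```python
# Event log from the slide: variant -> number of cases.
LOG = {
    "acdeh": 455, "abdeg": 191, "adceh": 177, "abdeh": 144, "acdeg": 111,
    "adceg": 82, "adbeh": 56, "acdefdbeh": 47, "adbeg": 38, "acdefbdeh": 33,
    "acdefbdeg": 14, "acdefdbeg": 11, "adcefcdeh": 9, "adcefdbeh": 8,
    "adcefbdeg": 5, "acdefbdefdbeg": 3, "adcefdbeg": 2, "adcefbdefbdeg": 2,
    "adcefdbefbdeh": 1, "adbefbdefdbeg": 1, "adcefdbefcdefdbeg": 1,
}


def fitting_fraction(log, accepts):
    """Fraction of cases whose trace is accepted by the model predicate."""
    total = sum(log.values())
    fitting = sum(count for trace, count in log.items() if accepts(trace))
    return fitting / total


def accepts_n2(trace):
    # N2 allows only the most frequent sequence.
    return trace == "acdeh"


def accepts_n4(trace):
    # N4 enumerates exactly the variants observed in the log.
    return trace in LOG


print(sum(LOG.values()), len(LOG))
print(round(fitting_fraction(LOG, accepts_n2), 3))
print(fitting_fraction(LOG, accepts_n4))
```

N2 replays only the 455 cases of the most frequent variant, roughly a third of the log, whereas N4 replays everything it has seen but, by construction, nothing else.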
  • 40. Model N3 [figure: Petri net in which, between a (register request) and the final g (pay compensation) or h (reject request), the activities b (examine thoroughly), c (examine casually), d (check ticket), e (decide), and f (reinitiate request) are all connected to one central place] N3: fitness = +, precision = -, generalization = +, simplicity = +. Event log (frequency and trace): 455 acdeh, 191 abdeg, 177 adceh, 144 abdeh, 111 acdeg, 82 adceg, 56 adbeh, 47 acdefdbeh, 38 adbeg, 33 acdefbdeh, 14 acdefbdeg, 11 acdefdbeg, 9 adcefcdeh, 8 adcefdbeh, 5 adcefbdeg, 3 acdefbdefdbeg, 2 adcefdbeg, 2 adcefbdefbdeg, 1 adcefdbefbdeh, 1 adbefbdefdbeg, 1 adcefdbefcdefdbeg (total 1391)
  • 41. Model N4 [figure: Petri net with one separate path from start to end per variant, e.g., a d c e g; a c d e g; a d c e h; a c d e h; a b d e g; a d b e h; a b d e h (all 21 variants seen in the log)] N4: fitness = +, precision = +, generalization = -, simplicity = -. Event log (frequency and trace): 455 acdeh, 191 abdeg, 177 adceh, 144 abdeh, 111 acdeg, 82 adceg, 56 adbeh, 47 acdefdbeh, 38 adbeg, 33 acdefbdeh, 14 acdefbdeg, 11 acdefdbeg, 9 adcefcdeh, 8 adcefdbeh, 5 adcefbdeg, 3 acdefbdefdbeg, 2 adcefdbeg, 2 adcefbdefbdeg, 1 adcefdbefbdeh, 1 adbefbdefdbeg, 1 adcefdbefcdefdbeg (total 1391)
  • 42. Why is process mining such a difficult problem? (1) There are no negative examples (i.e., a log shows what has happened but does not show what could not happen). (2) Due to concurrency, loops, and choices the search space has a complex structure and the log typically contains only a fraction of all possible behaviors. (3) There is no clear relation between the size of a model and its behavior (i.e., a smaller model may generate more or less behavior although classical analysis and evaluation methods typically assume some monotonicity property).
  • 43. How can process mining help? Detect bottlenecks; detect deviations; performance measurement; suggest improvements; decision support (e.g., recommendation and prediction); provide mirror; highlight important problems; avoid ICT failures; avoid management by PowerPoint; from politics to analytics.
  • 44.
  • 45. Example of a Lasagna process: WMO process of a Dutch municipality. Each line corresponds to one of the 528 requests that were handled in the period from 4-1-2009 until 28-2-2010. In total there are 5498 events represented as dots. The mean time needed to handle a case is approximately 25 days.
  • 46. WMO process (Wet Maatschappelijke Ondersteuning). WMO refers to the social support act that came into force in the Netherlands on January 1st, 2007. The aim of this act is to assist people with disabilities and impairments. Under the act, local authorities are required to give support to those who need it, e.g., household help, providing wheelchairs and scootmobiles, and adaptations to homes. There are different processes for the different kinds of help. We focus on the process for handling requests for household help. In a period of about one year, 528 requests for household WMO support were received. These 528 requests generated 5498 events.
  • 47. C-net discovered using heuristic miner (1/3)
  • 48. C-net discovered using heuristic miner (2/3)
  • 49. C-net discovered using heuristic miner (3/3)
  • 50. Conformance check WMO process (1/3)
  • 51. Conformance check WMO process (2/3)
  • 52. Conformance check WMO process (3/3). The fitness of the discovered process is 0.99521667. Of the 528 cases, 496 cases fit perfectly whereas for 32 cases there are missing or remaining tokens.
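The slide does not state how the fitness value is computed. Since it refers to missing and remaining tokens, a reasonable assumption is the usual token-replay fitness, where for each case σ the replay counts produced tokens p, consumed tokens c, missing tokens m, and remaining tokens r, and the counts are summed over all cases in the log L:

```latex
\mathit{fitness}(L, N) \;=\;
  \frac{1}{2}\left(1 - \frac{\sum_{\sigma \in L} m_{\sigma}}{\sum_{\sigma \in L} c_{\sigma}}\right)
  + \frac{1}{2}\left(1 - \frac{\sum_{\sigma \in L} r_{\sigma}}{\sum_{\sigma \in L} p_{\sigma}}\right)
```

Under this definition a value close to 1 means that only a few tokens were missing or left behind relative to all tokens consumed and produced, which is consistent with 496 of the 528 cases fitting perfectly.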
  • 53. Bottleneck analysis WMO process (1/3)
  • 54. Bottleneck analysis WMO process (2/3)
  • 55. Bottleneck analysis WMO process (3/3): flow time of approx. 25 days with a standard deviation of approx. 28
  • 56. Two additional Lasagna processes: RWS (Rijkswaterstaat) process and WOZ (Waardering Onroerende Zaken) process
  • 57. RWS Process. The Dutch national public works department, called Rijkswaterstaat (RWS), has twelve provincial offices. We analyzed the handling of invoices in one of these offices. The office employs about 1,000 civil servants and is primarily responsible for the construction and maintenance of the road and water infrastructure in its province. To perform its functions, the RWS office subcontracts various parties such as road construction companies, cleaning companies, and environmental bureaus. Also, it purchases services and products to support its construction, maintenance, and administrative activities.
  • 58. C-net discovered using heuristic miner
  • 59. Social network constructed based on handovers of work. Each of the 271 nodes corresponds to a civil servant. Two civil servants are connected if one executed an activity causally following an activity executed by the other civil servant.
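A handover-of-work network can be sketched by counting how often work passes from one resource to another within a case. The version below is a simplification: it treats directly consecutive events as handovers instead of using the causal dependencies of the discovered model, and the case data and resource names are invented for illustration.

```python
from collections import Counter


def handover_counts(cases):
    """Count handovers (r1 -> r2) between consecutive events of the same case."""
    counts = Counter()
    for events in cases.values():
        for (_, first), (_, second) in zip(events, events[1:]):
            if first != second:          # ignore work that stays with one person
                counts[first, second] += 1
    return counts


# Hypothetical cases: lists of (activity, resource) pairs in order of occurrence.
CASES = {
    "case1": [("a", "Pete"), ("b", "Sue"), ("d", "Mike"), ("e", "Sara")],
    "case2": [("a", "Pete"), ("c", "Mike"), ("d", "Mike"), ("e", "Sara")],
}

for (source, target), weight in sorted(handover_counts(CASES).items()):
    print(f"{source} -> {target}: {weight}")
```

The resulting weighted pairs are the arcs of the social network; thresholding the weights gives views such as the one restricted to the strongest relationships.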
  • 60. Social network consisting of civil servants that executed more than 2000 activities in a 9 month period. The darker arcs indicate the strongest relationships in the social network. Nodes having the same color belong to the same clique.
  • 61. WOZ process. Event log containing information about 745 objections against the so-called WOZ (Waardering Onroerende Zaken) valuation. Dutch municipalities need to estimate the value of houses and apartments. The WOZ value is used as a basis for determining the real-estate property tax. The higher the WOZ value, the more tax the owner needs to pay. Therefore, there are many objections (i.e., appeals) of citizens that assert that the WOZ value is too high. The WOZ process was discovered for another municipality (i.e., different from the one for which we analyzed the WMO process).
  • 62. Discovered process model. The log contains events related to 745 objections against the so-called WOZ valuation. These 745 objections generated 9583 events. There are 13 activities. For 12 of these activities both start and complete events are recorded. Hence, the WF-net has 25 transitions.
  • 63. Conformance checker (fitness is 0.98876214)
  • 64. Performance analysis: bottleneck detection (places are colored based on average durations); time required to move from one activity to another; information on total flow time
  • 65. Resource-activity matrix (four groups discovered): clique 1, clique 2, clique 3, clique 4
  • 66.
  • 67. Example of a Spaghetti process. Spaghetti process describing the diagnosis and treatment of 2765 patients in a Dutch hospital. The process model was constructed based on an event log containing 114,592 events. There are 619 different activities (taking event types into account) executed by 266 different individuals (doctors, nurses, etc.).
  • 68. Fragment: 18 activities of the 619 activities (2.9%)
  • 69. Another example (event log of Dutch housing agency). The event log contains 208 cases that generated 5987 events. There are 74 different activities.
  • 70.
  • 71.
  • 72. Example of a map: road map of the Netherlands. The map abstracts from smaller cities and less significant roads; only the bigger cities, highways, and other important roads are shown. Moreover, cities aggregate local roads and local districts. Also note the use of color, size, etc.
  • 73. Illustrating the problem [figure: process model in which the top-level activities x, y, z contain the low-level activities a to l and places p1 to p12 between start and end; arcs are annotated with weights 1.0, 0.6, 0.4, and 0.3]
  • 74. Classical top level view: low level connections still exist [figure: top-level view of x, y, z with places p3, p4, p5, p6, p9, p10, p11 still connecting them, shown next to the detailed model of slide 73]
  • 75. Seamless zoom [figure: the model at four zoom levels, with low-level activities grouped under x, y, z]. Threshold 1.0: a, f, j, e, i. Threshold 0.6: a, f, j, h, k, e, i. Threshold 0.4: a, f, j, b, g, h, k, l, e, i. Threshold 0.3: a, f, j, b, c, d, g, h, k, l, e, i.
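The zoom levels can be read as a filter on activity significance: an activity is shown when its significance is at least the chosen threshold. In the sketch below, the significance values are inferred from which activities appear at each threshold on the slide; they are an assumption made for illustration, not values taken from the underlying model.

```python
# Assumed significance per activity: the highest threshold at which it is still shown.
SIGNIFICANCE = {
    "a": 1.0, "f": 1.0, "j": 1.0, "e": 1.0, "i": 1.0,
    "h": 0.6, "k": 0.6,
    "b": 0.4, "g": 0.4, "l": 0.4,
    "c": 0.3, "d": 0.3,
}


def visible(threshold, significance=SIGNIFICANCE):
    """Activities that remain visible at the given zoom threshold."""
    return sorted(a for a, s in significance.items() if s >= threshold)


for threshold in (1.0, 0.6, 0.4, 0.3):
    print(threshold, " ".join(visible(threshold)))
```

Lowering the threshold only ever adds activities, which is what makes the zoom seamless: each more detailed view contains the previous one.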
  • 76. Example: Reviewing papers (100 cases generating 3730 events). WF-net discovered using the α-algorithm
  • 77. Fuzzy miner: two views on the same process. One fuzzy model shows all activities; the other shows only two activities. Color and width of an arc indicate the significance of the connection.
  • 78. Balancing between both extremes. Between the fuzzy model showing all activities and the fuzzy model showing only two activities: an aggregated node containing 10 activities, with the inner structure of the aggregated node shown separately. Color and width of an arc indicate the significance of the connection.
  • 79. Not a single map!
  • 80. Projecting dynamic information on business process maps
  • 81. Projecting traffic jams on maps
  • 82. Business process movies
  • 83. Navigation. Whereas a TomTom device is continuously showing the expected arrival time, users of today's information systems are often left clueless about likely outcomes of the cases they are working on. Car navigation systems provide directions and guidance without controlling the driver. The driver is still in control, but, given a goal (e.g., to get from A to B as fast as possible), the navigation system recommends the next action to be taken. Operational support provides TomTom functionality for business processes.
  • 84. Recommend: How to get home ASAP? Take a left turn! Detect: You drive too fast! Predict: When will I be home? At 11.26!
  • 85. Conclusion: two types of processes
  • 86. www.processmining.org www.win.tue.nl/ieeetfpm/