INISET@CAiSE 2011

description

Slides of my presentation at the INISET workshop at the CAiSE conference, 21 June 2011, London, UK

Transcript of INISET@CAiSE 2011

Page 1: INISET@CAiSE 2011

Faculty of Economics and Business Administration Department of Management Information and Operations Management

Jan Claes for INISET@CAiSE 2011, 21 June 2011

FACULTY OF ECONOMICS AND BUSINESS ADMINISTRATION

Integrating Computer Log Files for Process Mining

A Genetic Algorithm Inspired Technique

Jan Claes

[email protected]

http://processmining.ugent.be

Ghent University, Belgium

Page 2: INISET@CAiSE 2011

1. Process Mining

Page 3: INISET@CAiSE 2011

A plane crashed... What happened?

Analyse the ‘black box’

Page 4: INISET@CAiSE 2011

A process failed... What happened?

Analyse the ‘black box’: look for historical data

Process Mining:

Reconstruct and analyse processes

From historical process data

• Log files

• Audit trails

• Database history fields/tables

Page 5: INISET@CAiSE 2011

Process Mining

Processes are supported by IT systems

IT systems record actual process data

Process data can be used to automatically

• Discover process model

• Check conformance with existing process info

• Extend existing process model

Attention

• Only As-Is

• Only (correctly) recorded information

Page 6: INISET@CAiSE 2011

Process Mining steps

Preparation

• Collect data: find traces

• Merge data: from different sources

• Structure data: group per instance

• Convert data: to tool specific format

Process mining

Make decisions, take action
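
The "structure data" step listed above can be illustrated with a minimal Python sketch. It assumes each event is a (case id, timestamp, activity) tuple; that layout and the name `group_per_instance` are assumptions for illustration, not the format used in the presentation.

```python
from collections import defaultdict

def group_per_instance(events):
    """Group (case_id, timestamp, activity) events into one trace per case."""
    traces = defaultdict(list)
    for case_id, timestamp, activity in events:
        traces[case_id].append((timestamp, activity))
    # Order the events of every trace chronologically and keep the activities.
    return {case: [activity for _, activity in sorted(evts)]
            for case, evts in traces.items()}

# Hypothetical events from a product ordering process.
events = [
    ("SO1", 1, "SO"), ("SO2", 2, "SO"), ("SO1", 3, "Deliver"),
    ("SO1", 5, "Inv"), ("SO2", 4, "Deliver"),
]
traces = group_per_instance(events)
```

Each resulting trace lists the activities of one process instance in the order they occurred.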

Page 7: INISET@CAiSE 2011

Process Mining steps

Page 8: INISET@CAiSE 2011

2. Merging log files

Page 9: INISET@CAiSE 2011

Example

Product ordering: registered events

• Sales order: document creation (administration)

• Delivery: truck load confirmation (warehouse)

• Invoice: document creation (administration)

Logging

• from administration software

• from warehouse software

How to merge both log files?

Page 10: INISET@CAiSE 2011

Example 1

Merge based on matching trace identifiers

Administration log: traces SO1, SO2, SO3, each "SO > Inv"

Warehouse log: traces SO1, SO2, SO3, each "Deliver"

Merged log: traces SO1, SO2, SO3, each "SO > Deliver > Inv"
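
Merging on matching trace identifiers can be sketched in a few lines of Python. This is a minimal sketch, assuming each log maps a trace identifier to a list of (timestamp, activity) events; the timestamps and the name `merge_by_trace_id` are hypothetical, since the slide only shows the activity order.

```python
def merge_by_trace_id(log_a, log_b):
    """Merge two logs whose traces share identifiers.

    Each log maps a trace id to a list of (timestamp, activity) events.
    """
    merged = {}
    for trace_id in log_a.keys() | log_b.keys():
        events = log_a.get(trace_id, []) + log_b.get(trace_id, [])
        merged[trace_id] = sorted(events)  # chronological order
    return merged

# Hypothetical timestamps, chosen so that delivery falls between SO and Inv.
administration = {"SO1": [(1, "SO"), (3, "Inv")], "SO2": [(4, "SO"), (6, "Inv")]}
warehouse = {"SO1": [(2, "Deliver")], "SO2": [(5, "Deliver")]}
merged = merge_by_trace_id(administration, warehouse)
```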

Page 11: INISET@CAiSE 2011

Example 2

Merge based on matching attribute values

Administration log: traces SO1, SO2, SO3, each "SO > Inv"

Warehouse log: traces Del1, Del2, Del3, each "Deliver" with attribute value (SO1), (SO2), (SO3) respectively

Merged log: traces SO1, SO2, SO3, each "SO > Deliver > Inv"
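
When the trace identifiers differ, an attribute value can provide the link. The sketch below assumes events are (timestamp, activity, attributes) tuples and that the warehouse events carry a hypothetical `order` attribute holding the sales order identifier; none of these names come from the presentation.

```python
def merge_by_attribute(admin_log, warehouse_log, attribute="order"):
    """Link warehouse events to administration traces via an attribute value."""
    merged = {trace_id: list(events) for trace_id, events in admin_log.items()}
    for events in warehouse_log.values():
        for timestamp, activity, attributes in events:
            target = attributes.get(attribute)
            if target in merged:  # attribute value matches an administration trace id
                merged[target].append((timestamp, activity, attributes))
    for trace_events in merged.values():
        trace_events.sort(key=lambda event: event[0])  # chronological order
    return merged

# Hypothetical data: Del1 refers to SO1, Del2 refers to SO2.
administration = {
    "SO1": [(1, "SO", {}), (3, "Inv", {})],
    "SO2": [(4, "SO", {}), (6, "Inv", {})],
}
warehouse = {
    "Del1": [(2, "Deliver", {"order": "SO1"})],
    "Del2": [(5, "Deliver", {"order": "SO2"})],
}
merged = merge_by_attribute(administration, warehouse)
```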

Page 12: INISET@CAiSE 2011

Example 3

Merge based on time information

Timestamps: t1 < t2 < t3 << t4 < t5 < t6 << t7 < t8 < t9

Administration log: traces SO1 (t1, t3), SO2 (t4, t6), SO3 (t7, t9), each "SO > Inv"

Warehouse log: traces Arr1 (t2), Arr2 (t5), Arr3 (t8), each "Deliver"

Merged log: traces SO1, SO2, SO3, each "SO > Deliver > Inv"
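
If neither identifiers nor attribute values match, time information remains. The following minimal sketch links a warehouse trace to the administration trace whose time span contains it. It only works when the administration traces do not overlap in time, as in the example above; the numeric timestamps and the name `merge_by_time` are assumptions.

```python
def merge_by_time(admin_log, warehouse_log):
    """Attach each warehouse trace to the administration trace that spans it.

    Each log maps a trace id to a chronologically sorted list of
    (timestamp, activity) events.
    """
    merged = {trace_id: list(events) for trace_id, events in admin_log.items()}
    for events in warehouse_log.values():
        start, end = events[0][0], events[-1][0]
        for trace_id, admin_events in admin_log.items():
            first, last = admin_events[0][0], admin_events[-1][0]
            if first <= start and end <= last:
                merged[trace_id].extend(events)
                break  # link to the first containing trace only
    return {trace_id: sorted(events) for trace_id, events in merged.items()}

# Hypothetical timestamps with a large gap between the two orders.
administration = {"SO1": [(1, "SO"), (3, "Inv")], "SO2": [(10, "SO"), (12, "Inv")]}
warehouse = {"Arr1": [(2, "Deliver")], "Arr2": [(11, "Deliver")]}
merged = merge_by_time(administration, warehouse)
```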

Page 13: INISET@CAiSE 2011

Merging computer log files

Merge based on

• Example 1: matching trace identifiers (indicator 1)

• Example 2: matching attribute values (indicator 2)

• Example 3: time information (indicator 3)

General solution: algorithm combining different indicators

Genetic algorithm: indicators build up fitness function
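
One way to let indicators build up a fitness function is a weighted sum over all proposed links. This is a sketch under assumptions: the trace representation, the three indicator functions and the equal weights are illustrative and are not taken from the presentation.

```python
def same_id(a, b):
    """Indicator 1: the trace identifiers match."""
    return 1.0 if a["id"] == b["id"] else 0.0

def shared_values(a, b):
    """Indicator 2: fraction of attribute values of b that also occur in a."""
    values_a, values_b = set(a["values"]), set(b["values"])
    return len(values_a & values_b) / len(values_b) if values_b else 0.0

def time_containment(a, b):
    """Indicator 3: trace b happens within the time span of trace a."""
    return 1.0 if a["start"] <= b["start"] and b["end"] <= a["end"] else 0.0

# Assumed weights; a real fitness function would need tuning.
INDICATORS = [(same_id, 1.0), (shared_values, 1.0), (time_containment, 1.0)]

def fitness(links):
    """Score a candidate solution given as a list of (trace_a, trace_b) links."""
    return sum(weight * indicator(a, b)
               for a, b in links
               for indicator, weight in INDICATORS)

so1 = {"id": "SO1", "values": {"SO1"}, "start": 1, "end": 3}
so2 = {"id": "SO2", "values": {"SO2"}, "start": 10, "end": 12}
del1 = {"id": "Del1", "values": {"SO1"}, "start": 2, "end": 2}
good, bad = fitness([(so1, del1)]), fitness([(so2, del1)])
```

Here the link between SO1 and Del1 scores higher than the link between SO2 and Del1, because both the attribute value and the time span agree.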

Page 14: INISET@CAiSE 2011

3. Genetic algorithm

Page 15: INISET@CAiSE 2011

Genetic algorithm

1st generation, 2nd generation, 3rd generation

cross-over

mutation

survival of the fittest

Page 16: INISET@CAiSE 2011

Genetic algorithm

1st generation, 2nd generation, 3rd generation

mutation

cross-over

survival of the fittest

Fitness function score (values shown in the figure): 14, 27, 6, 18, 29, 5, 18, 28, 32
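
The three ingredients named above (cross-over, mutation, survival of the fittest) can be shown with a generic genetic algorithm on a toy problem. This is a textbook-style sketch, not the algorithm from the presentation: individuals are bit lists, the fitness is simply the number of ones, and all parameter values are assumptions.

```python
import random

def evolve(fitness, length=12, population_size=20, generations=30, seed=0):
    """Evolve bit lists towards a higher fitness score."""
    rng = random.Random(seed)
    population = [[rng.randint(0, 1) for _ in range(length)]
                  for _ in range(population_size)]
    history = []  # best score at the start of every generation
    for _ in range(generations):
        # Survival of the fittest: only the better half survives.
        population.sort(key=fitness, reverse=True)
        history.append(fitness(population[0]))
        survivors = population[: population_size // 2]
        children = []
        while len(survivors) + len(children) < population_size:
            mother, father = rng.sample(survivors, 2)
            cut = rng.randrange(1, length)        # cross-over point
            child = mother[:cut] + father[cut:]
            position = rng.randrange(length)      # mutation: flip one bit
            child[position] = 1 - child[position]
            children.append(child)
        population = survivors + children
    return max(population, key=fitness), history

best, history = evolve(fitness=sum)
```

Because the survivors are carried over unchanged, the best score never decreases from one generation to the next.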

Page 17: INISET@CAiSE 2011

Genetic algorithm inspired technique

Find links between traces of both log files and merge them chronologically into a new log file

Steps

• Make initial solution (best individual links)

• Make pseudo-random changes (try to improve score for one specific factor)

• Evaluate (keep original or changed solution)

• Stop condition (fixed number of steps)

Only one solution, no cross-over
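
The steps above can be sketched as a loop over a single solution. This is a minimal sketch under assumptions: the `score` table of link scores, the penalty for linking two traces to the same target, and the purely random change (rather than a change aimed at one specific factor) are all illustrative choices, not the presented implementation.

```python
import random

def fitness(links, score, penalty=0.5):
    """Sum of link scores minus a penalty for targets that are linked twice."""
    total = sum(score[source][target] for source, target in links.items())
    duplicates = len(links) - len(set(links.values()))
    return total - penalty * duplicates

def merge_links(score, steps=200, seed=0):
    rng = random.Random(seed)
    targets = sorted({target for row in score.values() for target in row})
    # Initial solution: the best individual link for every trace.
    links = {source: max(row, key=row.get) for source, row in score.items()}
    best = fitness(links, score)
    for _ in range(steps):  # stop condition: fixed number of steps
        candidate = dict(links)
        # Pseudo-random change: relink one trace to a random target.
        candidate[rng.choice(sorted(candidate))] = rng.choice(targets)
        value = fitness(candidate, score)
        if value > best:  # evaluate: keep original or changed solution
            links, best = candidate, value
    return links, best

# Hypothetical link scores between warehouse and administration traces.
score = {
    "Del1": {"SO1": 0.9, "SO2": 0.8},
    "Del2": {"SO1": 0.8, "SO2": 0.1},
}
links, best = merge_links(score)
```

In this example the best individual links send both deliveries to SO1; the random changes then move Del1 to SO2, which scores higher overall.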

Page 18: INISET@CAiSE 2011

4. Experiment results

Page 19: INISET@CAiSE 2011

Experiment: proof of concept

Simulated data

Given model

Generate

• random set of logs

• single log (=solution)

Use merge algorithm to merge set of logs

Check resulting log with solution log

Page 20: INISET@CAiSE 2011

Experiment: proof of concept

Advantages of using simulated data

Solution is known

Controllable parameters (e.g. noise, overlap, matching id)

Disadvantages of using simulated data

Limited internal validity (are results realistic?)

No external validity (results not generalisable)

Page 21: INISET@CAiSE 2011

Experiment results

Incorrect links related to total links identified
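
With simulated data the solution is known, so the share of incorrect links among all identified links can be computed directly. The sketch below assumes links are (warehouse trace, administration trace) pairs; the data is made up for illustration and does not reproduce the reported results.

```python
def incorrect_link_ratio(found_links, solution_links):
    """Share of identified links that do not occur in the known solution."""
    found = set(found_links)
    if not found:
        return 0.0
    return len(found - set(solution_links)) / len(found)

# Hypothetical links: one of the four identified links is wrong.
solution = {("Del1", "SO1"), ("Del2", "SO2"), ("Del3", "SO3"), ("Del4", "SO4")}
found = {("Del1", "SO1"), ("Del2", "SO2"), ("Del3", "SO4"), ("Del4", "SO4")}
ratio = incorrect_link_ratio(found, solution)
```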

Page 22: INISET@CAiSE 2011

5. Discussion

Page 23: INISET@CAiSE 2011

Future work

Optimise genetic algorithm

Fewer incorrect links

Faster implementation (AIS algorithm)

Fitness function factors

Validation with real test cases

Ghent University DPO (Human Resources)

Century21 (Real Estate) & FlexPack (Packaging)

BNP Paribas Fortis (Finance)

...

Page 24: INISET@CAiSE 2011

Contact information

Jan Claes

[email protected]

http://processmining.ugent.be

Twitter: @janclaesbelgium