Predictive Process Monitoring with Hyperparameter Optimization


Predictive Process Monitoring Framework with Hyperparameter Optimization

Chiara Di Francescomarino and Chiara Ghidini (Fondazione Bruno Kessler)

Marlon Dumas and Fabrizio Maria Maggi (University of Tartu)

Marco Federici and Williams Rizzi (University of Trento)

Predictive Business Process Monitoring

[Slide diagram: historical execution traces and a running trace, together with a prediction problem, feed the predictive monitoring component, which returns a prediction, e.g. "Does Alice need a given exam?"]

Predictive Process Monitoring Frameworks

• Framework instance (or configuration): a combination of techniques and their input parameters (hyperparameters). A minimal sketch of such a configuration space follows the diagram below.

• There is no unique framework instance that fits all prediction problems and datasets.

Predictive Process Monitoring Framework (diagram):

• Control-flow encodings: frequency-based encoding, sequence-based encoding
• Clustering techniques: K-means clustering (cluster number), DBScan clustering (minpoints, epsilon), agglomerative clustering (cluster number)
• Classification: decision tree (seed), random forest (seed), voting (voters)
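To make "framework instance" concrete, here is a minimal sketch (our illustration, not the authors' implementation) that exhaustively enumerates configurations as combinations of an encoding, a clustering technique, and a classifier; scikit-learn estimators stand in for the techniques, and the hyperparameter values are illustrative:

```python
from itertools import product

from sklearn.cluster import DBSCAN, AgglomerativeClustering, KMeans
from sklearn.ensemble import RandomForestClassifier
from sklearn.tree import DecisionTreeClassifier

# Candidate techniques with illustrative hyperparameter values.
ENCODINGS = ["frequency", "sequence"]
CLUSTERINGS = {
    "kmeans": lambda: KMeans(n_clusters=5),            # cluster number
    "dbscan": lambda: DBSCAN(eps=0.5, min_samples=5),  # epsilon, minpoints
    "agglomerative": lambda: AgglomerativeClustering(n_clusters=5),
}
CLASSIFIERS = {
    "decision_tree": lambda: DecisionTreeClassifier(random_state=0),  # seed
    "random_forest": lambda: RandomForestClassifier(random_state=0),  # seed
}

def framework_instances():
    """Exhaustively enumerate framework instances: one encoding, one
    clustering technique, one classifier, each with fixed hyperparameters."""
    for encoding, clustering, classifier in product(
            ENCODINGS, CLUSTERINGS, CLASSIFIERS):
        yield {
            "encoding": encoding,
            "clustering": CLUSTERINGS[clustering](),
            "classifier": CLASSIFIERS[classifier](),
        }

# 2 encodings x 3 clusterings x 2 classifiers = 12 instances to explore.
print(sum(1 for _ in framework_instances()))
```

The evaluation later in the deck explores 160 configurations; a real grid would also range over prefix sizes, cluster numbers, voting parameters, and so on.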

[The diagram's flow: historical execution traces, a running trace, and a prediction problem feed the chosen framework instance.]

In the "Real" World

Does Alice need the exams "tumor marker CA-19.9" or "CA-125 using MEIA"?

Which framework instance best suits my dataset and prediction problem? And which one should I choose if I want only accurate predictions?



The Existing Landscape

• Existing approaches address:
  – the selection of machine learning techniques
  – the tuning of their hyperparameters
  – the combined optimization of machine learning techniques and their hyperparameters

• Challenge: here we need to deal with a combination of more than one machine learning technique, each depending on the others.


How to Avoid Users’ Panic?

• A Predictive Process Monitoring Framework enhanced with technique and hyperparameter optimization:
  1. an exhaustive exploration of a set of framework configurations;
  2. comparison and analysis of the results.

Open questions: How to do this efficiently? How to support users?


The Enhanced Framework

[Architecture diagram: the Predictive Process Monitoring Framework takes a prediction problem, historical execution traces, and a running trace, and returns a prediction. A Technique and Hyperparameter Tuner replays validation execution traces (Replayer) under each framework instance, and an Evaluator condenses the outcomes into aggregated metrics per framework instance; the chosen framework instance is then fed back into the framework.]


The Predictive Process Monitoring Framework

[Pipeline diagram. Pre-processing: historical execution traces undergo prefix extraction, yielding trace prefixes. On the CONTROL FLOW side, the prefixes are control-flow encoded (encoded control flow) and clustered into clusters. On the DATA side, the prefixes are data encoded (encoded data) and labeled by a labeling function, and supervised learning classifiers are trained per cluster. Runtime (predictive monitoring): the running trace is control-flow encoded to identify its cluster(s) and data encoded; the corresponding classifier then answers the prediction problem with a prediction.]
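As a concrete reading of the pipeline, here is a minimal sketch (our simplification, not the authors' code): prefixes are extracted from historical traces, frequency-encoded on control flow, clustered, and one classifier is trained per cluster; at runtime the running trace is routed to its cluster's classifier. Reusing the control-flow features as the data encoding, and the labeling function signature, are simplifying assumptions.

```python
from collections import Counter

from sklearn.cluster import KMeans
from sklearn.tree import DecisionTreeClassifier

def prefixes(trace, max_len):
    """Prefix extraction: all prefixes of a trace up to max_len events."""
    return [trace[:i] for i in range(1, min(len(trace), max_len) + 1)]

def frequency_encode(prefix, alphabet):
    """Frequency-based control-flow encoding: per-activity event counts."""
    counts = Counter(prefix)  # traces are lists of activity names here
    return [counts[a] for a in alphabet]

def train(historical_traces, labeling, alphabet, n_clusters=2, max_len=5):
    """Pre-processing: cluster encoded prefixes, then fit one classifier
    per cluster; each prefix is labeled from its full historical trace."""
    prefs = [(t, p) for t in historical_traces for p in prefixes(t, max_len)]
    X = [frequency_encode(p, alphabet) for _, p in prefs]
    clustering = KMeans(n_clusters=n_clusters, n_init=10).fit(X)
    classifiers = {}
    for c in range(n_clusters):
        idx = [i for i, lab in enumerate(clustering.labels_) if lab == c]
        classifiers[c] = DecisionTreeClassifier().fit(
            [X[i] for i in idx], [labeling(prefs[i][0]) for i in idx])
    return clustering, classifiers

def predict(running_trace, clustering, classifiers, alphabet):
    """Runtime: identify the running trace's cluster, then classify it."""
    x = frequency_encode(running_trace, alphabet)
    cluster = int(clustering.predict([x])[0])
    return classifiers[cluster].predict([x])[0]
```

With a labeling function that returns, say, whether a given exam eventually occurs in the completed trace, predict answers questions like the one about Alice above.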


The Predictive Process Monitoring Framework Instances

• Each technique has its own hyperparameters.
• Other framework parameters:
  – trace prefix size
  – voting mechanism
  – interval choice, in the case of interval time predictions


Technique and Hyperparameter Tuning

• A trace is replayed until an evaluation point with a prediction confidence above a given threshold is reached.

• Three metrics/evaluation dimensions (sketched in code after the diagram below):
  – accuracy (how often the confident predictions are correct)
  – failure rate (fraction of traces for which no confident prediction is returned)
  – earliness (how early in the trace the prediction is made)

[Architecture diagram: the tuner is built on top of the ProM Operational Support Service 2.0 and its Predictive Monitor. A Configuration Sender submits each framework instance; a Replayer replays the validation execution traces under it, and an Evaluator aggregates the outcomes into aggregated metrics per framework instance.]
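A minimal sketch of the replay-and-evaluate loop, assuming a hypothetical predict_with_confidence(prefix) helper supplied by the framework instance and a true_label(trace) oracle; this is our illustration, not the ProM implementation:

```python
def evaluate_configuration(validation_traces, predict_with_confidence,
                           true_label, threshold=0.8):
    """Replay each validation trace event by event until a prediction with
    confidence above the threshold is reached, then score the configuration
    on accuracy, failure rate, and earliness."""
    correct, failed, earliness = 0, 0, []
    for trace in validation_traces:
        for i in range(1, len(trace) + 1):
            label, confidence = predict_with_confidence(trace[:i])
            if confidence >= threshold:
                correct += int(label == true_label(trace))
                earliness.append(i / len(trace))  # smaller = earlier
                break
        else:  # no evaluation point reached sufficient confidence
            failed += 1
    n = len(validation_traces)
    predicted = n - failed
    return {
        "accuracy": correct / predicted if predicted else 0.0,
        "failure_rate": failed / n,
        "earliness": sum(earliness) / len(earliness) if earliness else None,
    }
```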


Improving Efficiency

• Scheduling mechanism for parallel replayers
• Reuse of data structures

(A minimal sketch of both ideas follows the diagram below.)

[Architecture diagram: within the ProM Operational Support Service 2.0, a Replayer Scheduler receives configurations with Run IDs from the Configuration Sender (GUI) and dispatches <Run ID, Trace> pairs to N parallel replayers; a shared structure repository, fed by an Unfolding Module, lets replayers reuse data structures across runs.]
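A minimal sketch of the two efficiency ideas, with Python's multiprocessing pool standing in for the replayer scheduler and a memoized cache standing in for the shared structure repository (in ProM the repository is shared across replayers; in this sketch reuse happens per worker process):

```python
from functools import lru_cache
from multiprocessing import Pool

@lru_cache(maxsize=None)
def build_structures(encoding):
    """Stand-in for the structure repository: build the expensive data
    structures for an encoding once and reuse them across runs."""
    return {"encoding": encoding}  # placeholder for real structures

def replay(job):
    """One replayer: run a single (run_id, configuration) job."""
    run_id, configuration = job
    structures = build_structures(configuration["encoding"])  # reused
    # ... replay all validation traces under this configuration ...
    return run_id, {"accuracy": 0.0}  # placeholder metrics

if __name__ == "__main__":
    jobs = [(run_id, {"encoding": enc})
            for run_id, enc in enumerate(["frequency", "sequence"])]
    with Pool(processes=8) as pool:  # 8 parallel replayers, as in the slides
        for run_id, metrics in pool.imap_unordered(replay, jobs):
            print(run_id, metrics)
```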


Supporting Users in the Analysis of the Results


Evaluation

• Does the framework identify a suitable configuration for the prediction problem and dataset in practice?
  1. Does it return a set of configurations suitable for the prediction problem?
  2. Does the selected configuration meet the choice criteria?
  3. Does it require a reasonable amount of time?


Experimental Settings

• Two datasets and two prediction problems:
  – BPI Challenge 2011
  – BPI Challenge 2015
• Dataset preparation: training set (70%), validation set (20%), testing set (10%); a minimal split sketch follows this list.
• Identification of the most suitable configurations (among 160)
• Evaluation of the identified configurations (with the testing set)
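The 70/20/10 preparation can be sketched as a simple ordered split (our illustration; the slides do not say whether the logs were shuffled or split temporally):

```python
def split_log(traces, train_frac=0.7, val_frac=0.2):
    """Split an event log into training (70%), validation (20%),
    and testing (10%) sets, as in the slides' experimental setup."""
    n = len(traces)
    a = int(n * train_frac)
    b = int(n * (train_frac + val_frac))
    return traces[:a], traces[a:b], traces[b:]

train_set, validation_set, test_set = split_log(list(range(100)))
assert (len(train_set), len(validation_set), len(test_set)) == (70, 20, 10)
```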


Configuration Set Variability

• Higher variability for the first dataset → the tuning depends on the users' needs

• Lower variability for the second dataset → configurations do not change much


Configuration Selection

• No unique best configuration.
• Evaluation values are aligned with the tuning ones.


Computation Time

• Computation time can depend on the trace length.
• Data structure reuse → 20% time reduction
• 8 replayers → 13% time reduction


Summing up & Looking Ahead

• A predictive monitoring framework enhanced with technique and hyperparameter optimization

• Three directions:
  – increase user support
  – optimize the exhaustive search
  – prescriptive process monitoring

THANK YOU!!