Business Intelligence and Data Mining in Banking · 2009-01-14

©2008 Gholamreza Nakhaeizadeh. All rights reserved

Professor Dr. Gholamreza Nakhaeizadeh

Business Intelligence and Data Mining in Banking

An Experience Report

Content

Part One
• An introduction to BI
• Relation between BI and Data Mining
• Why Data Mining?
• What is Data Mining?
• Data Mining Process
• Data Mining Algorithms

Part Two: Application of Data Mining in Banking
• General Aspects
• Application of Data Mining in:
- Fraud Detection
- Anti Money Laundering
- Financial Risk Management
- Customer Relationship Management

Success Factors of Data Mining Projects

What is BI?

Some definitions (theoretical point of view)

- BI is a broad category of Management Information Systems, applications and technologies for gathering, storing, analyzing, and providing access to data to help enterprise users make better business decisions.

…

- An effective BI system provides corporations with “one version of the truth”.

…

Source: http://www.managementlogs.com/2004/09/what-is-bi-really-multiple-versions-of.html

What is BI?

Business Intelligence systems are data-driven DSS (decision support systems).

De facto (in practice): today BI platforms are just reporting systems, often based on OLAP tools.

What about the intelligence?

BI-History (1)

Business intelligence was defined in an October 1958 IBM Journal article by Hans Peter Luhn. Luhn wrote:

In this paper, business is a collection of activities carried on for whatever purpose, be it science, technology, commerce, industry, law, government, defense, et cetera.

The communication facility serving the conduct of a business (in the broad sense) may be referred to as an intelligence system.

The notion of intelligence is also defined here, in a more general sense, as "the ability to apprehend the interrelationships of presented facts in such a way as to guide action towards a desired goal."

Source: http://en.wikipedia.org/wiki/Business_intelligence

BI-History (2)

Howard Dresner

In 1989, Howard Dresner, an analyst of the Gartner Group, proposed "business intelligence" as an umbrella term. He later created the secondary term "business performance management".

http://en.wikipedia.org/wiki/Business_intelligence

Main Components of a BI-System (Architecture)

• Data sources: operational systems, flat files
• Extraction tools
• Staging area: data transformation, data cleaning
• Loading tools
• Data warehouse
• Reporting / OLAP tools
• Data mining tools

ETL: Extraction, Transformation, Loading

Data Mining as an important process of a BI System
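The ETL chain (extraction, transformation, loading) can be sketched in a few lines. This is a minimal illustration under stated assumptions, not a production pipeline: the flat-file content, the column names and the functions `extract`, `transform` and `load` are hypothetical, and an in-memory SQLite database stands in for the data warehouse. The transformation step drops duplicate records and keeps missing balances as NULL instead of guessing them.

```python
import csv
import io
import sqlite3

# Hypothetical flat-file extract from an operational system (not from the slides).
RAW_CSV = """customer_id,name,balance
1, Alice Meier ,1200.50
2,Bob Schmidt,
2,Bob Schmidt,
3,Carla Roth,87.00
"""

def extract(text):
    """Extraction: read the flat file into a list of dicts."""
    return list(csv.DictReader(io.StringIO(text)))

def transform(rows):
    """Transformation and cleaning: trim names, drop duplicate ids, keep missing balances as None."""
    seen, clean = set(), []
    for row in rows:
        customer_id = int(row["customer_id"])
        if customer_id in seen:
            continue  # duplicate record
        seen.add(customer_id)
        balance = float(row["balance"]) if row["balance"].strip() else None
        clean.append((customer_id, row["name"].strip(), balance))
    return clean

def load(rows, conn):
    """Loading: write the cleaned rows into the (stand-in) warehouse table."""
    conn.execute("CREATE TABLE IF NOT EXISTS customer (id INTEGER PRIMARY KEY, name TEXT, balance REAL)")
    conn.executemany("INSERT INTO customer VALUES (?, ?, ?)", rows)
    conn.commit()

conn = sqlite3.connect(":memory:")
load(transform(extract(RAW_CSV)), conn)
print(conn.execute("SELECT COUNT(*), SUM(balance) FROM customer").fetchone())
```

The query prints (3, 1287.5): the duplicate record is removed, and SQL's SUM ignores the missing (NULL) balance.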

Why Data Mining? Own experience

In the Automotive Industry

The automotive industry is a “data-rich” industry

e.g. supplier data

picocool.com/go/news/post/car-parts/

www.kautex-group.com/.../automotive.html

saberan.com/product.htm

The automotive industry is a “data-rich” industry

e.g. customer data

The automotive industry is a “data-rich” industry

e.g. finance data

Focus on data-rich main business areas

• Customer Relationship Management
• Quality Management
• Financial Risk Management
• Service Quality
• Data Quality

What is Data Mining?

One of the most widely used definitions (Fayyad et al. 1996):

Knowledge Discovery in Databases (KDD) is a process that aims at finding valid, useful, novel and understandable patterns in data.

• Understandable patterns: rules
• Non-understandable patterns: trained artificial neural networks (ANN)

KDD and Data Mining:

• KDD comes originally from AI
• Data Mining is a part of KDD
• In practice, KDD and Data Mining are used as synonyms

Simple fictive example: credit risk

Customer      Income   Car   Gender   Credit risk
Customer 1    low      new   F        bad
Customer 2    middle   old   F        bad
Customer 3    middle   new   M        good
Customer 4    low      new   M        bad
Customer 5    high     new   M        good
Customer 6    high     new   F        good
Customer 7    middle   new   F        good
Customer 8    high     old   F        good
Customer 9    middle   old   M        bad
Customer 10   low      old   F        bad

Classifier (rules induced from the table):
• If Income = high, then Credit risk = good
• If Income = low, then Credit risk = bad
• If Income = middle and Car = new, then Credit risk = good
• If Income = middle and Car = old, then Credit risk = bad

Credit risk of new customers:
• A new customer with high income: credit risk = good
• A new customer who has an old car and middle income: credit risk = bad
• …

This classifier can be regarded as an inductive expert system.
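The slides do not say which algorithm induced these rules. As a hedged sketch, a textbook ID3-style decision tree (splitting on the attribute with the highest information gain) reproduces exactly these four rules from the ten customers. The function names `id3`, `predict` and `rules` are hypothetical, and this is an illustration, not the tool used in the demo.

```python
import math
from collections import Counter

ATTRIBUTES = ["Income", "Car", "Gender"]
# The ten customers of the fictive example: (Income, Car, Gender, Credit risk)
ROWS = [
    ("low", "new", "F", "bad"),
    ("middle", "old", "F", "bad"),
    ("middle", "new", "M", "good"),
    ("low", "new", "M", "bad"),
    ("high", "new", "M", "good"),
    ("high", "new", "F", "good"),
    ("middle", "new", "F", "good"),
    ("high", "old", "F", "good"),
    ("middle", "old", "M", "bad"),
    ("low", "old", "F", "bad"),
]

def entropy(labels):
    n = len(labels)
    return -sum(c / n * math.log2(c / n) for c in Counter(labels).values())

def information_gain(rows, attr):
    """Entropy of the class labels minus the weighted entropy after splitting on attr."""
    labels = [r[-1] for r in rows]
    remainder = 0.0
    for value in {r[attr] for r in rows}:
        subset = [r[-1] for r in rows if r[attr] == value]
        remainder += len(subset) / len(rows) * entropy(subset)
    return entropy(labels) - remainder

def id3(rows, attrs):
    """Return a leaf label or a node (attribute index, {value: subtree}, majority label)."""
    labels = [r[-1] for r in rows]
    majority = Counter(labels).most_common(1)[0][0]
    if len(set(labels)) == 1 or not attrs:
        return majority
    best = max(attrs, key=lambda a: information_gain(rows, a))
    branches = {
        value: id3([r for r in rows if r[best] == value], [a for a in attrs if a != best])
        for value in sorted({r[best] for r in rows})
    }
    return (best, branches, majority)

def predict(tree, customer):
    while isinstance(tree, tuple):
        attr, branches, majority = tree
        tree = branches.get(customer[attr], majority)  # unseen value: fall back to majority
    return tree

def rules(tree, conditions=()):
    """Yield (conditions, label) for every path from the root to a leaf."""
    if not isinstance(tree, tuple):
        yield conditions, tree
        return
    attr, branches, _ = tree
    for value, subtree in branches.items():
        yield from rules(subtree, conditions + (f"{ATTRIBUTES[attr]}={value}",))

tree = id3(ROWS, [0, 1, 2])
for conditions, label in rules(tree):
    print("If", " & ".join(conditions), "then Credit risk =", label)
```

Income is chosen first because it separates the low and high groups perfectly; Car then resolves the middle group, while Gender carries no information in this toy table.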

Demo: Construction of a “Credit risk Miner” (RapidMiner)

Open in Workspace: CreditToy.xml

and German_CreditPredict

Interdisciplinary aspects of Data Mining

• Statistics
• Database Technology
• AI (Machine Learning)
• Visualization
• Privacy

Examples of Data Mining Tools (commercial)

• SAS Enterprise Miner
• Statistica Data Miner
• SPSS Clementine
• CART

Examples of Data Mining Tools (Public Domain, open source)

• Weka, described in: Ian Witten, Eibe Frank: Data Mining: Practical Machine Learning Tools and Techniques (Second Edition)
• RapidMiner

Poll: Data Mining Software

What data mining tools have you used for a real project (not just for evaluation) in the past 6 months?

Commercial Software
• SPSS Clementine (74, 53 alone or with SPSS)
• Excel (61, 1 alone)
• SAS (55, 6 alone or with SAS EM)
• KXEN (32, 25 alone)
• SAS Enterprise Miner (24, 6 alone or with SAS)
• MATLAB (22, 1 alone)
• SQL Server (20, 2 alone)
• Other commercial tools (12)
• …

Free/Open Source Data Mining Software
• RapidMiner (72, 49 alone)
• R (39, 4 alone)
• Weka (36, 4 alone)
• KNIME (30, 14 alone)
• Other free tools (18)
• C4.5/C5.0 (8)
• Orange (3)

History of Data Mining: rapid development

KDD-89: IJCAI-89 Workshop on Knowledge Discovery in Databases, August 20, 1989, Detroit, MI, USA (Dr. Gregory Piatetsky-Shapiro)

Google search: Results 1 - 10 of about 15,300,000 for "data mining" [definition]. (0.21 seconds)

Data Mining: rapid development

Some European-funded Projects

• StatLog
• CRISP-DM
• INRECA
• MetaL
• READ
• Data Mining Grid

Conferences
• KDD
• PKDD-ECML
• SIAM Data Mining
• ICDM
• PAKDD
• ICML
• …

Journals
• ACM Transactions on KDD (new)
• IEEE Transactions on Knowledge and Data Engineering
• KDD Explorations
• Data Mining and Knowledge Discovery
• Machine Learning
• …

Data Mining Process

CRISP-DM phases (an iterative cycle): Business Understanding → Data Understanding → Data Preparation → Modelling → Evaluation → Deployment

CRISP-DM:

- Provides an overview of the life cycle of a data mining project
- Consists of six phases
- Was partially funded by the European Commission
- The CRISP-DM process model is described in: http://www.crisp-dm.org/CRISPwP-0800.pdf

Demo: process support and data sources (RapidMiner)

CRISP-DM: Modeling

Data Mining Process: Task Identification

From the business challenge to the data mining task:
• Classification, e.g. credit risk: good customer / bad customer
• Prediction
• Concept description, e.g. customer loyalty: age, income, education, …
• Dependency analysis, e.g. A and B → C
• Clustering
• Deviation detection

Supervised and unsupervised learning

The data form a table: each observation (tuple) is a row described by attributes, possibly together with a target variable.

Supervised Learning

Nr.   A1    A2    A3    …   An    T
1     a11   a12   a13   …   a1n   t1
2     a21   a22   a23   …   a2n   t2
3     a31   a32   a33   …   a3n   t3
…     …     …     …     …   …     …
m     am1   am2   am3   …   amn   tm

The target variable T is known for every training observation; the model learns to predict T from the attributes A1 … An.

Examples for supervised learning: classification, prediction

Unsupervised Learning

Nr.   A1    A2    A3    …   An    T
1     a11   a12   a13   …   a1n   t1
2     a21   a22   a23   …   a2n   t2
3     a31   a32   a33   …   a3n   t3
…     …     …     …     …   …     …
m     am1   am2   am3   …   amn   tm

The target variable T is not used; only the attributes A1 … An are analyzed.

Example for unsupervised learning: clustering
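Clustering can be illustrated with k-means. This is a hedged sketch: k-means is one common clustering method, not necessarily the one used in the talk, and it needs numeric attributes, so the two-dimensional customer data below (income in thousand EUR, age) is hypothetical rather than the categorical credit table. No target variable appears anywhere in the code.

```python
import math

# Hypothetical customers described by (income in thousand EUR, age); not from the slides.
CUSTOMERS = [(20, 25), (22, 27), (25, 30), (80, 50), (85, 55), (90, 52)]

def kmeans(points, k, max_iterations=100):
    """Plain Lloyd's k-means; the initial centroids are simply the first k points."""
    centroids = [tuple(map(float, p)) for p in points[:k]]
    for _ in range(max_iterations):
        # Assignment step: each point joins the cluster of its nearest centroid.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda j: math.dist(p, centroids[j]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its cluster (keep it if the cluster is empty).
        new_centroids = [
            tuple(sum(coord) / len(cluster) for coord in zip(*cluster)) if cluster else centroids[j]
            for j, cluster in enumerate(clusters)
        ]
        if new_centroids == centroids:
            break  # converged: assignments no longer change the centroids
        centroids = new_centroids
    return centroids, clusters

centroids, clusters = kmeans(CUSTOMERS, k=2)
for centroid, members in zip(centroids, clusters):
    print([round(c, 1) for c in centroid], members)
```

With this data the algorithm separates the three young low-income customers from the three older high-income ones; the result of k-means in general depends on the initial centroids.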

Supervised and Unsupervised Learning

Using the credit risk table of the fictive example (customers 1 to 10):
• Supervised learning: Income, Car and Gender are the attributes and Credit risk is the target variable.
• Unsupervised learning: the Credit risk column is not used; only Income, Car and Gender are analyzed.

Data Mining Algorithms

Machine Learning
• Rule-based induction
• Decision trees
• Neural networks
• Conceptual clustering
• …

Statistics
• Discriminant analysis
• Cluster analysis
• Regression analysis
• Logistic regression analysis
• …

Database Technology
• Association rules
• …
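Association rules, listed under database technology, can be illustrated by counting support and confidence. This sketch enumerates candidate rules by brute force instead of using the Apriori algorithm, which is fine for toy data only; the product baskets, the thresholds and the function names are hypothetical.

```python
from itertools import combinations

# Hypothetical product holdings of five bank customers (not from the slides).
BASKETS = [
    {"checking", "savings", "credit_card"},
    {"checking", "credit_card"},
    {"checking", "savings", "mortgage"},
    {"checking", "savings", "credit_card"},
    {"savings", "mortgage"},
]

def support(itemset, baskets):
    """Fraction of baskets that contain every item of the itemset."""
    return sum(itemset <= basket for basket in baskets) / len(baskets)

def association_rules(baskets, min_support=0.4, min_confidence=0.7, max_size=3):
    """Brute-force search for rules antecedent -> consequent (toy data only, not Apriori)."""
    items = sorted(set().union(*baskets))
    found = []
    for size in range(2, max_size + 1):
        for combo in combinations(items, size):
            itemset = frozenset(combo)
            s = support(itemset, baskets)
            if s < min_support:
                continue
            for consequent in combo:
                antecedent = itemset - {consequent}
                confidence = s / support(antecedent, baskets)  # P(consequent | antecedent)
                if confidence >= min_confidence:
                    found.append((tuple(sorted(antecedent)), consequent, s, confidence))
    return found

for antecedent, consequent, s, conf in association_rules(BASKETS):
    print(f"{' and '.join(antecedent)} -> {consequent}  (support {s:.2f}, confidence {conf:.2f})")
```

For these baskets it finds six rules, for example credit_card -> checking with support 0.60 and confidence 1.00.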

CRISP-DM: Data Preparation

Data Mining Process: Data Selection

Observation reduction
- Sampling
- Intelligent sampling
- Learn to forget
- …

Attribute reduction

(Figure: data table with observations 1 to 8 as rows and attributes 1 to 5 as columns)
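Observation reduction by sampling can be sketched as follows. The slide does not define "intelligent sampling" or "learn to forget", so this sketch only shows plain random sampling and, as one assumed example of a more careful scheme, stratified sampling that preserves the class proportions. The data set of 1000 applications with 700 good and 300 bad cases is hypothetical.

```python
import random
from collections import defaultdict

# Hypothetical data set: 1000 credit applications (id, class), 700 "good" and 300 "bad".
APPLICATIONS = [(i, "good" if i < 700 else "bad") for i in range(1000)]

def simple_random_sample(rows, fraction, seed=42):
    """Draw round(fraction * n) rows without replacement."""
    rng = random.Random(seed)
    return rng.sample(rows, round(len(rows) * fraction))

def stratified_sample(rows, label_of, fraction, seed=42):
    """Sample each class separately so that the class proportions are preserved."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for row in rows:
        by_class[label_of(row)].append(row)
    sample = []
    for label in sorted(by_class):
        members = by_class[label]
        sample.extend(rng.sample(members, round(len(members) * fraction)))
    return sample

simple = simple_random_sample(APPLICATIONS, 0.1)
stratified = stratified_sample(APPLICATIONS, lambda row: row[1], 0.1)
print(len(simple), sum(1 for _, label in stratified if label == "bad"))
```

A simple random sample keeps the class ratio only on average; the stratified variant guarantees it, which matters when one class (such as bad credits or fraud cases) is rare.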

Demos: Data Preprocessing (RapidMiner)

Sampling: RapidMiner, German_CreditTr.aml
Use: sampling in preprocessing

CRISP-DM: Data Preparation

Data Mining Process: Data Cleaning

Dealing with:

Missing values
- Ignore the observation
- Use the attribute mean
- Predict the missing value (decision tree, regression, …)

Inaccurate data
- Use background knowledge (rules)

Duplicates
- Straße, Strasse, Str.
- Robert X, Bob X
- Professor, Prof. Dr.
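Two of the cleaning steps above, filling missing values with the attribute mean and recognizing duplicates such as "Straße / Strasse / Str.", can be sketched as follows. The normalization rules and all function names are hypothetical; a real system would need a much larger rule base (background knowledge) or fuzzy matching.

```python
import re
from statistics import mean

def impute_mean(values):
    """Attribute-mean strategy: replace missing values (None) by the mean of the observed ones."""
    observed = [v for v in values if v is not None]
    fill = mean(observed)
    return [fill if v is None else v for v in values]

# Hypothetical normalization rules for the duplicate examples on the slide.
NORMALIZATION_RULES = [
    (r"\bstr\.", "strasse"),                # "Str." -> "strasse"
    (r"\bprof\.\s*dr\.\s*", "professor "),  # "Prof. Dr." -> "professor"
    (r"\bbob\b", "robert"),                 # nickname -> first name
]

def normalize(text):
    # casefold() also maps the German "ß" to "ss", so "Straße" and "Strasse" coincide.
    key = text.casefold()
    for pattern, replacement in NORMALIZATION_RULES:
        key = re.sub(pattern, replacement, key)
    return re.sub(r"\s+", " ", key).strip()

def deduplicate(records):
    """Keep the first record seen for every normalized key."""
    unique = {}
    for record in records:
        unique.setdefault(normalize(record), record)
    return list(unique.values())

print(impute_mean([1200.0, None, 800.0]))
print(deduplicate(["Berliner Straße 12", "Berliner Strasse 12", "Berliner Str. 12",
                   "Robert X", "Bob X", "Prof. Dr. Meier", "Professor Meier"]))
```

The seven spellings collapse to three records, and the missing value is filled with 1000.0, the mean of the two observed values.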

CRISP-DM: Data Preparation

Data Mining Process: Data Cleaning

Dealing with outliers
- Outlier as noise
- Outlier detection as an interesting finding

Outlier analysis methods
- Model-based outlier detection
- Using distance measures
- Density-based local outlier detection
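The distance-based approach can be sketched with a simple k-nearest-neighbour score: an observation that is far from its k-th nearest neighbour is a candidate outlier. This is an assumed illustration (k = 2, Euclidean distance, hypothetical data); the density-based local outlier factor (LOF) refines the idea by comparing local densities and is not implemented here.

```python
import math

# Hypothetical two-dimensional observations; the last one lies far away from the rest.
OBSERVATIONS = [(1.0, 1.0), (1.0, 2.0), (2.0, 1.0), (2.0, 2.0), (10.0, 10.0)]

def knn_outlier_scores(points, k=2):
    """Distance-based outlier score: distance of each point to its k-th nearest neighbour."""
    scores = []
    for i, p in enumerate(points):
        distances = sorted(math.dist(p, q) for j, q in enumerate(points) if j != i)
        scores.append(distances[k - 1])
    return scores

scores = knn_outlier_scores(OBSERVATIONS)
ranking = sorted(range(len(OBSERVATIONS)), key=lambda i: scores[i], reverse=True)
print("most outlying observation:", OBSERVATIONS[ranking[0]], "score", round(scores[ranking[0]], 2))
```

Whether the top-ranked observation is treated as noise and removed, or reported as an interesting finding, is a decision for the analyst, not for the score.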

Demo: Outlier detection and elimination

1. Demo: Sample Preprocessing Outlier

RapidMiner

Content

Part Two: Application of Data Mining in Banking
• General Aspects
• Application of Data Mining in:
- Fraud Detection
- Anti Money Laundering
- Financial Risk Management
- Customer Relationship Management

Success Factors of Data Mining Projects

Why Data Mining in Banking?

Business Issues
• Credit Risk
• Market Risk
• Controlling
• Trading
• Portfolio Management
• Investment Management
• CRM
• Regulations & Compliance
• …

Data
• Customer Data
• Portfolio Data
• Interest Rate Data
• Regulation Data
• Currency Data
• …

DM Applications
• Fraud Detection
• Anti Money Laundering
• Cross-Selling
• Up-Selling
• Churn Management
• Market Forecasting
• …

Business Issues: Financial Risk Assessment

Financial Risk Assessment
• Market Risk Assessment
• Credit Risk Assessment
• Country Risk Assessment
• Liquidity Risk Assessment
• Fraud Detection
• Anti Money Laundering
• …

(Figure: rating scale from low risk to high risk, with grades such as B and CCC, for commercial risk, country risk and banking sector risk)

Financial Risk Management

Application of Data Mining in Fraud Detection

Fraud Detection, General Aspects

Corruption
• Bribery
• Embezzlement
• Fraud
• Extortion
• Favouritism
• Nepotism

Fraud is a criminal deception or the use of false representations to gain an unjust advantage. It covers both bribery and embezzlement.

Fight Against Fraud
• Prevention: can't be perfect, inconvenient, expensive
• Detection: identifying fraud as soon as it has occurred

Fraud Detection, General Aspects

Types of Fraud
• Credit card fraud
• Money laundering
• Insurance fraud
• Telecommunication fraud
• Computer intrusion
• …

• Internal fraud
• External fraud

Data mining can help with detection.

Important: fraud detection is a continually evolving process, because patterns of fraud are dynamic and change over time.

Fraud detection systems “are used to catch bad guys doing bad things”.

Fraud Detection, why Data Mining?

Why is data mining needed in fraud detection?

Huge volume of data; example:
• Over 1.59 billion Visa cards in circulation
• 6,800 transactions per second (peaks)
• 20,000 member banks
• Millions of merchants
(Source: http://www.rgrossman.com/talks/grossman-iciq-07-v4.pdf)

• Performance challenge → fast and efficient algorithms
• Storage challenge → modern database technology

Data mining can help.

Fraud Detection, Importance

Extent of internal fraud
• A recent survey by KPMG Peat Marwick found that nearly 60 percent of all small business owners reported that their companies have experienced some type of internal financial fraud committed by their own employees.
• More than 75 percent of companies surveyed had actually been the victim of employee fraud within the previous 12-month period.

Source: http://www.nfib.com/object/2991852.html

Fraud Detection, Credit Card Fraud: Extent

"Credit card fraud costs the industry about a billion dollars a year, or 7 cents out of every $100 spent on plastic. But that is down significantly from its peak about a decade ago, Sorrentino says, in large part because of powerful technology that can recognize unusual spending patterns."

21 July 2002

Fraud Detection, IT Impacts

Impact of IT on fraud perpetration
• Internal fraud is as old as business
• Internal fraud coupled with IT savvy is a killer combination
• Since the introduction of the first commercial computer (UNIVAC, in 1951), computers have been used to make the fraudster’s job easier

Examples:
• Generating bogus invoices and paying them to bogus companies
• Most large organizations swap millions into overnight instruments to take advantage of the best interest rates, only to swap them back into their working accounts during the day. Skimming a piece of that transaction could be simple.

Source: http://blogs.zdnet.com/threatchaos/?p=341

Fraud Detection, Data Mining Methods

Many data mining methods can be used. Common characteristic of the data mining models used in fraud detection (FD)*: they are based on comparing the observed data with their expected values.

Expected values can be:
• Numerical summaries of some aspect of behavior
• Simple graphical summaries showing …
• Multivariate behavior profiles based on past behavior, for example the way an account has been used in the past

* Based on: http://metalab.uniten.edu.my/~abdrahim/ntl/Statistical%20Fraud%20Detection%20A%20Review.pdf
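Comparing observed data with expected values can be sketched with a per-account profile of past transaction amounts. The sketch assumes that the expected value is the mean amount and that the deviation is measured in standard deviations; mean and variance are updated with Welford's online algorithm, which needs constant memory per account, a property that matters at the data volumes quoted for card transactions. The class name, the transaction history and the alert threshold of 3 are hypothetical.

```python
import math
from dataclasses import dataclass

@dataclass
class AccountProfile:
    """Expected behavior of one account, summarized from its past transaction amounts."""
    n: int = 0
    mean: float = 0.0
    m2: float = 0.0  # running sum of squared deviations from the mean (Welford)

    def update(self, amount: float) -> None:
        """Add one past transaction in O(1) time and memory."""
        self.n += 1
        delta = amount - self.mean
        self.mean += delta / self.n
        self.m2 += delta * (amount - self.mean)

    @property
    def std(self) -> float:
        """Sample standard deviation of the amounts seen so far."""
        return math.sqrt(self.m2 / (self.n - 1)) if self.n > 1 else 0.0

    def deviation(self, amount: float) -> float:
        """Distance of an observed amount from the expected (mean) amount, in standard deviations."""
        return abs(amount - self.mean) / self.std if self.std > 0 else 0.0

profile = AccountProfile()
for amount in [42.0, 55.0, 38.0, 61.0, 47.0, 50.0]:  # hypothetical transaction history
    profile.update(amount)

THRESHOLD = 3.0  # hypothetical alert threshold
for observed in [49.0, 900.0]:
    z = profile.deviation(observed)
    print(observed, round(z, 2), "ALERT" if z > THRESHOLD else "ok")
```

A 49.0 transaction matches the account's usual behavior, while 900.0 lies about a hundred standard deviations away and raises an alert.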

Fraud Detection, Data Mining Methods

Classification of the methods
• Based on supervised learning
• Based on unsupervised learning

Outlier detection
• Modeling the distribution of normal behavior
• Detecting the observations with the greatest deviation from this norm

Suspicion score: alerts to the fact that an observation is anomalous, i.e. more likely to be fraudulent.

Fraud Detection, Suspicion Scores

Observations ordered by suspicion score (s1 ≥ s2 ≥ s3 ≥ …):

Observation   Suspicion score
o1            s1
o2            s2
o3            s3
…             …

Because analysis is costly, more attention should be paid to the observations with the highest scores.

• Compromise between the cost of detection and the savings achieved
• Problem of fraud publicity
• Damage to the customer relationship in the case of false positives

It is difficult to find case studies together with the data used.
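The compromise between the cost of detection and the savings can be sketched by ranking observations by suspicion score and reviewing a case only when its expected saving exceeds the review cost. The assumption that a score can be read as a fraud probability, the cost figure, the optional review capacity and the alert data are all hypothetical.

```python
from typing import List, NamedTuple, Optional

class Alert(NamedTuple):
    observation: str
    score: float   # suspicion score, assumed here to be an estimated fraud probability in [0, 1]
    amount: float  # money at stake if the case is fraudulent

def select_for_review(alerts: List[Alert], review_cost: float,
                      max_reviews: Optional[int] = None) -> List[Alert]:
    """Rank by suspicion score; keep cases whose expected saving exceeds the review cost."""
    ranked = sorted(alerts, key=lambda a: a.score, reverse=True)
    worthwhile = [a for a in ranked if a.score * a.amount > review_cost]
    return worthwhile if max_reviews is None else worthwhile[:max_reviews]

ALERTS = [  # hypothetical scored observations
    Alert("o1", 0.90, 500.0),
    Alert("o2", 0.20, 5000.0),
    Alert("o3", 0.05, 100.0),
    Alert("o4", 0.60, 80.0),
]
print([a.observation for a in select_for_review(ALERTS, review_cost=50.0)])
```

With a review cost of 50 the result is ['o1', 'o2']: o4 has a higher score than o2, but so little money is at stake that reviewing it would cost more than it is expected to save.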

Example

Source: Proceedings of the seventh ACM SIGKDD international conference on Knowledge discovery and data mining, pages 432 - 437

Example: observing the expenditure and the number of transactions over time (from t to t+n)

• Change of behavior in the group
• Change of behavior of a single observation
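The distinction between a change of behavior in the whole group and a change of a single observation can be sketched by comparing each account's relative change in expenditure with the group's median change. This is an assumed simplification in the spirit of peer-group comparison, not the method of the cited example; the accounts, amounts and the tolerance of 0.5 are hypothetical.

```python
from statistics import median

def relative_changes(spend_t, spend_t_n):
    """Relative change of each account's expenditure between period t and period t+n."""
    return {acc: (spend_t_n[acc] - spend_t[acc]) / spend_t[acc] for acc in spend_t}

def flag_individual_changes(spend_t, spend_t_n, tolerance=0.5):
    """Flag accounts whose change departs from the group's typical (median) change."""
    changes = relative_changes(spend_t, spend_t_n)
    group_change = median(changes.values())
    flagged = sorted(acc for acc, c in changes.items() if abs(c - group_change) > tolerance)
    return flagged, group_change

# Hypothetical expenditure per account in two periods: the whole group spends 30% more,
# while account D spends five times as much as before.
SPEND_T = {"A": 100.0, "B": 200.0, "C": 150.0, "D": 120.0}
SPEND_T_N = {"A": 130.0, "B": 260.0, "C": 195.0, "D": 600.0}
print(flag_individual_changes(SPEND_T, SPEND_T_N))
```

The call returns (['D'], 0.3): a 30% rise shared by the whole group (for example, seasonal spending) is not flagged, whereas account D's fivefold increase is.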

Case Study: REVI-MINER

REVI-MINER, a KDD Environment for Deviation Detection and Analysis of Warranty and Goodwill Cost Statements in the Automotive Industry
E. Hotz, W. Heuser, U. Grimmer, G. Nakhaeizadeh, M. Wieczorek

Abstract

In order to get the refund of vehicle repair costs, workshops of DaimlerChrysler AG worldwide regularly submit warranty and goodwill cost statements to the central warranty department in Germany. These statements should be examined for validity and correctness, which is a very complex task for the warranty cost controllers.

REVI-MINER is a KDD environment which supports the detection and analysis of deviations in warranty and goodwill cost statements. The system was developed in a cooperation between DaimlerChrysler Research & Technology and the Global Service and Parts (GSP) division, and is based upon the CRISP-DM methodology, a widely accepted process model for the solution of data mining problems.

Furthermore, we have implemented different approaches based on machine learning and statistics that can be used for data cleaning in the preprocessing phase. The applied data mining models are developed using a statistical deviation detection approach. The tool supports the controller in the task of auditing the authorized workshops.

Source: Proceedings of the seventh ACM SIGKDD international conference on Knowledge discovery and data mining, pages 432 - 437

REVI-MINER: Own Experience

Business Understanding
• Refunding of vehicle repair costs
• Workshops worldwide regularly submit warranty and goodwill cost statements to the central warranty department in Germany
• These statements should be examined for validity and correctness
• This is a very complex task for the warranty cost controllers

Problem complexity
• Increasing complexity of the product structure:
- different vehicle business divisions (passenger cars, trucks, transporters, buses, …)
- about 150 vehicle series with several body versions and combustion types
- more than twenty production plants
• Different warranty and goodwill policies for different sales markets and repair areas


REVI-MINER

Old Audit System

The old audit system was a standard system and had the following shortcomings:
• inflexible, not very purposeful, time-consuming
• the report generated by the system was a very complicated hard-copy table that had to be processed manually with great difficulty

Business goal: developing an audit system that allows for:
• periodic auditing of workshops at ever shorter time intervals
• fast detection of possible abnormalities in the warranty cost statements, analysis of their trends, and determination of which workshop is responsible for these trends
• avoidance of false alarms by flagging only those suspected fraudulent activities that really justify auditing the workshops
• a choice from a wide range of parameters when initiating an audit report
• visualization of the results


Case Study, REVI-MINER: Data Understanding

The available historical data about warranty and goodwill costs is part of the database QUIS (QUality Information System), which can be considered a kind of data warehouse containing information on produced vehicles and their repairs.

Figure (data sources): repair workshops, production plant, vehicle data, warranty and goodwill claims, VEGA (claim processing), and QUIS with general vehicle data, technical data, warranty claims data, commercial data and parts testing data.


Case Study, REVI-MINER: Data Preparation

QUIS

query 1: general vehicle data
• VIN (vehicle ID number)
• date of production
• motor type
• continent
• country
⇒ new vehicle series
⇒ new motor types for existing vehicle series

query 2: data on repair
• date of production
• date of first permission
• date of repair
• date of credit note
• VIN (vehicle ID number)
• dealer number (workshop)
⇒ repair area
• total cost
• material cost
• unit cost
• incidental cost
• ...

VEGA

query 1: workshop organization
• workshop address
• repair authorization for the different vehicle business divisions
• affiliation to special workshop subgroups
• branch offices
• workshop (dealer) number
• trade partners
• representatives


Case Study, REVI-MINER: Data Preparation (continued)

Data Cleaning

To check the quality of the data, the following approaches were developed:
• Descriptive statistics approach: the stored (historical) data was described by descriptive statistics, and the descriptions were compared with values known from the documentation or from other sources (accuracy check)
• A statistical prototype based on a normal-distribution assumption (outlier detection)
• Application of GritBot, developed by Ross Quinlan (outlier detection)*

* Quinlan, R., GritBot – An informal tutorial, http://www.rulequest.com/gritbot-unix.html, 2000
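The slide does not describe how the normal-distribution prototype works internally. A common way to detect outliers under a normality assumption is the z-score rule: flag values that lie more than a chosen number of standard deviations from the mean. The following is a minimal sketch of that rule only, not the REVI-MINER implementation; the function name, threshold and data are hypothetical.

```python
import statistics


def zscore_outliers(values, threshold=3.0):
    """Return the indices of values whose absolute z-score exceeds `threshold`.

    Assumes the values are approximately normally distributed, so that values
    far from the mean (in units of standard deviation) are unlikely.
    """
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)  # population standard deviation
    if std == 0:
        return []  # all values identical: nothing deviates
    return [i for i, v in enumerate(values) if abs(v - mean) / std > threshold]


# Hypothetical labor hours on ten statements for one damage code
hours = [2.0, 2.5, 1.8, 2.2, 2.1, 2.4, 1.9, 2.3, 2.0, 14.0]
print(zscore_outliers(hours, threshold=2.5))  # [9]
```

With only ten values, no population z-score can exceed 3 (it is bounded by the square root of n - 1), which is why the example lowers the threshold; on small groups, the extreme value also inflates the standard deviation and partly masks itself.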


Case Study, REVI-MINER: Deviation Analysis

Criteria chosen for deviation analysis

Discussions with the end users showed that the criteria needed to identify and analyze deviations in warranty and goodwill data should cover the main cost types (damage types):
• total cost (total number of repairs)
• labor cost (number of working hours)
• cost of repair material (number of repairs using repair material)
• cost of exchanging vehicle aggregates, e.g. gear unit, air-conditioner unit, motor unit (number of repairs using aggregates)

All criteria by cost and damage type must be calculated for each workshop and for each damage code at the chosen level of damage-code aggregation (2-digit, 5-digit or 7-digit damage code).
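The per-workshop calculation at a chosen damage-code level can be sketched as a group-by on the workshop and a truncated damage code. This is a minimal illustration assuming a hypothetical record layout (workshop id, 7-character damage code, total cost, labor hours); the real QUIS fields and code structure are not given in the slides, and only two of the four criteria are shown.

```python
from collections import defaultdict


def aggregate_criteria(claims, digits):
    """Aggregate deviation criteria per (workshop, damage-code prefix).

    claims: iterable of (workshop_id, damage_code, total_cost, labor_hours),
            where damage_code is a 7-character string (hypothetical layout).
    digits: aggregation level, 2, 5 or 7.
    """
    if digits not in (2, 5, 7):
        raise ValueError("aggregation level must be 2, 5 or 7 digits")
    acc = defaultdict(lambda: {"repairs": 0, "total_cost": 0.0, "labor_hours": 0.0})
    for workshop, code, cost, hours in claims:
        cell = acc[(workshop, code[:digits])]  # truncate code to chosen level
        cell["repairs"] += 1
        cell["total_cost"] += cost
        cell["labor_hours"] += hours
    # Add the average cost per repair, the quantity compared later against clusters
    return {key: dict(cell, avg_cost=cell["total_cost"] / cell["repairs"])
            for key, cell in acc.items()}


claims = [("W1", "1200001", 200.0, 1.5),
          ("W1", "1200002", 400.0, 2.5),
          ("W2", "3400001", 300.0, 2.0)]
print(aggregate_criteria(claims, 2)[("W1", "12")])
# {'repairs': 2, 'total_cost': 600.0, 'labor_hours': 4.0, 'avg_cost': 300.0}
```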


Case Study, REVI-MINER: Deviation Analysis, some results

Figure: Weighted absolute deviation of averages between workshop and workshop cluster for top damage codes (bar chart; x-axis: damage code; y-axis: 0 to 7,000)
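The slides name the measure but not its weighting. The sketch below computes one plausible version for a single damage code, assuming the weight is the workshop's number of repairs for that code and that the cluster average is pooled over all repairs in the workshop's cluster. Both assumptions and the function name are hypothetical.

```python
def weighted_abs_deviation(workshop_costs, cluster_costs):
    """Weighted absolute deviation of average cost for one damage code.

    workshop_costs: repair costs of one workshop for the damage code
    cluster_costs:  repair costs of all workshops in that workshop's cluster
    Weight (assumption): the workshop's number of repairs for the code.
    """
    if not workshop_costs or not cluster_costs:
        return 0.0
    workshop_avg = sum(workshop_costs) / len(workshop_costs)
    cluster_avg = sum(cluster_costs) / len(cluster_costs)
    return len(workshop_costs) * abs(workshop_avg - cluster_avg)


print(weighted_abs_deviation([500.0, 700.0], [500.0, 700.0, 200.0, 200.0]))  # 400.0
```

Weighting by repair count lets a workshop that is moderately above its cluster on many repairs outrank one that is far off on a single repair, which matches the goal of pointing controllers to deviations with real cost impact.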


Case Study, REVI-MINER: Deviation Analysis, some results

Figure: Sum of weighted absolute deviations of average costs between workshop and workshop cluster for top damage codes (bar chart; x-axis: workshop number; y-axis: 0 to 1,200,000)
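The second chart aggregates the per-code deviations into one figure per workshop. A minimal sketch of that step, assuming the per-code weighted deviations are already available as a nested dictionary (a hypothetical input format), sums them over the selected top damage codes and ranks the workshops so the largest sums appear first.

```python
def rank_workshops(deviations, top_codes):
    """Sum weighted absolute deviations over the top damage codes, per workshop.

    deviations: {workshop: {damage_code: weighted absolute deviation}}
    top_codes:  damage codes to include (e.g. the most frequent ones)
    Returns (workshop, sum) pairs sorted from largest to smallest sum.
    """
    totals = {
        workshop: sum(per_code.get(code, 0.0) for code in top_codes)
        for workshop, per_code in deviations.items()
    }
    return sorted(totals.items(), key=lambda item: item[1], reverse=True)


deviations = {"W1": {"12": 400.0, "34": 100.0},
              "W2": {"12": 50.0},
              "W3": {"34": 900.0, "56": 5000.0}}
print(rank_workshops(deviations, ["12", "34"]))
# [('W3', 900.0), ('W1', 500.0), ('W2', 50.0)]
```

Codes outside the selected set (here "56") are ignored, so the ranking depends on which damage codes are chosen as "top".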


Case Study, REVI-MINER: Deployment

The Data Mining tool REVI-MINER has been supporting the controlling efforts to detect and avoid fraudulent activities within the workshop organization.

Its functionality covers the essential phases of a Data Mining process, and it provides a user interface with easily manageable menus based on Visual Basic forms.

REVI-MINER provides the methods for a fast, efficient and meaningful analysis of the warranty and goodwill data of the workshops, thus giving the experts of the internal audit (revision) department crucial hints about possibly fraudulent activities.


Fraud Detection, Credit Card Fraud*

Extent
• Credit card fraud is an international criminal activity that is increasingly run by organized crime syndicates that have industry insiders on their payrolls.
• Global losses amount to about $3.8 billion.

Categories
• Counterfeit
• Card-not-present
• Lost or stolen card
• Intercepted in post
• Application fraud
• …

* Source: Fraud Magazine, Vol. 18, Issue 3, May/June 2004, pp. 26-29, 48


Fraud Detection, Credit Card Fraud: Case Study 5

VISA EU: Fraud Detection Tool VISOR (Visa Intelligent Scoring of Risk)

The system analyses each card transaction and highlights any suspicious activity on an account, allowing the bank to take action.

Business task: helping European banks to reduce credit card fraud

Data Mining task: clustering and classification

Data Mining algorithm: an (advanced) artificial neural network (ANN) scrutinises card transactions to deliver a highly accurate risk score by analysing the spending behaviour of each cardholder along with the profile of each merchant.

Data used: unknown, but related to customer spending behavior

Technology partner: Fair Isaac

Source: http://www.out-law.com/page-4189


Fraud Detection, Credit Card Fraud: Case Study 5

Functionality

• The system works by analysing all transactions that pass through Visa's own payment processing system, known as VisaNet.

For each transaction VISOR checks:
• the cardholder profile, including previous spending patterns
• the merchant profile
• up to 240 regional, country-specific or bank-specific fraud detection rules

• Each analysis results in a score; the higher the score, the greater the probability of fraud.
• If the score is above a threshold set by the bank, an alert is sent so that the bank can view the details. These details include the risk score, amount, currency and other account transactions over the past week.
• The bank then decides whether the transaction is fraudulent and feeds the result back into VISOR.

Source: http://www.out-law.com/page-4189
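The alerting and feedback steps described above can be illustrated as follows. This is a minimal sketch only: how VISOR computes its score is not public here, so the score is taken as a given number on a hypothetical 0 to 1 scale, and `Transaction`, `build_alert`, `record_decision` and `feedback` are illustrative names, not Visa APIs.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class Transaction:
    account: str
    amount: float
    currency: str
    time: datetime


def build_alert(txn, score, threshold, history):
    """Return alert details if the risk score exceeds the bank's threshold, else None."""
    if score <= threshold:
        return None
    week_ago = txn.time - timedelta(days=7)
    # Other transactions on the same account during the preceding seven days
    past_week = [t for t in history
                 if t.account == txn.account and week_ago <= t.time < txn.time]
    return {"risk_score": score, "amount": txn.amount,
            "currency": txn.currency, "past_week": past_week}


feedback = []  # hypothetical store of (alert, bank decision) pairs for retraining


def record_decision(alert, is_fraud):
    """Store the bank's verdict so the scoring model can later be updated."""
    feedback.append((alert, is_fraud))


now = datetime(2004, 5, 10, 12, 0)
txn = Transaction("A1", 950.0, "EUR", now)
history = [Transaction("A1", 20.0, "EUR", now - timedelta(days=2)),
           Transaction("A1", 30.0, "EUR", now - timedelta(days=9)),
           Transaction("B2", 5.0, "EUR", now - timedelta(days=1))]
alert = build_alert(txn, score=0.93, threshold=0.8, history=history)
if alert is not None:
    record_decision(alert, is_fraud=True)
```

Keeping the threshold as a parameter mirrors the slide's point that each bank sets its own cut-off, trading missed fraud against the number of alerts its staff must review.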


Fraud Detection, Credit Card Fraud: Case Study 5

Deployment

The system has already been piloted by
• Barclays Bank International in Germany
• ICS in the Netherlands
• Nationwide in the UK

A full roll-out of the system is planned from January this year, with Visa's existing fraud detection system, CRIS Online 2, being phased out by the end of March.

John Chaplin, Executive Vice President of Visa EU said: "With growing fraud losses across Europe, fraud detection is an essential tool for any card issuer. Early pilots indicate that Members are seeing an increase of anywhere between 15 to 60 per cent in fraud detection rates, depending on the number of transactions scrutinised. These immediate short term results confirm that VISOR will be a powerful tool for our member banks to combat their exposure to fraud."


Money Laundering

Application of Data Mining in Anti-Money Laundering


Money Laundering

Definition
• Money laundering generally involves a series of multiple transactions used to disguise the source of financial assets
• Through money laundering, the criminal tries to transform the monetary proceeds derived from illicit activities into funds with an apparently legal source

Source: http://www.dmreview.com/specialreports/20071002/1093412-1.html

Figure: dirty money → banking system → clean money


Money Laundering

Extent
Worldwide, the value of funds laundered in a year ranges between $500 billion and $1,000 billion.

Main reasons for ML
• weak financial regulatory systems
• lax enforcement
• gaps in the information systems of financial institutions
• corruption

Source: http://www.dmreview.com/specialreports/20071002/1093412-1.html


Money Laundering: Process of Money Laundering

Phase 1, Placement: placement of currency into a financial services institution

Phase 2, Layering: movement of funds from institution to institution (Institution 1, Institution 2, …, Institution n) to hide the source and ownership of the funds

Phase 3, Integration: reinvestment of those funds in an ostensibly legitimate business

Advances in information technologies for banking and financial services help …

Source: http://www.dmreview.com/specialreports/20071002/1093412-1.html


Example of the process of money laundering

http://money.howstuffworks.com/money-laundering2.htm


Money Laundering

Application of Data Mining in Anti-Money Laundering (AML)

Examples of what has to be detected:
• transactions from/to uncooperative countries or exposed persons
• unusually high cash deposits
• a high level of activity on accounts that are generally little used
• withdrawal of assets shortly after they were credited to the account
• many payments from different persons to one account
• …

Source: www.aifb.uni-karlsruhe.de/AIK/veranstaltungen/aik13/presentations/kietz-dataMining.ppt
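Three of the listed patterns can be expressed as simple screening rules over transaction records. The sketch below assumes a hypothetical record layout `(account, counterparty, amount, time, kind)` and illustrative thresholds (two-day window, 90% share, ten payers, factor five); in practice such rules would complement the Data Mining models rather than replace them.

```python
from collections import defaultdict
from datetime import datetime, timedelta

# Hypothetical transaction record: (account, counterparty, amount, time, kind),
# where kind is "credit" or "debit". All thresholds are illustrative assumptions.


def quick_withdrawals(txns, window=timedelta(days=2), share=0.9):
    """Accounts where a debit removes most of a credit shortly after it arrived."""
    credits = [t for t in txns if t[4] == "credit"]
    debits = [t for t in txns if t[4] == "debit"]
    flagged = set()
    for acc, _, c_amount, c_time, _ in credits:
        for d_acc, _, d_amount, d_time, _ in debits:
            if (d_acc == acc and c_time <= d_time <= c_time + window
                    and d_amount >= share * c_amount):
                flagged.add(acc)
    return flagged


def many_payers(txns, min_payers=10):
    """Accounts receiving credits from at least `min_payers` distinct counterparties."""
    payers = defaultdict(set)
    for acc, counterparty, _, _, kind in txns:
        if kind == "credit":
            payers[acc].add(counterparty)
    return {acc for acc, p in payers.items() if len(p) >= min_payers}


def activity_bursts(txns, usual_count, factor=5):
    """Accounts whose transaction count in `txns` is at least `factor` times their
    usual count for a period of the same length (a usual count of at least 1 is assumed)."""
    counts = defaultdict(int)
    for acc, *_ in txns:
        counts[acc] += 1
    return {acc for acc, n in counts.items()
            if n >= factor * max(usual_count.get(acc, 0), 1)}


t0 = datetime(2008, 1, 1)
txns = [("A", "X", 10000.0, t0, "credit"),
        ("A", "Y", 9500.0, t0 + timedelta(days=1), "debit"),
        ("B", "P0", 50.0, t0, "credit")]
txns += [("C", f"P{i}", 100.0, t0, "credit") for i in range(12)]
print(quick_withdrawals(txns), many_payers(txns), activity_bursts(txns, {"A": 1, "B": 1, "C": 1}))
# {'A'} {'C'} {'C'}
```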


Money Laundering

Application of Data Mining in Anti-Money Laundering (AML)

Many of the DM methods discussed under "Fraud Detection" can be used in AML too.

Data warehousing can help enforcement agencies consolidate financial transactions from multiple institutions across several countries. This supports the analysis of these transactions.


Financial Risk Management

Application of Data Mining in Market Risk Management


Business Issues

Market risk: change of
• interest rate
• exchange rate
• stock indices
• …

Data Mining application: forecasting and analyzing these changes for market risk assessment


Market Risk, Examples

Steurer, Elmar; University of Karlsruhe „Econometric methods and machine learning procedures for exchange rate forecasting: theoretical analysis and empirical comparison“.

Tae Horn Hann, Elmar Steurer: Much ado about nothing? Exchange rate forecasting: Neural networks vs. linear models using monthly and weekly data. Neurocomputing 10(4): 323-339 (1996)


Market Risk, Examples

Rauscher, Folke; University of Karlsruhe „Hybrid forecasting methods for exchange rate analysis: Combination possibilities of multivariate cointegration, neural networks and multi-task Learning “



Market Risk, Examples

Short-term prediction of the dollar exchange rate using neural networks

Jurgen Graf, Gholamreza Nakhaeizadeh: Application of Learning Algorithms to Predicting Stock Prices, in: Plantamura, V.L. et al.: Frontier Decision Support Concepts, pp. 241ff, John Wiley, 1994


Application of Data Mining in Market Risk

Data Mining Algorithms

Supervised learning

Continuous-valued target variable
Example: value of the interest rate (3.5, 3.5, 4.1, 3.8, 4.2)
• Regression
• Regression trees
• ANN
• KNN
• …

Nominal-valued target variable
Example: change direction
• Decision trees
• Logistic regression
• Random forest
• ANN
• KNN
• …
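The two kinds of target can be derived from the same interest-rate series. The sketch below turns the example values into a nominal "change direction" target and fits the simplest continuous-target model, a least-squares regression of each value on the previous one. The lag-1 model and function names are illustrative choices, not the method used in the cited studies, and five data points are far too few for a real forecast.

```python
# Interest-rate values from the example above
rates = [3.5, 3.5, 4.1, 3.8, 4.2]


def change_direction(series):
    """Nominal target: direction of change from one period to the next."""
    labels = []
    for prev, curr in zip(series, series[1:]):
        if curr > prev:
            labels.append("up")
        elif curr < prev:
            labels.append("down")
        else:
            labels.append("unchanged")
    return labels


def fit_lag1_regression(series):
    """Continuous target: least-squares fit of x_t = a + b * x_(t-1)."""
    xs, ys = series[:-1], series[1:]
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx
    a = my - b * mx
    return a, b


print(change_direction(rates))  # ['unchanged', 'up', 'down', 'up']
a, b = fit_lag1_regression(rates)
print(a + b * rates[-1])  # one-step-ahead forecast from the last value
```

A classifier from the nominal list would be trained on the labels, while a regressor from the continuous list would be trained on the values themselves.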


Market Risk, Demo: Forecasting of Interest Rate

RapidMiner: InterestRate.xml (comparison between regression and ANN)

Y = GDP
CO = total personal consumption
I = total gross private investment
G = government purchases of goods and services
R = interest rate
YD = disposable income


Market Risk, Demo: Forecasting of the "Deutscher Aktienindex" (DAX)

RapidMiner: GermanStocks.xml (comparison between regression and ANN)

Description of variables:
bmw: BMW stock price
mru: Münchner Rückv. stock price
rwe: RWE stock price
vow: VW stock price
kar: Karstadt stock price
sie: Siemens stock price
bas: BASF stock price
index: DAX index
time: day number


Financial Risk Management

Application of Data Mining in Customer Relationship Management


Business Issue: Customer Relationship Management (CRM) in Banking

Definition: CRM consists of the processes a company uses to track and organize its contacts with its current and prospective customers.

Source: http://en.wikipedia.org/wiki/Customer_relationship_management

CRM goals
• Customer retention and brand loyalty (it is more difficult to gain a new customer than to keep one)
• Identifying potential customers
• Reduction of operating costs
• Providing a 360-degree view of the customer


Business Issue: CRM in Banking

Figure: CRM comprises collaborative CRM, geographic CRM, operational CRM and analytical CRM; Data Mining belongs to analytical CRM.


Figure: Principal components of CRM systems, grouped into components for operational CRM, components for analytical CRM and components for collaborative CRM. Labels: campaign management, sales force automation, ERP systems, customer service, channel management (call center, mail/fax, web/email, personnel contact), data collection, data storage and selection, data analysis, Data Mining.


Business Issue: Analytical CRM

Figure: Data Mining applications in analytical CRM


Analytical CRM

Figure: Customer life cycle (CLC) and loyalty loop, showing the level of CLC stages over time. Stage labels: consideration, awareness, formation, purchase, ownership, reconsideration, repurchase, outside ownership. An acquisition program, a loyalty program and a further acquisition program accompany the cycle.

Typical aCRM tasks:
• customer segmentation, cross-/up-selling analysis, customer value analysis, …
• data collection, predictive modeling, response analysis, …
• detection of defection, churn analysis, modeling for recovery, response analysis, …

Source: http://www.prudsys.de/Service/Downloads/bin/DMC2003_arndt-daimlerchrysler.pdf


Analytical CRM: Data Situation

Data Situation Along the Customer Life Cycle

Figure: customer status over time (suspect, prospect, active customer, former customer) across the acquisition, loyalty and recovery phases, showing the ratio of external to internal data.

Source: http://www.prudsys.de/Service/Downloads/bin/DMC2003_arndt-daimlerchrysler.pdf


Analytical CRM in Banking: Case Study

Churn Analysis in Retail Banking: Business Understanding*

Business Problem
• Customer churn is an important issue in the highly competitive financial industry
• Customer churn describes the number or percentage of customers who end their relationship with the bank

* This case study is described in "Customer Churn Prediction - a case study in retail banking" by Mutanen et al. The paper can be found at: http://wortschatz.uni-leipzig.de/~macker/dmbiz06/PracticalDataMining.pdf


Analytical CRM in Banking: Case Study

Churn Analysis in Retail Banking: Business Understanding

Goals:
• Identifying the customers who are at risk of leaving the bank, together with their probability of leaving
• Determining whether it is worth the effort to retain such customers

• How much is being lost because of customer churn?
• What scale of effort would be appropriate for a retention campaign?


Analytical CRM in Banking: Customer Value

Importance
Customer value analysis makes differentiation at the individual level possible. It is therefore a precondition for profitable growth.

The top 20% of customers generate 47.5% of the profit (based on a database containing information on 1.6 million private customers).

Profit per decile, cumulative (customers ranked by profit):
decile 1: 30.35%
decile 2: 47.47%
decile 3: 59.99%
decile 4: 69.74%
decile 5: 77.53%
decile 6: 84.35%
decile 7: 90.11%
decile 8: 94.52%
decile 9: 97.97%
decile 10: 100.00%

Source: Dirk Arndt "value oriented customer management", personal correspondence
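A cumulative decile curve like the one above can be computed by ranking customers by profit and accumulating the share of total profit captured by the top 10%, 20%, and so on. The following is a minimal sketch with a hypothetical function name and toy data; it assumes total profit is positive. If some customers are loss-making, intermediate shares can exceed 100% before falling back at the last decile.

```python
def cumulative_profit_by_decile(profits):
    """Cumulative share (%) of total profit captured by the top 10%, 20%, ... of customers.

    Customers are ranked by profit, highest first; returns ten percentages.
    """
    ranked = sorted(profits, reverse=True)
    total = sum(ranked)
    n = len(ranked)
    shares = []
    for d in range(1, 11):
        cutoff = round(d * n / 10)  # number of customers in the top d deciles
        shares.append(100.0 * sum(ranked[:cutoff]) / total)
    return shares


# Hypothetical profits of ten customers
shares = cumulative_profit_by_decile(list(range(1, 11)))
print([round(s, 1) for s in shares])
```

Reading off the second value gives the statement "the top 20% of customers generate x% of the profit" used on the slide.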


Analytical CRM in Banking: Case Study

Churn Analysis in Retail Banking: Business Understanding

Customer churn rate has a strong impact on customer lifetime value and affects
• the length of the service relationship
• future revenues

• The probability of churn occurring is sufficient information to represent churn
• Given limited resources, the customers with the highest churn probability can be contacted first
• In retail banking, customers stay with the bank for a very long time, but the potential loss of revenue due to churn can be quite high
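Contacting the highest-probability churners first under limited resources can be sketched as a ranking with a capacity limit. The sketch additionally assumes, to reflect the goal of checking whether retention is worth the effort, that a customer is only contacted if the expected loss (churn probability times customer value) exceeds the contact cost. The field layout and function name are hypothetical, and the rule ignores the chance that a contact fails to retain the customer.

```python
def select_for_contact(customers, capacity, contact_cost):
    """Pick up to `capacity` customers, highest churn probability first.

    customers: list of (customer_id, churn_probability, customer_value)
    A customer is only worth contacting (assumption) if the expected loss
    churn_probability * customer_value exceeds the cost of the contact.
    """
    ranked = sorted(customers, key=lambda c: c[1], reverse=True)
    worth = [c for c in ranked if c[1] * c[2] > contact_cost]
    return [c[0] for c in worth[:capacity]]


customers = [("a", 0.9, 100.0), ("b", 0.2, 10000.0),
             ("c", 0.7, 50.0), ("d", 0.8, 10.0)]
print(select_for_contact(customers, capacity=2, contact_cost=20.0))  # ['a', 'c']
```

Customer "d" is skipped despite its high churn probability because its low value does not justify the contact, and "b" misses the cut only because capacity runs out; ranking by expected loss instead would put "b" first.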


Analytical CRM in Banking: Case Study

Churn Analysis in Retail Banking: Data Understanding

• Customer database from a Finnish bank
• Data was collected in the period December 2002 - September 2005
• Number of observations: 151,000
• 75 attributes were collected, among them churn as the target variable
• Data Mining task: prediction of the churn probability

Appropriate mining algorithms:
• Logistic regression (was used)
• …
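Logistic regression outputs a churn probability directly, which is why it fits this task. The following dependency-free sketch fits it by batch gradient descent on the log-loss; it illustrates the technique only and is not the model from the cited study, whose 75 attributes are not listed here. The single feature (a scaled account-activity score) and the data are hypothetical.

```python
import math


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train_logistic(X, y, lr=0.1, epochs=2000):
    """Fit logistic-regression weights by batch gradient descent on the log-loss."""
    n_features = len(X[0])
    w = [0.0] * n_features
    b = 0.0
    m = len(X)
    for _ in range(epochs):
        grad_w = [0.0] * n_features
        grad_b = 0.0
        for xi, yi in zip(X, y):
            # Prediction error for one customer: p(churn) - observed label
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j in range(n_features):
                grad_w[j] += err * xi[j]
            grad_b += err
        w = [wj - lr * gj / m for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / m
    return w, b


def churn_probability(w, b, x):
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)


# Hypothetical toy data: one feature (scaled account activity), label 1 = churned
X = [[0.0], [0.5], [1.0], [1.5], [2.0], [2.5], [3.0], [3.5]]
y = [1, 1, 1, 0, 1, 0, 0, 0]
w, b = train_logistic(X, y)
print(churn_probability(w, b, [0.0]), churn_probability(w, b, [3.5]))
```

Plain gradient descent keeps the sketch self-contained; on a real data set of this size a library implementation with regularization and feature scaling would be the more practical choice.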


CRM: Churn Analysis, Demo

RapidMiner: Churn_xml (comparison between DT and ANN)


Analytical CRM in Banking: Case Study

Customer Satisfaction Metric

Characteristics of the Pilot

Goal: Compare the service quality of automotive financial service providers in terms of loan and lease with respect to

Content of mailing: cover letter (one page) and questionnaire (two pages)

Type of questionnaire: choice between a paper-and-pencil questionnaire and an identical online version

Time span: June - September 2002

Number of mailings: 34,198

Source: http://www.prudsys.de/Service/Downloads/bin/DMC2003_arndt-daimlerchrysler.pdf


Analytical CRM: Case Study

Source: http://www.prudsys.de/Service/Downloads/bin/DMC2003_arndt-daimlerchrysler.pdf

Attributes used:

• Human Factor, e.g. knowledge of the person who set up the loan or lease
• Set-up Factor, e.g. ease of getting information on lease or loan issues
• Fair Deal Factor, e.g. accuracy of billing
• Convenience Factor, e.g. clarity of the financing contract

Method used: Regression Analysis
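The case study relates the four satisfaction factors to customer satisfaction by regression analysis. The following is a minimal sketch under assumptions, not the study's model. It assumes an overall satisfaction score is regressed on per-respondent factor scores, and it solves ordinary least squares from the normal equations (XᵀX)β = Xᵀy. The factor names, the 1-5 scale, and the toy data are hypothetical. Here y is generated from known weights so that the fit can be checked.

```python
def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [row[:] + [bi] for row, bi in zip(A, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))) / M[r][r]
    return x


def ols(X, y):
    """Ordinary least squares with intercept; returns [b0, b1, ..., bk]."""
    rows = [[1.0] + list(xi) for xi in X]  # prepend the intercept column
    k = len(rows[0])
    XtX = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    Xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(k)]
    return solve(XtX, Xty)


# Hypothetical factor names and scores per respondent (1 = poor ... 5 = excellent).
FACTORS = ["human", "setup", "fair_deal", "convenience"]
X = [
    [4, 3, 5, 4],
    [2, 2, 3, 3],
    [5, 4, 4, 5],
    [3, 3, 2, 2],
    [1, 2, 2, 1],
    [4, 5, 4, 3],
    [2, 1, 3, 4],
]
# Hypothetical "true" weights: intercept first, then one weight per factor.
true_coefs = [0.5, 0.3, 0.2, 0.25, 0.1]
y = [true_coefs[0] + sum(w * x for w, x in zip(true_coefs[1:], xi)) for xi in X]

coefs = ols(X, y)
for name, c in zip(["intercept"] + FACTORS, coefs):
    print(f"{name:12s} {c:.3f}")
```

Each fitted coefficient estimates how much overall satisfaction changes per unit of that factor, holding the other factors constant. A statistics package would additionally report standard errors and significance levels.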


Optimal structure of a Data Mining Team

Figure: diagram centered on Data Mining, with the labels Statistics, Machine Learning, Database Technology, Visualization and Privacy, and with examples of application areas: Customer Relationship Management, Quality Management, Credit Risk Management.


Success Factors of DM-Applications

KDD-95 panel on Commercial KDD Applications: The "Secret" Ingredients for Success

Sunday, August 20, 1:30 - 2:30 pm, Palais des Congrès, Montreal, Canada

Position statements of:

- Tej Anand, AT&T GIS
- Dr. Gholamreza Nakhaeizadeh, Daimler-Benz
- Evangelos Simoudis, IBM, co-chair
- Gregory Piatetsky-Shapiro, GTE Laboratories, co-chair
- Ralphe Wiggins, statement Harvesting
- Kamran Parsaye, statement Discovery
- Mario Schkolnick, SGI

Source: http://www-aig.jpl.nasa.gov/public/kdd95/KDD95-Panels.html


Success Parameters of Data Mining Solutions

Clearly defined goals

Importance of the business problem

Management attention and support

Data availability and quality

Competence of the Data Mining team

Close cooperation between the Data Mining team and the end-users

Integration of the Data Mining solution into the users' daily business processes

Other parameters (please describe briefly)


Lessons learned

• Clearly defined goals

• Importance of the business problem

• Management attention and support

• Willingness to make long-term investments

• Understanding of the power and importance of data

Think big! Slice smart! Start small!
