Customer Segmentation using Clustering

Confidential. © Stream Intelligence Ltd. All rights reserved.

Introduction to Clustering

Agenda

1 Introduction: Business Case

2 Clustering

3 Hierarchical Clustering

4 K-means Clustering

1 Business Case

Business Case – Predicting Successful Music Production

[Figure: music grouped into four clusters: Cluster Music A, Cluster Music B, Cluster Music C, Cluster Music D]

• The target is to appear in Billboard’s weekly Top 40.

• The cost per single could be up to 300K USD.

• A Music Intelligence Solution uses clustering to predict whether a song will be accepted by the market.

• Increase the success rate from 1 out of 10 to 8 out of 10.

2 Clustering

Statistical Learning Categorization

Statistical Learning

• Unsupervised Learning: Clustering

• Supervised Learning: Predictive Model

Clustering

• Clustering is the process of grouping a set of physical or abstract objects (for example, customers or products) into clusters.

• A cluster is a collection of data objects that are similar to one another within the same cluster and dissimilar to the objects in other clusters.

• Similarity is calculated from the distance between points: the smaller the distance, the more similar the objects.

• A common distance measure is the Euclidean distance.
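As a small worked example, the Euclidean distance between two points x and y is the square root of the summed squared differences of their coordinates, d(x, y) = sqrt(sum((x_i - y_i)^2)). The sketch below uses base R only; the two points are made up for illustration. It computes the distance by hand and with `dist()`, whose default method is Euclidean.

```r
# Two hypothetical customers described by two numeric features
x <- c(1, 2)
y <- c(4, 6)

# By hand: square the coordinate differences, sum them, take the square root
d_manual <- sqrt(sum((x - y)^2))

# Base R's dist() uses method = "euclidean" by default
d_builtin <- as.numeric(dist(rbind(x, y)))

print(d_manual)   # 5, since sqrt(3^2 + 4^2) = 5
print(d_builtin)  # 5
```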

3 Hierarchical Clustering

Hierarchical Clustering

• Start with each data point in its own cluster

• Combine the two nearest clusters (for example, by Euclidean distance between cluster centroids), and repeat until all points are in a single cluster
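These two steps (start with one cluster per point, then repeatedly merge the nearest pair) are what base R's `hclust()` performs. The sketch below runs it on six made-up 2-D points, assuming centroid linkage as the slide suggests. Centroid linkage is usually given squared Euclidean distances (the centroid update is exact for them), hence `dist(pts)^2`; the points and the choice of two clusters are illustrative assumptions.

```r
# Six hypothetical points forming two well-separated groups
pts <- matrix(c(1,   1,
                1.5, 1,
                1,   1.5,
                8,   8,
                8.5, 8,
                8,   8.5),
              ncol = 2, byrow = TRUE)

# Pairwise squared Euclidean distances (the usual input for centroid linkage)
d2 <- dist(pts)^2

# Agglomerative clustering: each point starts as its own cluster, then the
# two clusters with the closest centroids are merged, over and over
hc <- hclust(d2, method = "centroid")

# hc$merge lists the n - 1 merges in order; cutree() stops at 2 clusters
groups <- cutree(hc, k = 2)
print(groups)
```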

Let's Practice

• The data for this exercise was downloaded from www.movielens.org

• Open “clustering_movie.R”

• The movies in the dataset are categorized as belonging to different genres:

a. Action
b. Comedy
c. Sci-Fi
d. etc.

Dendrogram

The height at which two branches join represents the distance between the points or clusters being merged.
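To see how heights relate to distances, the sketch below clusters five made-up one-dimensional values. It assumes single linkage (not the centroid linkage mentioned earlier) because the merge heights are then simply the gaps between the closest members, which keeps the arithmetic easy to check. Cutting the tree at a height with `cutree(h = ...)` keeps only the merges below that height.

```r
# Five hypothetical 1-D values: two tight pairs and one outlier
v <- c(0, 1, 5, 6, 20)

# Single linkage: cluster distance = distance between the closest members
hc <- hclust(dist(v), method = "single")

# Merge heights: each pair joins at 1, the two pairs join at 4 (gap 1 to 5),
# and the outlier joins last at 14 (gap 6 to 20)
print(hc$height)

# Cutting the dendrogram at height 2 undoes every merge above that height,
# leaving three groups: {0, 1}, {5, 6} and {20}
print(cutree(hc, h = 2))

# plot(hc) would draw the dendrogram itself
```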

Finding Meaningful Clusters

• To see which cluster has the highest proportion of action movies, use this command:

tapply(movies$Action, clusterGroups, mean)

• Exercise: Can you find the characteristics of each cluster? Hint:

- Add the cluster as a new variable in the data
- Load the dplyr library
- Use the aggregate and summarise functions
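A sketch of both ideas on a tiny made-up data frame. The columns `Action` and `Comedy` and the `clusterGroups` labels are hypothetical stand-ins for the MovieLens data and the output of `cutree()`. Because `Action` is a 0/1 indicator, its mean within a cluster is the share of action movies in that cluster. Base R's `aggregate()` then profiles every variable at once.

```r
# Hypothetical genre indicators (1 = the movie belongs to the genre)
movies <- data.frame(
  Action = c(1, 1, 0, 0, 0, 1),
  Comedy = c(0, 0, 1, 1, 1, 0)
)

# Hypothetical cluster labels, e.g. from cutree(hc, k = 2)
clusterGroups <- c(1, 1, 2, 2, 1, 2)

# Share of action movies per cluster (mean of a 0/1 indicator)
print(tapply(movies$Action, clusterGroups, mean))

# Add the cluster as a variable, then average every genre by cluster
movies$cluster <- clusterGroups
profile <- aggregate(. ~ cluster, data = movies, FUN = mean)
print(profile)
```

With dplyr loaded, `movies %>% group_by(cluster) %>% summarise(across(everything(), mean))` should produce the same per-cluster table.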

Common scenario

Tip: normalize the data before clustering; otherwise variables on a large scale (such as Revenue) dominate the distance.

Movie  Action  Romance  Rating  Revenue (in USD)
A      1       1        5       200
B      0       1        4       150
C      0       0        3       50
D      1       1        4       120
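To see why normalization matters here, the sketch below uses the four movies from the table. Base R's `scale()` rescales each column to mean 0 and standard deviation 1 (z-scores); this is one common choice of normalization and is assumed here, not necessarily the one used in the course scripts.

```r
# The four movies from the table
movies <- data.frame(
  Action  = c(1, 0, 0, 1),
  Romance = c(1, 1, 0, 1),
  Rating  = c(5, 4, 3, 4),
  Revenue = c(200, 150, 50, 120),
  row.names = c("A", "B", "C", "D")
)

# Raw Euclidean distances: Revenue differences swamp the 0/1 genre columns
raw_d <- dist(movies)

# z-score normalization: every column gets mean 0 and standard deviation 1
movies_scaled <- scale(movies)
scaled_d <- dist(movies_scaled)

print(round(raw_d, 1))
print(round(scaled_d, 2))
```

With these four rows, B is nearer to A than D is on the raw data (about 50 versus 80, almost entirely from revenue), while after scaling D, which shares both genres with A, becomes the nearer one.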

4 K-means Clustering

K-Means Clustering

Objective: High-Level Description

1. Group the data into K clusters by:
a. determining the K centroids
b. assigning each data point to its nearest centroid

2. The algorithm iterates between these two stages until the cluster assignments converge (no data point changes cluster).
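The two alternating stages can be written out directly. The sketch below is a bare-bones version of the classic Lloyd iteration in base R. It assumes Euclidean distance, starts from k data points chosen at random, and does not handle empty clusters; the function name `simple_kmeans` is hypothetical. In practice you would call the built-in `kmeans()`.

```r
# Hypothetical helper: a bare-bones K-means (Lloyd iteration) in base R
simple_kmeans <- function(x, k, max_iter = 100) {
  x <- as.matrix(x)
  # Initialisation: use k randomly chosen data points as starting centroids
  centroids <- x[sample(nrow(x), k), , drop = FALSE]
  assignment <- rep(0L, nrow(x))
  for (iter in seq_len(max_iter)) {
    # Stage 1: squared Euclidean distance from every point to every centroid
    d <- sapply(seq_len(k), function(j) colSums((t(x) - centroids[j, ])^2))
    # ...and assign each point to its nearest centroid
    new_assignment <- max.col(-d, ties.method = "first")
    # Stop once no point changes cluster
    if (all(new_assignment == assignment)) break
    assignment <- new_assignment
    # Stage 2: move each centroid to the mean of the points assigned to it
    for (j in seq_len(k)) {
      centroids[j, ] <- colMeans(x[assignment == j, , drop = FALSE])
    }
  }
  list(cluster = assignment, centers = centroids, iterations = iter)
}

# Usage on two made-up, well-separated groups of 10 points each
set.seed(42)
pts <- rbind(matrix(rnorm(20, mean = 0, sd = 0.3), ncol = 2),
             matrix(rnorm(20, mean = 5, sd = 0.3), ncol = 2))
fit <- simple_kmeans(pts, k = 2)
print(table(fit$cluster))
```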

K-Means Illustrations

Suppose k = 3.

1. Start with random positions of centroids (iteration 0).

2. Assign each data point to the closest centroid (iteration 1).

3. Move each centroid to the center of its assigned points, recalculating the centroids C (iteration 1).

4. Iterate until the cost is minimal (iteration 3 in the illustration).

What potentially can go wrong?
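One possible answer is that the result depends on the starting centroids: K-means only reaches a local minimum of its cost. The sketch below makes this concrete with made-up one-dimensional data in three obvious groups and two hand-picked sets of starting centres passed to base R's `kmeans()` with the Lloyd algorithm. The poor start puts two centroids in one group and one centroid between the other two groups, and the iterations never recover from it.

```r
# Hypothetical 1-D data with three obvious groups around 0, 10 and 20
x <- matrix(c(-0.1, 0, 0.1, 9.9, 10, 10.1, 19.9, 20, 20.1), ncol = 1)

# Good start: one centroid inside each group
good <- kmeans(x, centers = matrix(c(0, 10, 20), ncol = 1),
               algorithm = "Lloyd")

# Poor start: two centroids in the first group, one between the other two
bad <- kmeans(x, centers = matrix(c(-0.1, 0.05, 15), ncol = 1),
              algorithm = "Lloyd")

# Total within-cluster sum of squares: the poor start stays stuck far higher
print(good$tot.withinss)
print(bad$tot.withinss)

# The groups around 10 and 20 end up sharing a single cluster
print(bad$cluster)
```

A common remedy is to run several random starts and keep the best one, e.g. `kmeans(x, centers = 3, nstart = 25)`.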

Optimum Number of Clusters

[Figure: TSS plotted against K]

TSS = total sum of squared errors
K = number of clusters
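A sketch of how such a curve is typically produced in base R, assuming that TSS here is the total within-cluster sum of squares that `kmeans()` reports as `tot.withinss`. The data are made up with three groups. One looks for the "elbow" where adding another cluster stops reducing the error by much; this is a heuristic, not a formal test.

```r
set.seed(1)
# Hypothetical 2-D data: three groups of 30 points
centres <- rbind(c(0, 0), c(5, 0), c(0, 5))
pts <- do.call(rbind, lapply(1:3, function(i) {
  cbind(rnorm(30, centres[i, 1], 0.5), rnorm(30, centres[i, 2], 0.5))
}))

# Total within-cluster sum of squares for K = 1..6 (10 random starts each)
ks <- 1:6
wss <- sapply(ks, function(k) kmeans(pts, centers = k, nstart = 10)$tot.withinss)

# The curve should drop sharply up to K = 3 and flatten afterwards
plot(ks, wss, type = "b", xlab = "K (number of clusters)", ylab = "TSS")
```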

Let's Practice

• We will use the credit card profile data (cc-profile.csv)

• Open “segmenting_customer.R”

Exercise:

• What is the optimum number of clusters?

• Describe the characteristics of each segment. Do you think the segmentation is meaningful?