Selection
• The process of selecting a suite of products or services for a prospective consumer, based on a set of parameters
• Helps consumers find products that fit their needs, including unplanned purchases
• Increases cross-selling
• and many more ...
Modes of selection
• Recommendation
• Suggestion
• Mailing lists
• Ads
What are the parameters?
• Human behavior
• History
• Browsing pattern
• Email and other promotions
• and many more ...
Fulfilment
• The condition where a “selection” leads to a successful completion (purchase) for the ecommerce company.
• Along with the parameters that affect selection, fulfilment is also affected by the selection mode.
– 70% of a page is dedicated to recommendations (Amazon)
– Location for the ads and the recommendations
– etc ….
Is fulfilment related to selection?
• Selection strength:- % of the page given to recommendations, selection mode, size and location of the recommendations etc.
• When the selection strength is small, the chance of fulfilment is low.
• When the selection strength is large, the consumer becomes indecisive and the chance of fulfilment drops again.
• There needs to be a trade-off between selection and fulfilment. There is an optimum value.
How to determine the optimum value? Use machine learning
• Collaborative filtering
• User profile based
• Content based
• Hybrid methods
[Diagram: Extract data (purchase history, click pattern, consumer profile; product based; surveys, reviews), Machine learning for selection, Machine learning for fulfilment, Selection]
Standard flow with relevant frameworks and methods
• Data extraction, pre-processing: Hadoop, MongoDB, REST API, streams (Twitter etc.)
• Machine learning based selection: R, Python, clustering, regression, PCA etc.
• Ecommerce implementation: web, mobile and other clients
• Visualization: D3.js, web etc.
Generalized Machine Learning Based Methods
Steps:-
1. Feature identification
– Data mining
– Data filtering
2. Target identification
– For unsupervised models, convert to supervised using clustering, naïve Bayes and other methods
– Identify targets
3. Classifier or regression (ML model)
– K-Nearest Neighbor (slow but effective)
– Clustering (less accurate)
– Decision trees
– Horting
– Text categorization (for social and streaming data)
– Bayesian network based methods (economic and fast)
– Genetic algorithms
– Hybrid methods (a combination of any of the ones above)
4. Implementation of the ML model on the features and target
5. Model validation
– Root mean square error (RMSE)
– Cross-validation (break data into training data and testing data)
– F1 scores
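The validation measures in step 5 can be sketched in plain Python. This is only an illustration, not the presenter's code: the helper names (`rmse`, `train_test_split_simple`, `f1_score_binary`) and all the toy numbers are assumptions made up for this example.

```python
import math
import random


def rmse(y_true, y_pred):
    """Root mean square error between two equal-length sequences."""
    squared = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared) / len(squared))


def train_test_split_simple(rows, test_fraction=0.25, seed=0):
    """Shuffle the rows and break them into training data and testing data."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    n_test = max(1, int(round(len(shuffled) * test_fraction)))
    return shuffled[n_test:], shuffled[:n_test]


def f1_score_binary(y_true, y_pred, positive=1):
    """F1 score: harmonic mean of precision and recall for one class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Toy data, invented for illustration only.
ratings_true = [3.0, 4.0, 5.0]
ratings_pred = [2.0, 4.0, 6.0]
purchased_true = [1, 0, 1, 1, 0]
purchased_pred = [1, 1, 1, 0, 0]

error = rmse(ratings_true, ratings_pred)
f1 = f1_score_binary(purchased_true, purchased_pred)
train, test = train_test_split_simple(range(8), test_fraction=0.25)
```

RMSE suits regression-style targets (for example predicted ratings), while F1 suits classification targets (for example purchased or not purchased).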
Standard Approach
• Content based (you tell us what you like)
– Browsing history
– Purchase history
– Surveys etc ..
• Collaborative filtering (customers like you tell us what you may like)
– Item-item filtering
– User-item filtering
• Collective intelligence (common consensus)
• Discover
– Dialog design
– Catalog based
– questions
Collect Preferences
Find Similarities Unsupervised Supervised
Selection and recommendation
• Manhattan distance
• Euclidean distance
• Pearson correlation
• Cosine similarities
• Clustering
• Factor analysis
• Principal component analysis
• Web interface, mobile apps
• D3.js
• JavaScript
This step also uses machine learning
Filtering involves unsupervised learning methods
Collect Preferences
Steps:-
1. Identify features (what features to use?)
   1. Some features might be suitable for “selection” and some for “fulfilment”. There could be many common features too.
   2. Data mining
   3. Data extraction
2. Encoding
   1. Convert qualitative data to coded numeric values
   2. Perform necessary scaling
Assume that there are N features. Each user then has a column vector holding the numeric values of all N features. On the left we see the vector. For that user, feature 1 takes the value 0.7 (assume that it is purchase history), feature 3 takes the value 0.2 (assume that it is clickstream), and so on. A value of 0 means that the user is not part of that feature.
• Assume that we have M customers. We now get an N by M matrix, with features as rows and customers as columns.
Collect Preferences contd ..
• A row:- represents the preferences of all the users for a specific feature/item
• A column:- represents the preferences of a specific user on all the features.
• This matrix forms our training data. This goes into the machine learning methods as an input.
• Remove frequent buyers, because they bias the data. And there are many more such outliers…
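The encoding, scaling and matrix-building steps described above can be sketched with NumPy. The feature names, customer ids, category codes and values below are all hypothetical, and min-max scaling is just one possible choice for the "necessary scaling".

```python
import numpy as np

# Hypothetical raw data for M = 4 customers and N = 3 features.
customers = ["u1", "u2", "u3", "u4"]
raw = {
    "purchase_count": [7, 2, 0, 10],                   # numeric feature
    "clicks_per_visit": [20, 5, 0, 10],                # numeric feature
    "segment": ["gold", "basic", "basic", "silver"],   # qualitative feature
}

# Encoding step 1: convert qualitative data to coded numeric values.
segment_codes = {"basic": 0, "silver": 1, "gold": 2}   # assumed coding
raw_numeric = dict(raw)
raw_numeric["segment"] = [segment_codes[s] for s in raw["segment"]]

# Features as rows, customers as columns: an N by M matrix.
features = list(raw_numeric)
matrix = np.array([raw_numeric[name] for name in features], dtype=float)

# Encoding step 2: min-max scale each feature (row) into [0, 1].
row_min = matrix.min(axis=1, keepdims=True)
row_max = matrix.max(axis=1, keepdims=True)
span = np.where(row_max > row_min, row_max - row_min, 1.0)  # avoid divide by zero
preferences = (matrix - row_min) / span

# A column is one user's preference vector over all the features.
u1_vector = preferences[:, customers.index("u1")]
```

Here `preferences[i, :]` is a row (all users for one feature) and `preferences[:, j]` is a column (one user across all features), matching the row and column description above.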
Collect Preferences contd .. The Objectives
I get a new user and build a user preference vector for the user. Assume that it is <vector on the left>. Can I find the customer(s) in the training data whose feature preferences match this new customer’s feature preferences? Yes, use machine learning!
Find similarities
• This is an unsupervised learning problem
• Find users who have a similar set of feature preference vectors
• A very popular method is the kNN (k Nearest Neighbor) method
• Other methods:-
– Gaussian Naïve Bayes
– Decision trees
– K-means clustering
• Similarity measurements are done using,
– Manhattan distance
– Euclidean distance
– Pearson correlation
– Cosine similarities
– And many more ...
• At this point the differences between selection and fulfilment are quantified.
• They would have a different clustering structure.
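The four similarity measures listed above, plus a simple nearest-neighbour lookup for a new user, can be written in a few lines of plain Python. This is a minimal sketch: the function names, the customer ids and the preference values are assumptions for illustration, and a real system would more likely use a library implementation.

```python
import math


def manhattan(a, b):
    """Sum of absolute differences (smaller means more similar)."""
    return sum(abs(x - y) for x, y in zip(a, b))


def euclidean(a, b):
    """Straight-line distance (smaller means more similar)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def pearson(a, b):
    """Pearson correlation in [-1, 1] (larger means more similar)."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)


def cosine(a, b):
    """Cosine of the angle between two vectors (larger means more similar)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def nearest_customers(new_vector, training, k=2):
    """Return the ids of the k training customers closest to new_vector."""
    ranked = sorted(training, key=lambda cid: euclidean(new_vector, training[cid]))
    return ranked[:k]


# Hypothetical preference vectors, one per customer.
training = {
    "u1": [0.7, 0.0, 0.2],
    "u2": [0.0, 0.9, 0.1],
    "u3": [0.6, 0.1, 0.3],
}
new_user = [0.8, 0.0, 0.2]
neighbours = nearest_customers(new_user, training, k=2)
```

Swapping `euclidean` for another measure inside `nearest_customers` changes which customers count as neighbours, which is one place where selection and fulfilment could be treated differently.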
Examples
• Rows 1 and 3 are more similar to each other than rows 1 and 2.
• Rows 1 and N are more similar to each other than rows 2 and 3.
What next ?
• Once the similarities are established and the clusters are formed the model can be converted to a “supervised” model.
• Steps:-
– Targets are defined based on whether it is selection or fulfilment. (Let us say we have C clusters, where each cluster represents the preferences of one TYPE of customer.)
– Each cluster is assigned a target value
– The matrix defined earlier gets a new column called “Preference” as the target, so each user has a specific Preference value.
– For simplicity, let us assume that “Preference” takes discrete values C1, C2, C3 ... C10
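The conversion from clusters to a "Preference" target can be sketched with scikit-learn's `KMeans`. The data is invented, and C = 2 is used only to keep the example short. Choosing K-means for this step is an assumption, since the slides do not fix a clustering method here.

```python
import numpy as np
from sklearn.cluster import KMeans

# Hypothetical matrix with customers as rows and features as columns.
X = np.array([
    [0.9, 0.1], [0.8, 0.2], [1.0, 0.0],   # one type of customer
    [0.1, 0.9], [0.2, 0.8], [0.0, 1.0],   # another type of customer
])

# C = 2 clusters for brevity; the text assumes up to ten values C1 ... C10.
kmeans = KMeans(n_clusters=2, n_init=10, random_state=0)
cluster_ids = kmeans.fit_predict(X)

# Assign each cluster a target value: this becomes the "Preference" column.
preference = np.array([f"C{c + 1}" for c in cluster_ids])
```

Which group ends up labelled C1 and which C2 is arbitrary. Only the grouping itself carries meaning.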
Supervised Model
• Now we have a supervised model with each customer mapped to a target value.
• Supervised learning methods:-
– Ridge regression
– Linear regression
– Decision trees
– Random forests
– Support vector machines
• When there is a new customer,
– Based on the data, a preference vector is built
– Use the supervised model to get a prediction for the target
• This is the transpose of the earlier matrix with customers as rows and features as columns
• There is a new column called as the Target added to the new model
• So here each customer is mapped to a target value.
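A minimal sketch of the supervised step, assuming a random forest (one of the methods listed above) and made-up data. The matrix has customers as rows and features as columns, and the target column holds each customer's Preference value.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

# Customers as rows, features as columns (the transposed matrix).
# All values and labels are hypothetical.
X = np.array([
    [0.9, 0.1], [0.8, 0.2], [1.0, 0.0],
    [0.1, 0.9], [0.2, 0.8], [0.0, 1.0],
])
# Target column: one Preference value per customer.
y = np.array(["C1", "C1", "C1", "C2", "C2", "C2"])

model = RandomForestClassifier(n_estimators=50, random_state=0)
model.fit(X, y)

# Build the preference vector for a new customer and predict its target.
new_customer = np.array([[0.85, 0.15]])
predicted = model.predict(new_customer)[0]
```

For selection and fulfilment, the same feature matrix could be paired with two different target columns and two separately trained models.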
Selection vs Fulfillment
• Selection or fulfilment is determined based on the preference matrix and the target.
• Looking at the matrices above, there is a strong possibility that some features are common to both.
• The targets, SELECTION TARGET and FULFILMENT TARGET, are completely different and take different values.
• The differences are quantified at the clustering or classification level. At this stage, the values for the targets are defined. Using a different classification for selection than for fulfilment is the key.
Tools
• Python
– pandas: data frames, data mining, filtering
– scikit-learn: machine learning
– Tweepy: Twitter streaming
– python-recsys: recommendation system
– scrapy: web crawling
– matplotlib: plotting
• R
– rpart: machine learning
– ggplot2: plotting
• d3.js: JavaScript based plotting
Code Snippets
Import the Gaussian naïve Bayes (GaussianNB), decision tree, and support vector classification methods in Python.
Initialize a classifier model (decision trees here) and train it with training features (X_train) and training target (y_train)
Develop training and test splitting of existing data for cross-validation testing of the accuracy of a model
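The three snippets described above might look like the following with scikit-learn. This is a reconstruction under assumptions, not the original slide code: the synthetic data from `make_classification`, the 25% test size and the fixed random seeds are all illustrative choices.

```python
from sklearn.datasets import make_classification
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for the customers-by-features matrix and its target.
X, y = make_classification(
    n_samples=200, n_features=5, n_informative=3, n_redundant=0, random_state=0
)

# Split the existing data into training and test sets.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

# Initialize a classifier model (decision tree here) and train it.
clf = DecisionTreeClassifier(random_state=0)
clf.fit(X_train, y_train)
accuracy = accuracy_score(y_test, clf.predict(X_test))

# GaussianNB and SVC share the same fit/score interface.
other_scores = {
    name: model.fit(X_train, y_train).score(X_test, y_test)
    for name, model in {"naive_bayes": GaussianNB(), "svc": SVC()}.items()
}
```

The accuracy on `X_test` estimates how the model behaves on customers it has not seen. The accuracy on `X_train` is usually optimistic.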
Import K-means clustering, Gaussian mixture models, and principal component analysis for unsupervised learning.
Perform clustering on the given data. In this example two clusters are formed.
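A possible version of the unsupervised snippets, again a sketch and not the original slide code. The two synthetic groups of points and the random seeds are assumptions chosen so that the two clusters are easy to see.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.decomposition import PCA
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)
# Two synthetic, well-separated groups of points in 3 dimensions.
group_a = rng.normal(loc=0.0, scale=0.3, size=(20, 3))
group_b = rng.normal(loc=5.0, scale=0.3, size=(20, 3))
data = np.vstack([group_a, group_b])

# K-means clustering with two clusters.
kmeans_labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(data)

# Gaussian mixture model with two components.
gmm = GaussianMixture(n_components=2, random_state=0).fit(data)
gmm_labels = gmm.predict(data)

# PCA: project the 3 features onto the 2 directions of largest variance.
reduced = PCA(n_components=2).fit_transform(data)
```

The `reduced` array is convenient for plotting the clusters in two dimensions, for example with matplotlib or d3.js.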