Predictive analytics for business
How to use recommender systems, dynamic pricing and churn prediction to drive business results



Table of contents

    Introduction

    The role of AI in business: Why should I invest in AI? How does AI help business?

    Predictive analytics 101: What is predictive analytics? How does it work? Why use predictive analytics?

    Recommender systems: Types of recommender systems. Does my business need a recommender system - the benefits of product recommendations. Recommender systems in practice: Netflix. Steps to follow while implementing a recommender system

    Dynamic pricing: What is dynamic pricing? Dynamic pricing - dos and don'ts. Dynamic pricing in practice: Amazon

    Churn prediction: How does predictive analytics improve customer retention? Churn prediction in practice: major telecom company case study. Getting started with predictive analytics

    Why is it all about data? Types of data. Why does data quality matter? Do I need a data strategy? Data science lifecycle

    AI implementation - why is it so difficult? How do I get ready for AI adoption? What issues should I look out for? Kick off your AI project in a month - AI Sprint. How ready are you for AI adoption?

    Glossary

    About the authors

    Copyright © Neoteric, 2019


    Predictive analytics has been around for quite some time, but it’s gained much popularity in recent years with the whole buzz surrounding artificial intelligence. Predictive analytics gives companies the power to use data from the past to predict future outcomes. No more guessing - now companies can make data-driven decisions across all departments.

    Predicting better than pure guesswork, even if not accurately, delivers real value. A hazy view of what’s to come outperforms complete darkness by a landslide.

Eric Siegel, Predictive Analytics: The Power to Predict Who Will Click, Buy, Lie, or Die

    This ebook’s goal is to help executives and managers gain a better understanding of some AI technologies and learn how to approach a data science project strategically. In the following chapters, we cover topics related to recommender systems, dynamic pricing, churn analysis, data quality and data strategy, and overall AI adoption. We discuss the elements of a well-planned data strategy and the steps to prepare for the adoption of any AI solution.

    What will you learn?

    • Why every data science project starts with business, not tech

    • Why AI has more to do with people than one might initially imagine

    • How you can use data to make your company flourish

    • Why data quality matters - and what it even means

    • How to create your data strategy

    • What problems may occur along the way

    ...and much more.

Predictive analytics is a great tool that helps us see into the future, but for it to work properly, we need to feed the models with quality data - and so be able to trust the outcomes. A recent MIT Sloan Management Review study1 found that only a small minority of respondents say they “always” trust data when judged by relevance, completeness, timeliness, and accuracy - you can see the proportions of answers on the chart below.

FIGURE: HOW OFTEN DO YOU TRUST THAT ANALYTICS DATA IS RELEVANT, COMPLETE, TIMELY, AND ACCURATE?

Source: MIT SMR: Data, Analytics, & AI: How Trust Delivers Value

    1 MIT SMR: Data, Analytics, & AI: How Trust Delivers Value


Trust in data is crucial when you want your staff to rely on analytics to support decision-making. That’s why it’s so important to introduce and cultivate a company-wide data-driven culture, where people are encouraged to ask questions and expect the decisions in any process to be grounded in data. Deploying a predictive model is only part of the process; just as important is teaching people how to interpret and use the insights. Without putting insights in the hands of the employees who need them, all you end up with is a glorified report. It’s your staff who know what insights they need, and they have to be taught how to use the model’s outcomes to enhance their work.

    With the right preparation, training, and overall strategy, you will be able to identify the right business case and implement the appropriate model, as well as prepare your employees for a new, data-driven, way of work.


    The role of AI in business

2 McKinsey: AI adoption advances, but foundational barriers remain

    The business world is beginning to adopt AI. Research2 shows that 47% of organizations have implemented it in at least one function in their business processes, compared to 20% in the previous year. Another 30% of respondents say they are piloting AI. The main sectors to have adopted AI include advertising, marketing, and media & entertainment companies. There’s a variety of use cases of AI for these sectors but the results show that companies generally follow the money when deploying AI, choosing the most relevant areas of their business.

Reasons for adopting AI
WHY IS YOUR ORGANISATION INTERESTED IN AI?

AI will allow companies to obtain or sustain a competitive advantage: 84%

AI will allow companies to move into new businesses: 75%

New organizations using AI will enter the existing markets: 75%

Incumbent competitors will use AI: 69%

Pressure to reduce costs will require businesses to use AI: 63%

Suppliers will offer AI-driven products and services: 61%

Customers will ask for AI-driven offerings: 59%

Source: Boston Consulting Group and MIT Sloan Management Review


Artificial intelligence is making its way into more industries, with a wide variety of business use cases. It’s successfully used in healthcare, manufacturing, e-commerce, education, and human resources, to name just a few. AI will learn anything that you want it to, so there’s a lot of flexibility when it comes to how you use it. What’s important to remember, however, is that you don’t have to follow the tech giants. You don’t have to use AI the same way that Amazon, Facebook or Netflix do. If you want to implement AI in your business, you should follow the money, not somebody’s lead. See how your competitors use AI, think about the processes that you need to improve, and match your business needs and goals with technology.

    Ask yourself a question:

    What would change if you could make more accurate predictions faster? How would that help your organization?

    Why should I invest in AI?

The answer is simple: it’s profitable. A paper published by McKinsey3 suggests that three ML techniques - feedforward neural networks, convolutional neural networks, and recurrent neural networks - could together enable the creation of between $3.5 trillion and $5.8 trillion in value each year across nine business functions in 19 countries. This is the equivalent of 1 to 9 percent of 2016 sector revenue. What’s more, PwC research4 shows global GDP could be up to 14% higher in 2030 as a result of AI - the equivalent of an additional $15.7 trillion - making it the biggest commercial opportunity in today’s fast-changing economy.

    AI is said to follow a typical S-curve pattern, starting off slowly but with rapid acceleration as the technology matures and companies learn how to leverage it.

3 McKinsey: Notes from the AI frontier
4 PwC: Sizing the Prize


    How does AI help business?

Give AI a problem, feed it the right data, and it will solve it for you. How, you may ask? Artificial intelligence can help businesses mine data, processing billions of data points faster than ever. Whatever question you need AI to answer, it can do it, and it will keep getting better at answering. What movie will Jane like? Will customer X churn? Will this item sell out? AI can provide accurate predictions about future outcomes based on historical data to generate insights.

    As of now, AI is still a lot about innovation, not business use cases. Sure, artificial intelligence is innovative, but again: you don’t want to have it just as a fancy addition to your business. You want it to work for you and bring profits. When creating an AI implementation strategy, you should keep in mind your company’s overall business strategy and utilize technology, such as AI, to follow the main business vision.

Why is AI critical? PERCENTAGE OF RESPONDENTS

    Base: Answered AI Section and at least 1 technology is critical for the organization to develop, n = 705;

    Question: What are the top 3 reasons why AI technologies are critical to your organization?

    Source: Gartner


These are insights that you wouldn’t otherwise have - it’s beyond human capabilities to process such amounts of data and identify all the patterns. Artificial intelligence converts information into knowledge, both about the current state and future outcomes. Of course, AI is not a fortune-telling technology, but it can support your business with predictive analytics.


Predictive analytics 101

What is predictive analytics?

Predictive analytics combines advanced analytics, predictive modeling, data mining, real-time scoring, and machine learning to help companies identify patterns in data. Predictive analytics refers to using historical data to predict what will happen in the future. The historical data is fed into a model that analyzes it to identify patterns. The model learns from the data of the past and is then applied to current data to predict future outcomes. Predictive analytics is already widely used in many ways: customer lifetime value (CLV) measures predict how much a customer will buy from a company, and a product recommendation system predicts what shoppers will like. There are also sales forecasts, credit scores, fraud detection, marketing campaign optimization, and predictive maintenance. Some of them seem mundane and don’t make us think of sophisticated AI, some are more “impressive” - and all are examples of predictive analytics in practice.
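The learn-on-the-past, apply-to-the-present loop can be sketched in a few lines of Python. This toy example is entirely hypothetical - it “learns” a single inactivity cutoff from historical churn outcomes and applies it to new customers; a real model would use many features and a proper machine learning algorithm:

```python
# Minimal illustration of the predictive-analytics loop: learn from
# historical outcomes, then score current records. Data is invented.

history = [  # (months_inactive, churned)
    (0, False), (1, False), (5, True), (6, True), (2, False), (7, True),
]

def fit_threshold(history):
    """Pick the inactivity cutoff that best separates past churners."""
    best, best_acc = 0, 0.0
    for cut in range(0, 10):
        acc = sum((m >= cut) == churned for m, churned in history) / len(history)
        if acc > best_acc:
            best, best_acc = cut, acc
    return best

cutoff = fit_threshold(history)  # "training" on the past

def predict_churn(months_inactive: int) -> bool:
    """Apply the learned rule to a current customer."""
    return months_inactive >= cutoff

print(cutoff, predict_churn(8), predict_churn(1))  # prints: 3 True False
```

The point is the shape of the process, not the model: historical records carry known outcomes, the fitting step extracts a pattern, and the fitted rule is then applied to records whose outcome is not yet known.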

    For the first time in history, the predictive future—the increasing awareness and likelihood of potential future actions and outcomes—is within reach. No wonder, then, that executives have placed predictive analytics at the top of the executive agenda since 2012, according to a recent Accenture survey.

    Jeanne Harris & Mark McDonald, HBR Predictive Analytics in Practice


    How does it work?

    With big data, the way we derive relevant insights from information has changed. We had to switch from previous BI (business intelligence) approaches that used classic sorting of structured data to techniques that utilize raw information. Traditional business intelligence tools use a deductive approach - so there is an assumption of some understanding of existing patterns and relationships. Deductive methods work best with structured data, but what if that’s not what we have?

    There is another approach: inductive. The inductive approach doesn’t make any presumptions about patterns and relationships and is more about data discovery. Predictive analytics applies inductive reasoning to big data using machine learning, neural networks, and artificial intelligence to discover patterns and interrelationships.

    Why use predictive analytics?

    Predictive analytics can be used for decision-making and solving business problems, as well as identifying new market opportunities, enhancing customer experience, optimizing processes, reducing operational costs, and mitigating risk by predicting problems that may occur.

    So why use it?

Imagine a situation in your sales department: think about how the experts make decisions. We call it “based on professional experience” and it’s true, they make decisions with the use of their expert knowledge. But there’s a share of their work which is intuitive - a few simple rules, experience, gut feeling. They know which customer buys what products and what content the audience will be interested in. And while their knowledge comes from their experience, there’s still a large share of guessing or making intuitive choices. But what if you enhance such a team with AI? You use actual data to help them make much more accurate predictions. In some cases, they will be happy to


FIGURE: Is your organization trying to find new ways to generate revenue? Most organizations apply predictive analytics to core functions that produce revenue.

Source: SAP

see that what they’ve intuitively known to be true has been confirmed. In other cases, they will be fascinated to see how much potential data unlocks and how much more they can see. And there’s another benefit here - an expert can leave your company. If you rely on the expertise of one person, or a number of well-trained people, you need them to stay, and sometimes that’s just not possible. But the model stays. And the knowledge it unlocked is there to stay, too.

    The use cases of predictive analytics are vast and tailor-made to each problem: when you know what question you want the model to answer, a data science team will help you identify the data needed for the training of the model and choose what model has to be built. With the use of predictive analytics, companies are able to analyze and manage pricing trends, making it possible to offer optimal prices at the right time. Businesses are also able to predict the behavior of customers, making it possible to target the right audience and identify users likely to abandon their online experience. What’s more, properly analyzed data gives companies better insight into the groups of their customers and identifying patterns helps create better, more personalized offers.

    In this ebook, we’ll cover three forms of predictive analytics: recommender systems, dynamic pricing, and churn prediction. In the next chapters, you’ll find information on what these models do, what benefits they bring, and how to prepare to introduce them into your business.


Recommender systems

A recommender system, or recommendation engine, is a data filtering tool that analyzes available data to make predictions about what a user will be interested in. AI-powered recommendation systems are widely used in commercial applications, especially in e-commerce, social media, and content-based services.

    Based on the information the system has from users’ activity like what content they displayed, what products they bought together, etc., a recommendation engine can accurately predict that a given user will be interested in a particular product thanks to the use of machine learning algorithms. To illustrate: services such as Netflix, Spotify, YouTube, Facebook, and Amazon use recommendation systems to predict which users will be interested in particular products or content, so whenever you see anything under “Recommended for you”, “Your playlist” or “Other shoppers also bought” - that’s the result of a recommender figuring out what stuff you like in order to increase the number of cross-sells and up-sells.

    Types of recommender systems

    There are three basic approaches to recommendation engines: collaborative filtering, content-based filtering, and hybrid recommendation systems.


    Collaborative filtering

Collaborative filtering is based on the assumption that if users agreed in the past, they will also agree in the future - meaning that if they liked the same things previously, the situation won’t change going forward. This method requires collecting and analyzing information about customers’ behaviors, activities, and preferences to identify patterns and provide accurate predictions based on similarity to other users. Let’s take a simple example to illustrate this approach: if John likes items A, B, C, and D, while Mike likes items A, B, and C, chances are Mike will also like item D.
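The John-and-Mike example can be sketched as a tiny user-based collaborative filter in plain Python. This is a hypothetical illustration (cosine similarity over sets of liked items), not a production algorithm - a real system works over hundreds of thousands of users:

```python
from math import sqrt

# Each user is represented by the set of items they liked.
likes = {
    "John": {"A", "B", "C", "D"},
    "Mike": {"A", "B", "C"},
}

def cosine(u: set, v: set) -> float:
    """Cosine similarity between two binary like-sets."""
    if not u or not v:
        return 0.0
    return len(u & v) / sqrt(len(u) * len(v))

def recommend(user: str, likes: dict) -> list:
    """Score items the user hasn't seen by the similarity of users who liked them."""
    scores = {}
    for other, items in likes.items():
        if other == user:
            continue
        sim = cosine(likes[user], items)
        for item in items - likes[user]:
            scores[item] = scores.get(item, 0.0) + sim
    return sorted(scores, key=scores.get, reverse=True)

print(recommend("Mike", likes))  # prints ['D'] - the item John has that Mike lacks
```

Because Mike’s likes overlap heavily with John’s, item D inherits a high score for Mike, exactly as the text predicts.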

However, collaborative filtering has some drawbacks. These issues include:

    Early rater

    When a new item appears, the system can’t recommend it because it doesn’t have any ratings for the item. It will take some time to get a sufficient number of ratings for the system to figure out what groups of users should be recommended the item.

    Sparsity

With huge product bases, it’s difficult to make sure that enough people explore all the options available. If some item hasn’t been rated by a lot of people, the system won’t have data to base the predictions on.

    Gray sheep

    In order to recommend items, the system has to group people with overlapping interests. Many users will fall into these groups and enjoy the recommendations, but if some users don’t consistently agree or disagree with some group, they will not be given high-quality recommendations.


    Content-based filtering

Content-based filtering focuses on the attributes or descriptive characteristics of items to generate recommendations. In this approach, keywords are used to describe the item, and a user profile is built to show what kind of items the user likes. The assumption here is that if you expressed interest in some item, you will also like items with similar characteristics, be it the topic of an article, a brand of products, color, shape, size, etc. This approach is often used with recommendations of articles and other text documents.
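A minimal sketch of this idea, assuming items are already described by keyword tags (the articles and tags below are invented for illustration): build the user profile as the union of tags from liked items, then rank unseen items by keyword overlap (Jaccard similarity):

```python
# Hypothetical items described by keyword tags.
items = {
    "article1": {"python", "machine-learning", "tutorial"},
    "article2": {"python", "data", "pandas"},
    "article3": {"cooking", "recipes"},
}

def build_profile(liked: list, items: dict) -> set:
    """A user profile is simply the union of tags from liked items."""
    profile = set()
    for name in liked:
        profile |= items[name]
    return profile

def jaccard(a: set, b: set) -> float:
    """Overlap between two tag sets, 0.0 to 1.0."""
    return len(a & b) / len(a | b) if a | b else 0.0

def recommend(liked: list, items: dict) -> list:
    """Rank unseen items by how well their tags match the profile."""
    profile = build_profile(liked, items)
    candidates = [n for n in items if n not in liked]
    return sorted(candidates, key=lambda n: jaccard(profile, items[n]), reverse=True)

print(recommend(["article1"], items))  # prints ['article2', 'article3']
```

Note that only item descriptions and the user’s own history are used - no other users are needed, which is exactly what distinguishes this approach from collaborative filtering.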

    Content-based filtering has some drawbacks as well:

    Content description

In some cases, providing accurate descriptions of an item can be very difficult. For music or videos, an accurate representation of the content is not always possible.

    Over-specialization

    If previous user behavior doesn’t show evidence for a user liking something, the system will not suggest it. If we want the system to provide recommendations outside the scope of what the user has already shown interest in, additional techniques need to be added.

    Subjective domain

    Content-based filtering techniques don’t deal well with subjective information such as point of view or humor.

    Hybrid recommendation systems

As both of the approaches described above have some drawbacks, a solution was offered: combine the two approaches to deliver better results. And it worked - hybrid systems prove to be more effective. A hybrid recommendation system makes use of both the representation of content and the similarities between users. There are a few ways to implement a hybrid system: by making collaborative and content-based predictions separately and combining them, by adding content-based capabilities to a collaborative approach (or the other way round), or by unifying the two approaches in one model. Netflix is an example of the hybrid approach, combining the habits of similar users (collaborative filtering) and similar characteristics of content previously liked by a user (content-based filtering) to provide awesome recommendations.
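The first of those options - computing both predictions separately and combining them - can be sketched as a simple weighted blend. The per-item scores below are hypothetical model outputs, and the weight alpha is a tunable assumption, not a recommended value:

```python
def hybrid_score(collab: float, content: float, alpha: float = 0.5) -> float:
    """Blend collaborative and content-based scores; alpha weights collaborative."""
    return alpha * collab + (1 - alpha) * content

# Hypothetical per-item scores from two separately trained models.
collab_scores = {"D": 0.87, "E": 0.40}
content_scores = {"D": 0.20, "E": 0.90}

ranked = sorted(
    collab_scores,
    key=lambda i: hybrid_score(collab_scores[i], content_scores[i], alpha=0.7),
    reverse=True,
)
print(ranked)  # prints ['D', 'E']
```

With alpha at 0.7 the collaborative signal dominates, so item D wins; shifting alpha toward 0 would let the content-based signal take over - which is exactly the knob a hybrid system tunes.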

    Does my business need a recommender system? The benefits of product recommendations

    Recommendation engines can significantly increase revenue, improve CTRs and conversions. They also contribute to the improvement of factors more difficult to measure, such as customer satisfaction, and they lead to increased customer retention.

Sounds cool, doesn’t it? After all, we know that users value a personalized experience. In fact, 59% of shoppers5 who have experienced personalization say it has a big influence on their purchase decisions. With a recommender system, you get comprehensive insights into both your customer base and your product base. You can easily see how users interact with the service, and generate reports. Recommenders find patterns in the blink of an eye to increase the probability of the user finding an item of interest, thus cutting down on the time needed to find it. The personalized experience boosts customer satisfaction, which translates into increased customer loyalty, increased consumption, and more profit. Additionally, with personalized content, such as newsletters, ads or notifications, you encourage users to come back to your site, so the number of visits goes up while churn drops.

    What benefits does a recommendation system bring to a company?

    5 According to Infosys report


    Personalized experience

    In the past, personalized shopping experiences were a luxury available to the most affluent shoppers. A dedicated shopping assistant would recommend products and offer advice. Now, AI does that for us, and within milliseconds.

Customers like this personalized touch - after all, most of us want to be recommended stuff: we ask family and friends about a variety of products and services. While their opinion matters, what works for them doesn’t have to work just as well for us. If you’ve never been disappointed by a product or service recommended by someone close to you, you are truly special. But a recommendation system does not have a taste of its own - it absorbs the user’s individual preferences to deliver the most relevant results. The process is similar: if your aunt likes movie A and movie B, and you also like movie A, you may well like movie B. At this stage, before we go further, please note that this is largely simplified and does not reflect the actual operation of a recommender system - it would not be able to work on data about 2 people and 2 movies. But with data from hundreds of thousands of users and a wide variety of items, it rocks. And personalized experience is there to make the customer happy, which leads us to the next benefit...

    Customer satisfaction

Never underestimate the power of making your customers happy. And the risk of making them mad! 90% of customers say that their purchase decisions are influenced by online reviews, while 86% say that their decisions are influenced by negative reviews. But! Customers who had a bad experience with a company are two to three times more likely to post a review than those who were happy. And, as reported by Entrepreneur6, even one negative review can cost a business 30 customers.

    With personalized shopping experience, however, you can actually

6 Entrepreneur: Got a Bad Yelp Review? Here's How to Defend Your Business Online


make them happy. The system stores data about their recent activities. So let’s say I go online and I look for a green T-shirt. Then I click through a few accessories. I leave the site just to go back to it because one of these accessories was a cool bag that I now want. But how the hell do I find it? Many sites “remember” what the user did in their last session and show the recently displayed items. The collection of on-site interactions has one additional benefit: this data can be transferred to the offline world, too. Burberry uses this approach to enhance the customer experience in their brick-and-mortar stores by analyzing customer data (e.g. purchase history) and providing relevant recommendations that in-store assistants can then use to offer well-informed suggestions.

    The “discovery” factor

    Do you know Spotify’s “Discover Weekly” or Instagram’s discover feature? Both Spotify and Instagram analyze what content a user interacts with - played songs, watched videos, liked posts, favorite artists, frequently viewed profiles. This data gives away a lot of information about the user and just from that, the system can already figure out patterns and suggest relevant content that the user will likely want to follow or interact with. That’s great for two reasons:

    1. users are overwhelmed with the amount of content available online and finding the things they are really interested in may not be that easy - especially when they’re looking for new “discoveries”, e.g. new bands or movies,

    2. when users discover new, relevant content they are more likely to stay on the site longer and keep interacting with it - this may influence their decision on keeping a subscription (e.g. Spotify) or allow the platform to display new ads to the user when they’re online longer (e.g. on Instagram).


    User engagement

    Yet another benefit related to the personalization of the online experience. Users tend to engage more with the available content when they’re served with it. It’s very logical when you think about it: it’s so convenient to click through “Related products” - it’s how we read articles, how we watch videos, how we shop. When a user has to search for each item separately, the chances of them giving up on the service go up.

What’s more, with a recommendation system, you can drive more traffic to your website through custom emails or ads. User engagement doesn’t have to end at on-site activity; you can nurture shoppers to convert them into your customers. How do you do it? You bring them back to your site. You may do it with a personalized email, as Zalando does. If a user added an item to the cart and then removed it and left the site, in a few days they may receive a message with a discount for this particular product. Automated messages about abandoned carts are already obvious to most online retailers.

    Increased revenue

    Personalized customer experience, increased customer satisfaction and engagement all lead to more revenue. How? There’s a number of ways. First of all, with the help of a recommender system, the shopper can find items they like without having to look for them. They added one item to the cart and were already about to check out but then they saw a recommended product and went on browsing. This way, you increase the number of items per order and the average order value.

If your service is subscription-based, you want your customers to stay with you and not leave for the competition - to reduce churn, you personalize the offer. For telecoms, it means offering better deals for services that customers actually need and want, or adding extra benefits - like a Netflix subscription at a lower price, or free of charge.


    For entertainment services, like Netflix or HBO Go, it means suggesting the best content to suit the customers’ needs, so they don’t feel like there’s nothing to do on the site. Being served with suggestions of what to watch, they feel taken care of. Whatever products or services you recommend, the goal is to reduce churn and increase the customer lifetime value. And it works - after implementing their recommendation system, Amazon reported a 29% increase in sales, while Netflix reports that 80% of watched content is based on algorithmic recommendations.

    Making use of big data

Your customer data is your most valuable possession - but only if you can make sense of it and make sure it’s actionable. Otherwise, you’re left with strings of random information. To even consider building a recommender system, you must have relevant data that the system can utilize, and most services already collect this data, storing information about purchase history, search phrases, clicked items, etc. All of this doesn’t make sense to a human - unless you want to dedicate a group of people to analyzing hundreds of columns of a spreadsheet in hopes of finding patterns. To be honest, it’s just outside of human capabilities and would be a great waste of time. A recommendation system, however, fed with the right information, will produce great suggestions, enriching the customers’ profiles.

    Recommender systems in practice: Netflix

    Many services aspire to create a recommendation engine as good as that of Netflix. The details of how it works under the hood are Netflix’s secret, but they do share some information on the elements that the system takes into account before it generates recommendations.

    On their website, they list what data they collect for the recommendations:


    Whenever you access the Netflix service, our recommendations system strives to help you find a show or movie to enjoy with minimal effort. We estimate the likelihood that you will watch a particular title in our catalog based on a number of factors including:

    • your interactions with our service (such as your viewing history and how you rated other titles),

    • other members with similar tastes and preferences on our service, and

• information about the titles, such as their genre, categories, actors, release year, etc.

    In addition to knowing what you have watched on Netflix, to best personalize the recommendations we also look at things like:

• the time of day you watch,
• the devices you are watching Netflix on, and
• how long you watch.

    All of these pieces of data are used as inputs that we process in our algorithms. (An algorithm is a process or set of rules followed in a problem-solving operation.) The recommendations system does not include demographic information (such as age or gender) as part of the decision making process7.

    In an interview with Wired8, Todd Yellin, Netflix’s vice president of product innovation, compares the system to a three-legged stool:

The three legs of this stool would be Netflix members; taggers who understand everything about the content; and our machine learning algorithms that take all of the data and put things together.

So the first leg of the stool is the users: what they watch, when they watch it. Netflix splits the users up into more than two thousand taste groups. The system looks for similarities between users (collaborative filtering) to group them.

7 Source: Netflix
8 Wired: This is how Netflix's top-secret recommendation system works

    The second leg is the content. Information about the content is gathered from dozens of in-house and freelance staff who watch every show on Netflix to tag it. There is a wide variety of tags to differentiate the types of content referring to the genre, setting, characters, etc.

And the last leg is machine learning. The system is fed with data about content and user behavior, and sophisticated machine learning algorithms figure out how each signal should be weighted - what’s most important.

    Steps to follow while implementing a recommender system

    Step 1: Outline a recommendation strategy

    You can’t expect anything to go right if you approach it without a strategy. Ask yourself some questions:

    How often do you need to serve the recommended content?

    Real-time recommendations that take into account the most recent data are nice but more difficult to maintain. On the other hand, batch processing is easier to maintain (and in many cases perfectly sufficient) but does not reflect the recent changes in data.

    How will you handle the cold start problem?

When a new customer starts using your platform, what will you recommend to them? The most common approach here is to serve them the most popular and the most recent content. That’s a good place to start learning what the user is interested in. However, you can also start with a clever onboarding process that collects some basic information about what the user is interested in. That’s the case with Netflix, where the first step is to choose some movies and series that you’ve already seen and enjoyed. In the case of e-commerce businesses, you can simply recommend items similar to those the user has already viewed.
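To illustrate, here’s a minimal Python sketch of such a fallback. The five-interaction threshold, item names, and the personalization hook are all assumptions for illustration, not any platform’s actual logic:

```python
# Hypothetical cold-start fallback: new users see popular items,
# returning users get personalized suggestions.

def recommend(user_history, popular_items, personalized_fn, k=3):
    """Serve the most popular items until the user has enough history
    for personalized recommendations."""
    if len(user_history) < 5:  # cold-start threshold (assumed)
        return popular_items[:k]
    return personalized_fn(user_history)[:k]

popular = ["top_show", "new_release", "trending_doc", "classic"]
print(recommend([], popular, lambda history: []))  # new user -> popular items
```

Once the user has interacted enough, the same call transparently switches over to the personalized model.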

    Do you want more feed diversity?

Recommender systems filter the content and may sometimes filter it too strictly. If a user interacted with one type of content for some time - let’s say they watched 3 thrillers in a row - flooding their feed with more thrillers is one option, but is it a good one? You can add a layer of randomization, suggesting other items as well to introduce more diversity.
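A simple way to add that randomization layer, sketched in Python - the exploration rate and item names are assumed:

```python
import random

def diversify(recommendations, catalog, explore_rate=0.25, seed=7):
    """Swap a random fraction of recommendations for other catalog items,
    so a run of similar content doesn't flood the feed."""
    rng = random.Random(seed)  # fixed seed for reproducibility
    others = [item for item in catalog if item not in recommendations]
    out = list(recommendations)
    for i in range(len(out)):
        if others and rng.random() < explore_rate:
            out[i] = others.pop(rng.randrange(len(others)))
    return out

feed = diversify(["thriller_1", "thriller_2", "thriller_3", "thriller_4"],
                 ["thriller_1", "thriller_2", "thriller_3", "thriller_4",
                  "comedy_1", "drama_1"])
print(feed)
```

In production the exploration rate would be tuned against engagement metrics rather than hard-coded.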

    Do you want to explain the recommendations?

Some suggestions are marked as “Because you watched X” or “Because you follow Y”, or they’re accompanied by an accuracy rate - on Netflix, a match percentage is displayed next to each recommendation. It’s great to be able to show how your system came up with a recommendation, but it’s not that simple. Many machine learning models are black boxes: once trained, they generate recommendations without giving the rationale behind their decisions. An explanation is often not necessary, but if the recommendations are not accurate, users may question the model’s decisions. There are approaches addressing this issue (read more about that in the chapter about the challenges of AI adoption), but the simplest solution is to state a few rules that are used in generating recommendations. These rules should be understandable to users but not too general, like “based on historical data”. Tell your users how you do it, in simple steps: we collect information about your activity and preferences and compare it to the activity of other users to find similarities; we then use this information to recommend items that users with similar preferences enjoyed. Make sure you are clear about whether you process any personal information.

    Step 2: Collect and organize relevant data

You cannot have a recommendation engine without data. Whatever type of recommender system you choose, data is a must. As you collect information, make sure it is organized in a standard form. Having all information in the same form makes it easier to compare user A to other users, or item A to other items. The more relevant data you collect, the better predictions you get. That’s why some services have very specific sub-categories of products, like Netflix’s sub-genres.

    Step 3: Identify similarities

Between users, between products, or both. Compare the users or items to identify patterns. This can be done with similarity-based algorithms, for example the k-nearest neighbors (KNN) algorithm, which recommends items closest to the ones users already liked. It’s one of the most intuitive machine learning algorithms, because it works much like the way we give recommendations based on our knowledge of a person.
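As a toy illustration of the idea - the ratings matrix and similarity measure below are made up for the example; production systems work at a very different scale:

```python
import math

def cosine(u, v):
    """Cosine similarity between two rating vectors (0 = unrated)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0

def nearest_neighbors(target, users, k=2):
    """Return the k users whose ratings are most similar to the target's."""
    ranked = sorted(users.items(),
                    key=lambda kv: cosine(target, kv[1]),
                    reverse=True)
    return [name for name, _ in ranked[:k]]

ratings = {
    "alice": [5, 4, 0, 1],
    "bob":   [5, 5, 0, 0],
    "carol": [0, 1, 5, 5],
}
print(nearest_neighbors([4, 5, 0, 1], ratings, k=2))
```

Items liked by the returned neighbors but not yet seen by the target user become candidate recommendations.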

    Step 4: Track user interactions

The content you serve to users as recommendations is what you assume they will be interested in. With sufficient data, it’s very probable they actually will enjoy the suggested items, but you can’t just sit back and relax once the recommendation engine is in place. Track user interactions to assess user engagement and the quality of predictions. If you use systems including likes, upvotes, or ratings, your customers can provide you with feedback that helps further improve the recommendation engine.

  • The power of big data and AI is that it signals the end of sampling and statistics - now you can just track the shopping pattern of every customer in every one of your stores around the world - and then respond almost instantly with discounts, changes in inventory, store layouts, etc.... and do so 24/7/365

    Scott Galloway: The Four - the hidden DNA of Amazon, Apple, Facebook, and Google


Dynamic pricing

What is dynamic pricing?

    Dynamic pricing is a pricing strategy in which companies apply variable instead of traditional, fixed pricing. Prices are set in accordance with current market demands and as data is analyzed, the right prices are calculated. In dynamic pricing, different users can be charged a different amount for similar goods.

Dynamic pricing is a must for e-commerce retailers that want to increase sales and generate more profit. With machine learning, companies can monitor and adjust prices more effectively. To make accurate pricing recommendations and sales predictions, algorithms analyze historical and competitive data. Dynamic pricing is reported to boost profits by 25% on average.

And how does it work? Pricing of airline tickets is a great example. When you’re buying a flight ticket online, you may notice that the price changes - the very same ticket, same plane, same time, same seat. It can go up or down, depending on various factors. Airlines price tickets differently based on customer status and demand. It’s obvious to us now that some times are more expensive for travel - flight tickets cost more around Christmas, or even on a Friday evening compared to a Wednesday morning. We’ve learned that this is how it goes, but sometimes the price can change in an instant - like when you add the ticket to cart. But airlines are not the only industry making more profit with the use of dynamic pricing; so do hotels, gas stations, financial institutions, and retailers. Dynamic pricing allows businesses to remain competitive while still capitalizing on market demand.

    What are the types of dynamic pricing?

    There are different types of dynamic pricing based on different factors.

    Segmented pricing

    Segmented pricing is a pricing strategy that offers different prices for different customers, for example in different geographical regions.

    Competition-based

Competition-based pricing is the process of selecting appropriate price points with reference to the prices charged by the competition. This strategy is often used by companies selling similar products.

    Limited supply

This pricing strategy is often used by airlines and hotels. Prices change depending on the availability of products or services.

    Peak pricing

    Peak pricing is a strategy where customers pay more during the period of higher demand. Peak pricing is most frequently implemented by utility companies.

    Penetration pricing

Penetration pricing is a strategy used to attract customers to a new product or service. It means setting a low initial price for a product or service, often below the market rate. This strategy relies on the concept of low prices attracting a large portion of customers.

A dynamic pricing model can analyze any number of factors that you consider important to your pricing strategy. For some companies, analyzing just one aspect - such as supply, demand, or competitors’ prices - is sufficient, but in other cases, pricing may rely on a combination of factors to set the best possible price.
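As a rough sketch of how several factors might be combined, here’s a hypothetical pricing function in Python. The blend weights, floor, and ceiling are invented for illustration, not any retailer’s actual formula:

```python
def dynamic_price(base, demand_ratio, competitor_price,
                  floor=0.8, ceiling=1.5):
    """Adjust a base price by demand, blend in the competitor's price,
    and clamp the result to a sane band around the base price."""
    price = base * max(floor, min(ceiling, demand_ratio))  # demand factor
    price = 0.7 * price + 0.3 * competitor_price           # competition factor (weights assumed)
    return round(max(floor * base, min(ceiling * base, price)), 2)

print(dynamic_price(base=100, demand_ratio=1.2, competitor_price=110))
```

The clamp is the important design choice: it keeps an automated pricer from drifting into prices that would alienate customers, no matter what the inputs say.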

    Dynamic pricing - dos and don’ts

    There are good and bad reasons to increase prices, and sometimes higher demand is not the only factor that should influence the decision to make prices higher.

    What to do?

Stay competitive. Dynamic pricing helps you be competitive 24/7. If you want to make sure you’ve got the best offer in town, you don’t have to spend hours going through the offers of your competition.

Motivate customer behavior. You can encourage more customers to use your service during off-peak hours. This way, you can distribute the activity more evenly throughout the day and help avoid heavy surges.

    Be transparent about your pricing. Don’t use your pricing strategy against your customers. Make sure you let them know that the prices may change. You may be afraid of saying “the price may go up if …” - and sure, users can wonder why they may be paying more than someone else. But in this case, they know the rules and they accept them, while keeping people in the dark would be much like lying. And imagine your customers finding out that they’re charged differently for the same product or service without knowing it. They may get angry.


    What not to do?

    Avoid too much price discrimination. It may be hard to tell where the line is, especially when the prices are calculated based on the actual demand. However, you need to make sure that your pricing strategy doesn’t hurt your customers and ruin your company’s image. And it can if it’s unfair.

Uber uses a surge pricing model in its business. When demand for rides increases, prices go up. Riders wanting to order an Uber know about that - they can see a multiplier to the standard rates on the map. For example, if the multiplier is 1.8x, a ride that normally costs $10 will cost $18. The rates are updated based on real-time demand, so the surge can change quickly. It’s a fair deal in general - on a Friday night, there may be more people who want to take an Uber than there are Uber drivers. That’s a good reason for prices to go up. However, during a snowstorm in New York, taking an Uber was for many the only option to get back home. The surges were reportedly around the level of 4x, which is expensive but, given the circumstances, acceptable. However, rides turned out to be even more expensive than that, and some riders requested refunds. Though it was all due to higher demand, Uber was widely criticized for profiting from exploiting its customers.
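The surge mechanics from the example above can be sketched as follows. The formula and the cap are illustrative, not Uber’s actual algorithm - the cap is exactly the kind of guardrail that would have limited the snowstorm surges:

```python
def surge_price(base_fare, demand, supply, cap=4.0):
    """Multiply the base fare by a demand/supply ratio, capped to avoid
    the kind of extreme surges that draw public criticism."""
    multiplier = min(cap, max(1.0, demand / max(supply, 1)))
    return round(base_fare * multiplier, 2)

print(surge_price(10.0, 18, 10))   # 1.8x surge on a $10 ride -> 18.0
print(surge_price(10.0, 100, 10))  # extreme demand, capped at 4x -> 40.0
```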

    Dynamic pricing in practice: Amazon

Amazon uses dynamic pricing and updates the prices of its products 2.5 million times a day, which means that a product’s price changes about every 10 minutes, as reported by Business Insider9. How do they do it?

    As described in the excerpt from “Swipe to Unlock: The Primer on Technology and Business Strategy”:


    Amazon analyzes customers' shopping patterns, competitors' prices, profit margins, inventory, and a dizzying array of other factors every 10 minutes to choose new prices for its products. This way they can ensure their prices are always competitive and squeeze out ever more profit.

What’s the result of this? Amazon often attracts customers with great deals on popular products - let’s say bestselling books. The price they offer will be lower than that of their competitors, but they then increase prices on unpopular products. The idea behind this is that if consumers see discounted prices on the most popular items, they will assume that Amazon generally has the best deals. And even if a user realizes that it’s not the best price they can get for some item, they may be willing to pay slightly more since they’re already shopping there. Why would you go to another platform to buy another book if you can get all you need in one place?

The prices at Amazon change so often that they sometimes make shoppers feel frustrated - the price may differ in the morning when they browse the products and in the afternoon when they add them to the cart. Or they may see the price of an item go down right after they’ve bought it.

9 Business Insider: Amazon changes prices on its products about every 10 minutes

  • Many companies struggle with preventing churn due to ineffective retention strategies


Churn prediction

Churn prediction means identifying which customers are likely to stop using your services. Churn is a major problem for many service providers, including telecoms and entertainment platforms. Churn prediction is an extremely useful tool, giving insight into which customers intend to cancel their subscription, product, or service. With this information, staff can target at-risk customers with personalized offers and benefits to make them stay.

    How does predictive analytics improve customer retention?

Many companies struggle with preventing churn due to ineffective retention strategies. These are often based on randomly contacting as many of the customers whose contracts are about to end in the upcoming months as possible, and giving hefty discounts to those who have already canceled their subscription. Staff lack information about which customers are most likely to churn and about the factors influencing this risk. In our project for a large national telecom, the goal was to reduce churn by 2 percentage points in the segment where it was the highest. During the pilot, we managed to save our client $39k monthly. We reduced churn by 20%, and the client ended up with a 10x return on investment.
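To make the idea concrete, here’s a toy churn scorer in Python. In a real project the weights would be learned from historical customer data; the feature names and values below are invented for the sketch:

```python
import math

# Illustrative churn scorer. Weights and bias are made up;
# a real model would learn them from historical data.
WEIGHTS = {"months_to_contract_end": -0.4,
           "support_tickets": 0.5,
           "usage_drop_pct": 0.03}
BIAS = -1.0

def churn_probability(customer):
    """Logistic score: higher means more likely to churn."""
    z = BIAS + sum(w * customer[k] for k, w in WEIGHTS.items())
    return 1.0 / (1.0 + math.exp(-z))

def at_risk(customers, threshold=0.5):
    """Customers to target with a personalized retention offer."""
    return [c["id"] for c in customers
            if churn_probability(c) >= threshold]

customers = [
    {"id": "A", "months_to_contract_end": 1, "support_tickets": 5, "usage_drop_pct": 50},
    {"id": "B", "months_to_contract_end": 12, "support_tickets": 0, "usage_drop_pct": 0},
]
print(at_risk(customers))
```

The output is a short, prioritized list - exactly the information the retention team was missing when it contacted customers at random.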


    Churn prediction in practice: major telecom company case study

We built a churn prediction model for a major national telecom to help them deal with growing customer churn. Along the way, we identified a number of problems that contributed to this state of affairs: the sales staff lacked information about which customers were likely to churn, they were contacting random customers with random offers, and the number of product bundles was so huge that salespeople only used the small number of offers that were easiest to sell. The old strategies and techniques proved highly ineffective, and the company decided to use customer data to make well-informed decisions and reduce churn.

    The aim of our cooperation was to reduce churn by 2 percentage points in the customer segment where churn was the highest. Assuming the Average Monthly Recurring Revenue Per Customer (ARPC) of $15 and the Average Contract Duration of 18 months, that would translate into saving our client $500k a year if we delivered such results. Success also meant implementing our solution for the whole customer base.

In just 2 months, we released the initial version of the 360-degree customer view and churn models, which we tested on a small sample of customers. That allowed us to optimize those models within 4 months of project inception – initial recommendations for customer retention campaigns based on our solution were available almost half a year earlier than planned. During the churn model pilot, we also trained product recommendation models, which allowed us to almost double campaign conversion rates, not only for churning customers but for the whole segment.

After an additional 2 months of testing the combination of churn prediction and product recommendation models, and with excellent results (see below), our client decided to roll our solution out for the full customer base and introduce machine learning into other departments.


[Chart: Churn by month - number of customers lost each month]

During the pilot, we beat the goals by almost a factor of two, saving our client over $39k every month, and much more than that after the rollout – and that is not taking into account the cost of acquiring new customers in place of those who left for the competition. After the full system rollout, our client ended up with more than a 10x return on their investment.

    In addition to direct savings from the implementation of our solution, our client enjoyed other benefits, including improved focus on customers, company-wide AI implementation, and the introduction of Agile and DevOps.

    Getting started with predictive analytics

    When you’re considering implementing predictive analytics into your business, there are a few critical steps you should follow. Many teams use the Cross-industry Standard Process for Data Mining (CRISP-DM) approach in predictive analytics projects. CRISP-DM breaks the process of data mining into six phases:


• Business understanding
• Data understanding
• Data preparation
• Modeling
• Evaluation
• Deployment

    This illustration shows the relations between phases of data mining - the process is not strict and linear, moving back and forth between phases is possible and often even required. And as you can see, it all starts with business understanding.

    So business understanding is the first step of data mining, but you need it far before you even start. Your data science project does not start with a model, it starts with… a problem. You got it right, and it’s logical, isn’t it? If you’re looking for a solution, there is a real problem behind it. So first, ask yourself:

    What problem do I want to solve with predictive analytics?

    It may be reducing churn, but that’s just one example. Predictive analytics is also used to cut down on employee turnover, predict


    maintenance events, and for quality assurance, risk modeling or financial trends. So before you move on, name your problem.

    What do I want to predict?

Sometimes the answer to this question is simple; sometimes you need to find the right factor influencing your situation. At this stage, you have to pose questions. It’s OK to have some crazy questions on the list, too - you may not be able to answer all of them anyway. If you write down a dozen questions and find answers to 3 of them, that’s perfectly fine. Why are you asking these questions? For one thing, it helps identify the right model to solve your problem, but there’s an organizational aspect to this process, too: it creates demand for these questions to be answered. There’s no point in starting a predictive analytics project if no one will use the insights it provides. Your staff, not the data scientists, should be asking these questions. Your staff knows best what information will help them do their job better. At this stage, you should also be able to answer this question:

    How will my staff use the insights?

    It’s a part of the process and it should be answered naturally when you consider what needs to be improved. Make sure that you know what to do with the outcomes delivered by your predictive model.

The questions so far were focused on business understanding; now it’s time to get a little more technical. If you’re able to answer questions about data and models, that’s great. But if you’re not, that’s also OK. There is one important thing that companies have to remember: they don’t have to carry out these projects internally. With the scarcity of data scientists, doing so would often pose another challenge along the way. Outsourcing these projects to companies specialized in AI is an alternative. Importantly, a good vendor will be able to help you answer the questions that you don’t know how to answer yourself.


    What data do I need?

    Once you know what predictions you need, you will have to identify the data for the model to generate predictions. It’s case-specific, so you don’t want to collect all the data you can - instead, focus on the data that is relevant to the case.

    What does “success” mean?

    It’s good to have expectations, as long as they’re realistic. Define what would have to happen for you to consider the data science project a success.

    These are the basic questions you should ask yourself or your staff before you move on. The data science team will take it from there, helping you find the most appropriate solution.

  • Data is the key elementof a predictive model


Why is it all about data?

By now you surely know that data is the key element of a predictive model, or of any AI solution for that matter. Without data, models can’t learn, and if they can’t learn, they also can’t do their job. Artificial intelligence is inextricably connected with data science and big data. This is why data has been mentioned so many times in every chapter, and why it’s always the key element of preparing for any AI project.

    Let’s talk about data, then. Before you go any further, or if you want to have an understanding of the processes and requirements for that matter, you should understand what data is. Of course, data is information such as measurements, statistics, or demographics, but what are the types of data? Let’s define some of the names for various data types that you might hear.

    Types of data

    Raw data

Raw data is completely unstructured and unprocessed; it comes directly from the source. It can be in the form of files, images, database records, or other formats. Raw data is extracted, processed, and used to draw conclusions.


    Unstructured data

Unstructured data refers to information that isn’t organized in a pre-defined way. It’s raw and unorganized and can be textual or non-textual. Unstructured data comes from documents, social media feeds, pictures and videos, audio recordings, sensors. It’s estimated that about 80% of the data that organizations process is unstructured.

Structured data

Structured data is easily organized and typically stored in databases. Structured data is largely made up of data such as names, addresses, contact information - so information about customers. Examples of structured data can also include library catalogs or census records. It’s all the data you can easily imagine inside the rows and columns of a spreadsheet.

    Cooked data

    Cooked data refers to raw data that has been processed - extracted, organized, and perhaps analyzed.

    Why does data quality matter?

    It may seem obvious that you want to use “high-quality” data for your model. But what do you consider to be high-quality data?

    Data Management Body of Knowledge (DMBOK)10 defines data quality as follows:

    The term data quality refers both to the characteristics associated with high quality data and to the processes used to measure or improve the quality of data. These dual usages can be confusing, so it helps to separate them and clarify what constitutes high quality data.

    10 DAMA-DMBOK: Data Management Body of Knowledge


    Data is of high quality to the degree that it meets the expectations and needs of data consumers. That is, if the data is fit for the purposes to which they want to apply it. It is of low quality if it is not fit for those purposes. Data quality is thus dependent on context and on the needs of the data consumer.

    Data quality dimensions include11:

• Accuracy - Does the data correctly represent real life?
• Completeness - Is all required data present?
• Consistency - Are data values consistently represented?
• Integrity - Includes ideas associated with completeness, accuracy, and consistency.
• Reasonability - Does the data pattern meet expectations?
• Timeliness - Is data up-to-date? What is the latency?
• Uniqueness - Does each entity exist only once in the dataset?
• Validity - Are data values consistent with a defined domain of values?
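Some of these dimensions are easy to measure programmatically. A minimal sketch, assuming records are simple key-value dictionaries (real profiling tools cover far more):

```python
def quality_report(rows, required):
    """Measure two data quality dimensions for a list of records:
    completeness (required fields filled) and uniqueness (no duplicates)."""
    complete = sum(
        all(row.get(field) not in (None, "") for field in required)
        for row in rows
    )
    unique = len({tuple(sorted(row.items())) for row in rows})
    return {"completeness": complete / len(rows),
            "uniqueness": unique / len(rows)}

rows = [{"id": 1, "name": "Ann"},
        {"id": 2, "name": ""},       # incomplete: empty name
        {"id": 1, "name": "Ann"}]    # exact duplicate of the first row
print(quality_report(rows, required=["id", "name"]))
```

Tracking such scores over time is one way to turn “data quality matters” from a slogan into a monitored metric.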

Data quality is crucial because high-quality data can be easily processed and analyzed to provide valuable insights, and it matters in any data analytics-related effort. High-quality data should be relevant to the needs of the organization and capable of being understood. If you have a huge but chaotic dataset, reporting or modeling may be unable to make sense of it and generate insights. To get the right answers, you need the right data.

Data quality is also an important factor in building trust in the accuracy of the outcomes provided by the model. Without this trust, the staff whose decision-making the outcomes are supposed to enhance may question the results or wonder how accurate they are. And as much as reasonable questioning is valuable and may lead to improvement, wondering whether the model is right or wrong should not be necessary. With trust in data comes the confidence to make well-informed, data-driven decisions.

    11 As specified in DAMA-DMBOK: Data Management Body of Knowledge


    Do I need a data strategy?

    The importance of data strategy is often underestimated. In the past, data was just a byproduct of processes and business activities. It’s different now. Data is the most valuable resource that allows companies to gain competitive advantage and come up with new ways of improving their operations. Many companies now appreciate the value of data - they use analytics, study trends, excel at reporting. However, fewer have adjusted to a more data-centered approach focused on capturing and managing data assets. Organizations need data strategies that cover the current state of business and technology but also future objectives.

    As defined by SAS:

    Data strategy is a plan designed to improve all the ways you acquire, store, manage, share, and use data.

A data strategy ensures that the data collected by the company is actually managed like an asset. A data strategy includes elements of business strategy, goals for the project, data requirements, and KPIs. Each data strategy may be different and consist of various elements, adjusted to the organization’s needs.

    The creation of a data strategy usually requires the analysis and planning of the following:

    Business case

Your data strategy cannot work separately from your business strategy. You want to use data to drive business results, so the first step is to look at your business objectives and priorities. Then, you make a business case where you identify ways to use data to address these priorities. Your data strategy doesn’t have to cover all of the possible use cases you come up with. Focus on what’s doable. Select a few use cases to start with.


    Objectives and quick wins

What’s the long-term goal of your data science project? You probably know it already, and it’s a very important element of your strategy, but you also need quick wins - shorter-term goals. These should be fast and rather inexpensive, and deliver value right away.

    Data requirements

It’s time to answer some questions concerning data.

• What data do you need? Where will this data come from?
• Is internal data enough or do you also need external data (e.g. from social media)?
• What data do you already have?

    There are also questions related to data governance:

• How to ensure data is stored in a secure way?
• Who’s responsible for data handling?
• How to make sure your use of data is GDPR-compliant as well as ethical?

    And you should consider these issues, too:

• How is data collected, stored, and organized?
• Do you have an efficient data pipeline?
• What technologies are you considering for your project and what are the technical requirements (like hardware and software)?
• How will the results provided by the model be interpreted?

    Skills and know-how

Apart from the technological aspects, you should also consider your team composition. Do you have the skills you need to deliver the project? Do you want to train your staff? Hire an in-house data science team? Do you want to partner with another company?

    Core activities

    With the business cases selected, tech and staff requirements analyzed, you can outline the activities that have to be performed during the process. You don’t have to design a very detailed project roadmap, but identifying core activities will go together with identifying the skills and know-how you need in the project.

    KPIs and metrics

Identify appropriate KPIs to verify whether your project is on track. Check these on a short-term and long-term basis, and adjust if necessary. No strategy can be pursued without KPIs - a strategic approach cannot lack information about progress, success or failure, and relevant metrics.

    Data-driven culture

    Your staff will have to learn how to work with AI solutions and how to use the insights in their everyday work. You need to make sure it becomes their habit to make data-driven decisions. Data-driven organizations have processes that enable employees to acquire the information they need, but they also have clear rules on data access and governance.

[Diagram: Data strategy components]


    Data science lifecycle

    With the data strategy in place, does a data science project look just like the development of an app? Not really. While general steps like business understanding, planning, prioritization, and development stay the same, we need to keep in mind that a data science project focuses on extracting knowledge from data - and in this way it’s much different from app development.

    Business understanding

    A data science project starts with business understanding. That’s the part that should be obvious to you by now, and you know it has to be considered before you even start with the project. Once you’ve identified the appropriate business use case and you know what problem you want to solve, you’re ready to move on to data.


    Data mining

    At this point, you should already know what data is required for the model. With this knowledge, you’re able to collect and/or scrape the necessary data. Data mining focuses on gathering data from various sources.

    Data cleaning

    Data cleaning is the process of preparing data for analysis by altering data in a given dataset. During this stage, data that is irrelevant, incomplete, duplicated or improperly formatted is modified or removed. Data cleaning is performed to ensure that the dataset is accurate and correct.
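A minimal illustration of these rules in Python. The field names and normalization steps are assumed, not a universal recipe:

```python
def clean(rows, required=("id", "email")):
    """Remove incomplete and duplicate records and normalize text fields."""
    seen, cleaned = set(), []
    for row in rows:
        if any(not row.get(field) for field in required):
            continue  # drop incomplete records
        row = {key: value.strip().lower() if isinstance(value, str) else value
               for key, value in row.items()}  # fix inconsistent formatting
        if row["id"] in seen:
            continue  # drop duplicates by id
        seen.add(row["id"])
        cleaned.append(row)
    return cleaned

raw = [{"id": 1, "email": " Ann@Example.com "},
       {"id": 2, "email": None},                  # incomplete record
       {"id": 1, "email": "ann@example.com"}]     # duplicate id
print(clean(raw))
```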

Data cleaning and preparation is one of the most time-consuming parts of the process. When you imagine the work of data scientists, you may think of all the amazing things they do while building models. Or doing scientist stuff - which we all know (from movies) means writing important things on a glass surface. However, the results of a survey looking into the time devoted to particular stages of a data science project show that a large portion of it is consumed by data cleaning (see picture below).

[Chart: During a typical data science project at work or school, approximately what proportion of your time is devoted to the following?]


    Data exploration

    Data exploration is the stage where you discover patterns in your data. It’s an approach similar to initial data analysis where visual exploration is used to understand what the dataset contains. Data exploration uses visualization because it’s a clear way to look at the dataset rather than going through a vast number of entries.

    Feature engineering

    Feature engineering is the process of using domain knowledge of the data to create features that make machine learning algorithms work. If feature engineering is done correctly, it increases the predictive power of machine learning algorithms by creating features from raw data that help facilitate the machine learning process - as defined by LetsLearnAI12.
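As a small illustration, here’s a hypothetical function that turns raw order records into classic RFM-style features (recency, frequency, monetary value). The field names are assumptions for the example:

```python
from datetime import date

def engineer_features(orders, today):
    """Derive model-ready features from raw order records."""
    total = sum(order["amount"] for order in orders)
    last_order = max(order["date"] for order in orders)
    return {
        "frequency": len(orders),                          # how often they buy
        "monetary": round(total, 2),                       # how much they spend
        "recency_days": (today - last_order).days,         # how recently
        "avg_order_value": round(total / len(orders), 2),
    }

orders = [{"amount": 30.0, "date": date(2019, 5, 1)},
          {"amount": 50.0, "date": date(2019, 5, 22)}]
print(engineer_features(orders, today=date(2019, 6, 1)))
```

Features like these, rather than raw transaction logs, are what a churn or recommendation model is typically trained on.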

    Predictive modeling

    It’s finally time to get on board with machine learning. At this stage, the model you’ve chosen for your project is created. The model is trained and its performance is evaluated.

    Communicating results

    How will the insights from the model be understood? The predictions generated by the model should be presented in a way understandable to those who are going to use these insights in their everyday work. The model may be working perfectly well, but it’s still useless if the results are not communicated in a clear way.

    When this is done, you can go through the process all over again. It’s a cycle, so it doesn’t end at communicating the results. What’s more, working with data needs timely adjustments. The model has to be maintained and evaluated.

12 LetsLearnAI: What is feature engineering for machine learning

  • It’s fair to say that a lot of people still don’t have a full understanding of artificial intelligence and its value, and that may cause them to ignore the topic overall as it seems too difficult to start


AI implementation - why is it so difficult?

Whatever model you’re implementing, AI adoption is not like setting up an internet connection at the office - it’s a far more complicated process. There are some important factors you must consider when implementing AI. At the beginning of the process, you should follow some basic steps to make sure you are well prepared. Below is a summary of the steps to follow when you’re implementing any AI solution in your organization. You may notice that these correspond to the elements of the data strategy - ultimately, the steps you follow and the questions you answer should allow you to create the strategy more easily and in a well-informed manner.

    How do I get ready for AI adoption?

    Explore the options

    See what options are out there and don’t blindly follow the tech giants. Unless you’re the next Amazon, you most probably don’t need the same AI tech they use. The more you know about AI’s possibilities and limitations, the better you understand its potential role in your organization.

    Get the staff ready

    Most barriers to successful AI adoption are human-related. There are

    AI implementation - why is it so difficult?

  • 54

    cultural barriers such as resistance to change, fear of the unknown – or the even stronger feel of failure, shortage of talent, and the lack of strategic approach towards AI adoption. In general, it’s fair to say that a lot of people still don’t have a full understanding of artificial intelligence and its value, and that may cause them to second-guess AI implementation or even ignore the topic overall as it seems too difficult to start. It’s not that hard, though, provided you do it step by step.

    Identify the problem(s) you want AI to solve

    The first step to getting ready is to ask: do you have a process that is inefficient or prone to high human error? Once you identify the right issue, you can check whether there are AI solutions that you can use. In an ideal case, AI is used to improve inefficient human processes, so staff retraining may also be required.

    What issues should I look out for? I will say: look at use cases where the following are true: 1. distributed data, 2. high human error within the existing process, 3. large amounts of data can be generated or collected. This will give you a good idea of what issues to look out for to use AI/ML.

    Rudradeb Mitra, "Creating Value with Artificial Intelligence", a mentor of Google Launchpad and a senior AI advisor of ECMA banking group; https://www.linkedin.com/in/mitrar

    AI is a great augmentation to human work and, if applied correctly, will optimize processes to help you achieve set objectives. First, you need to identify the area that it can improve. The AI use case should address a specific pain, be it an internal one within your organization or one of your customers’. At this early stage, it’s good to focus on the things that AI has already mastered, such as prediction, automation or classification. Now, think about a part of your business that could use that. You can predict, for example, customer behaviors and preferences, demand for products, or prices of resources.


    Know your data

    Data has been mentioned many times already, but for a reason - it’s an essential part of AI. No model will give you predictions if it’s not fed with large volumes of relevant data. If you don’t know what types of data your company collects - check it. It can be information collected through your service (e.g. customers’ purchase history, demographic data, on-site interactions, etc.), Excel or CSV files containing information about your recent sales, services, and orders, or any other data, including that from your CRM, ad campaigns, email lists, traffic analysis, social media or even public information, e.g. about your competitors or the prices of resources. Before you start working with AI, you need to know what kind of data you’re dealing with.

    Here’s a short checklist on the things you should verify when it comes to data:

    • Check the type of data that you have – is it structured or unstructured?

    • Specify what data has been collected about the users: demographic, purchase history, on-site interactions, helpdesk contact, other.

    • Identify how to find high-quality data (what you have may not be enough now).

    • Categorize data by adding metadata, tags, etc. Or is the data already categorized?

    • Focus on the right data – don’t just collect all the information there is, collect and access the data that is important to you.

    And don’t forget your data strategy. It’s an essential part of a data science project.
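    Parts of the checklist above can be automated. A small sketch (the records and field names are invented for the example) that audits a batch of exported records for field types and missing values:

```python
# Hypothetical export, e.g. from a CRM: a list of records with possible gaps.
records = [
    {"age": 34, "city": "Austin", "purchases": 3},
    {"age": None, "city": "Boston", "purchases": 1},
    {"age": 51, "city": None, "purchases": None},
]

def audit(records):
    """Report each field's observed value types and how often it is missing."""
    report = {}
    for rec in records:
        for field, value in rec.items():
            stats = report.setdefault(field, {"missing": 0, "types": set()})
            if value is None:
                stats["missing"] += 1
            else:
                stats["types"].add(type(value).__name__)
    return report

for field, stats in audit(records).items():
    print(f"{field}: types={sorted(stats['types'])}, missing={stats['missing']}/{len(records)}")
```

    Even a quick audit like this answers two checklist questions at once: what kinds of data you actually have, and where the quality gaps are.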

    Start small and test your assumptions

    Implementing a company-wide AI strategy takes time, effort, and money. Instead, choose one segment to test AI in and start with a proof of concept to validate the idea. This implies a much smaller risk of failure, as it’s easier to build an efficient system piece by piece, and it leaves room for improvement in each iteration. At the same time, you can quickly see what AI brings: what model parameters were achieved, what results the predictive model provides, and how they can be interpreted.

    13 O'Reilly: AI adoption in the enterprise

    What issues should I look out for?

    Even though AI is developing and gaining more popularity, many businesses still can’t find their way with this “new” technology. Why? There are a number of reasons why a company may fear AI implementation. O’Reilly published an ebook13 summarizing the findings of their surveys concerning AI adoption in enterprises and listed some of the most common factors that hold back further AI implementation. 23% of respondents say that the main reason they haven’t further adopted AI is that their company culture doesn’t recognize the need for AI. Other reasons include lack of data, lack of skilled people, and difficulties identifying appropriate business cases, among others.

    Chart: What is the main bottleneck holding back further AI adoption? (select one)


    What challenges do companies face when implementing AI?

    As you can see above, some of the common problems mostly include those related to people, data or business alignment. While every company is different and will experience the process of AI adoption in a different way as well, there are certain hurdles you should be aware of.

    In this chapter, we’ll cover some of the most common challenges related to data, people, and business. These challenges are:

    Data

    • Data quality and quantity
    • Data labeling
    • Explainability
    • Case-specific learning
    • Algorithmic bias
    • Dealing with model errors

    People

    • Lack of understanding of AI among non-technical employees
    • Scarcity of field specialists

    Business

    • Lack of business alignment
    • Difficulty assessing vendors
    • Integration challenges
    • Legal issues


    Data

    The data-related issues are probably the ones most companies are expecting to have. It’s a known fact that the system you build is only as good as the data that it’s given. Since data is the key element of AI solutions, there are a number of problems that can arise along the way.

    Data quality and quantity

    As mentioned above, the quality of the system relies heavily on the data that’s fed into it. AI systems require massive training datasets. Artificial intelligence learns from available information in a way similar to humans, but in order to identify patterns, it needs much more data than we do. It makes sense when you think about it: we’re also better at tasks we have more experience performing. The difference is that AI can analyze data at a speed we as humans can’t even dream of, so it learns fast. The better data you give it, the better outcomes it will provide.

    How can you solve the data problem? First of all, you need to know what data you already have and compare that to what data the model requires. In order to do that, you need to know what model you’ll be working on – otherwise, you won’t be able to specify what data is needed. List the types and categories of data you have: is the data structured or unstructured? Do you collect data about your customers’ demographics, purchase history, on-site interactions, etc? When you know what you already have, you’ll see what you’re missing.

    The missing parts may be some publicly available information that the system will have easy access to, or you may have to buy data from third parties. Some types of data may still be difficult to obtain, e.g. clinical data that would allow more accurate treatment outcome predictions. Unfortunately, at this point, you have to be prepared that not all types of data are easily available. In such cases, synthetic data comes to the rescue. Synthetic data is created artificially, based on real data or from scratch. It may be used when there isn’t enough data available to train the model. Another way to acquire data is to use open data as an addition to your dataset, or to use Google Dataset Search to get data to train the model. You can also use an RPA robot to scrape publicly available data, e.g. information published on Wikipedia. When you know what data you have and what data you need, you will be able to verify which ways of expanding your datasets work best for you.
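    As a toy sketch of the “based on real data” variant (all values invented), synthetic records can be produced by resampling real rows and adding small random jitter. Real projects would use dedicated tools and verify that the synthetic set preserves the statistics that matter:

```python
import random

random.seed(42)
# A small "real" sample (hypothetical fields and values).
real = [{"age": 34, "monthly_spend": 120.0},
        {"age": 41, "monthly_spend": 95.0},
        {"age": 29, "monthly_spend": 180.0}]

def synthesize(real, n, noise=0.1):
    """Create n synthetic records by resampling real rows and jittering values."""
    out = []
    for _ in range(n):
        base = random.choice(real)
        out.append({
            "age": max(18, round(base["age"] * random.uniform(1 - noise, 1 + noise))),
            "monthly_spend": round(base["monthly_spend"] * random.uniform(1 - noise, 1 + noise), 2),
        })
    return out

synthetic = synthesize(real, n=100)
print(len(synthetic), synthetic[0])
```

    The synthetic rows stay close to the real distribution without duplicating any real record verbatim, which is exactly what makes them usable as extra training data.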

    Data labeling

    A few years back, most of our data was structured or textual. Nowadays, with the Internet of Things (IoT), a large share of the data is made up of images and videos. That may seem unproblematic, but many systems utilizing machine learning or deep learning are trained in a supervised way, so they require the data to be labeled. The fact that we produce vast amounts of data every day doesn’t help either; we’ve reached a point where there aren’t enough people to label all the data that’s being created. There are databases that offer labeled data, including ImageNet, a database of over 14 million images, all manually annotated by ImageNet’s contributors. Even though in some cases more appropriate data would be available elsewhere, many computer vision specialists use ImageNet anyway, simply because its image data is already labeled.

    There are a few data labeling approaches that you can adopt. You can do it internally, within your company, or outsource the work, you can use synthetic labeling or data programming. All of these approaches have their pros and cons, as presented in the table below.


    DATA LABELING APPROACHES

    Internal labeling
    How it works: an in-house data science team handles all the data labeling.
    Pros: high accuracy of labeled data; control over the process (you know what results to expect, you can track progress); the data stays in your company.
    Cons: time-consuming; expensive.

    External team
    How it works: you hire a specialized company to handle the project.
    Pros: high accuracy of labeled data; predictable result (the company is bound by a contract, so it has to deliver satisfying outcomes by the deadline).
    Cons: expensive; you can’t track progress; you don’t choose who your team is made up of (skills, experience, etc.).

    Outsourcing / crowdsourcing
    How it works: you hire temporary employees or cooperate with freelancers from freelance / crowdsourcing / job search platforms.
    Pros: cost savings; fast delivery; you choose the candidates, so you can assess their skills.
    Cons: expensive; you can’t track progress; you don’t choose who your team is made up of (skills, experience, etc.).

    Data programming
    How it works: you use scripts that automatically label data.
    Pros: fast results thanks to automation.
    Cons: lower accuracy of labeled data; still requires human involvement to verify whether the labeling is correct.

    Feedback loop system
    How it works: you pre-define some rules that allow the system to perform labeling and have people confirm or dismiss the suggestions; feedback is used for further labeling.
    Pros: faster results compared to manual work; higher accuracy of predictions thanks to feedback from humans.
    Cons: lower quality of initial predictions, which improves over time.
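    The data programming approach can be sketched in a few lines (the labeling functions and keywords here are invented for the example; real tools such as Snorkel additionally model each function’s accuracy instead of taking a plain majority vote):

```python
# Each labeling function encodes one heuristic. It returns 1 (spam),
# 0 (not spam), or None (abstain - the heuristic does not apply).
def lf_contains_free(text):
    return 1 if "free" in text.lower() else None

def lf_contains_urgent(text):
    return 1 if "urgent" in text.lower() else None

def lf_polite_greeting(text):
    return 0 if text.lower().startswith(("hi", "hello")) else None

LABELING_FUNCTIONS = [lf_contains_free, lf_contains_urgent, lf_polite_greeting]

def weak_label(text):
    """Majority vote over non-abstaining labeling functions (None if all abstain)."""
    votes = [lf(text) for lf in LABELING_FUNCTIONS]
    votes = [v for v in votes if v is not None]
    if not votes:
        return None
    return 1 if sum(votes) > len(votes) / 2 else 0

print(weak_label("URGENT: claim your FREE prize"))  # 1
print(weak_label("Hello, meeting at 3pm?"))         # 0
```

    The resulting labels are noisy, which is why the table lists “lower accuracy” and “human verification” as cons of this approach.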


    Explainability

    With many “black box” models, you end up with a conclusion, e.g. a prediction, but no explanation for it. If the conclusion provided by the system overlaps with what you already know and think is right, you’re not going to question it. But what happens if you disagree? You want to know HOW the decision has been made. In many cases, the decision itself is not enough. Doctors cannot rely solely on a suggestion provided by the system when it’s about their patients’ health.

    Approaches such as LIME (local interpretable model-agnostic explanations) aim to increase the transparency of models. So if AI decides that a patient has the flu, it will also show which pieces of data led to this decision: sneezing and headaches, but not the patient’s age or weight, for example. When we’re given the rationale behind the decision, it’s easier for us to assess to what extent we can trust the model.

    “Local Interpretable Model-Agnostic Explanations” - Marco Tulio Ribeiro, Sameer Singh, Carlos Guestrin
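    The perturbation idea behind such explanations can be illustrated with a toy sketch (this is not the actual LIME library; the “black box” scorer and its weights are invented): remove one input at a time and measure how much the prediction changes.

```python
# Hypothetical black-box scorer: returns a "flu likelihood" from symptoms.
# The weights are invented for the example; in practice the model is opaque.
def black_box_flu_score(symptoms):
    weights = {"sneezing": 0.4, "headache": 0.3, "fever": 0.25, "age_over_60": 0.05}
    return sum(weights.get(s, 0.0) for s in symptoms)

def explain(symptoms, score_fn):
    """Attribute the score to each input by measuring the drop when it is removed."""
    base = score_fn(symptoms)
    return {
        s: round(base - score_fn([t for t in symptoms if t != s]), 4)
        for s in symptoms
    }

contrib = explain(["sneezing", "headache", "age_over_60"], black_box_flu_score)
print(contrib)  # {'sneezing': 0.4, 'headache': 0.3, 'age_over_60': 0.05}
```

    The output mirrors the flu example in the text: sneezing and headache drive the prediction, while age contributes almost nothing.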


    Case-specific learning

    Our intelligence allows us to use the experience from one field to a different one. That’s called the transfer of learning – humans can transfer learning in one context to another, similar context. Artificial intelligence continues to have difficulties carrying its experiences from one set of circumstances to another. On one hand, that’s no surprise – we know that AI is specialized – it’s meant to carry out a strictly specified task. It’s designed to answer one question only, and why would we expect it to answer a different question as well? On the other hand, the “experience” AI acquires with one task can be valuable to another, related task. Is it possible to use this experience instead of developing a new model from scratch? Transfer learning is an approach that makes it possible – the AI model is trained to carry out a certain task and then applies that learning to a similar (but distinct) activity. This means that a model developed for task A is later used as a starting point for a model for task B.
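    As a toy sketch of that idea (all data and hyperparameters invented), a tiny perceptron trained on task A can seed the weights for a related task B, which then needs only a brief fine-tuning pass instead of training from scratch:

```python
def train(data, epochs, w=None, b=0.0, lr=0.1):
    """Perceptron training; returns (weights, bias). Starts from the given
    weights if provided - that is the 'transfer' step."""
    w = list(w) if w is not None else [0.0, 0.0]
    for _ in range(epochs):
        for x, y in data:
            pred = 1 if sum(wi * xi for wi, xi in zip(w, x)) + b > 0 else 0
            err = y - pred
            w = [wi + lr * err * xi for wi, xi in zip(w, x)]
            b += lr * err
    return w, b

# Task A and task B are similar but distinct classification problems.
task_a = [([x, y], int(x + y > 10)) for x in range(8) for y in range(8)]
task_b = [([x, y], int(x + y > 12)) for x in range(8) for y in range(8)]

w_a, b_a = train(task_a, epochs=50)              # train task A from scratch
w_b, b_b = train(task_b, epochs=5, w=w_a, b=b_a) # reuse A's weights, fine-tune on B

acc = sum((1 if sum(wi * xi for wi, xi in zip(w_b, x)) + b_b > 0 else 0) == y
          for x, y in task_b) / len(task_b)
print(f"task B accuracy after brief fine-tuning: {acc:.2f}")
```

    The model for task B starts from task A’s learned parameters rather than from zero, which is the essence of using one model as the starting point for another.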

    Algorithmic bias

    Bias is something many people worry about: stories of AI systems being “prejudiced” against women or people of color make the headlines every once in a while. But how does that happen? Surely, AI cannot have bad intentions. Or can it…?

    No, it cannot. An assumption like that would also mean that AI is conscious and can make its own choices when in reality AI makes decisions based on the available data only. It doesn’t have opinions, but it learns from the opinions of others. And that’s where bias happens.

    Bias can occur as a result of a number of factors, starting with the way of collecting data. If the data is collected by means of a survey published in a magazine, we have to be aware of the fact that the answers (data) come only from those reading said magazine, which is a limited social group. In such a case, we can’t say that the dataset is representative of the entire population.
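    The magazine-survey example can be made concrete with a small simulation (all numbers invented): the population is split 50/50 on some preference, but surveying only the magazine’s readers yields a heavily skewed estimate.

```python
import random

random.seed(1)
# Population of 1,000: readers mostly like the product, non-readers mostly don't,
# so the overall approval rate is exactly 50%.
population = ([{"group": "readers", "likes_product": True} for _ in range(450)] +
              [{"group": "readers", "likes_product": False} for _ in range(50)] +
              [{"group": "non_readers", "likes_product": True} for _ in range(50)] +
              [{"group": "non_readers", "likes_product": False} for _ in range(450)])

def approval_rate(people):
    return sum(p["likes_product"] for p in people) / len(people)

true_rate = approval_rate(population)
# Biased collection: the survey only ever reaches the magazine's readers.
survey_pool = [p for p in population if p["group"] == "readers"]
biased_rate = approval_rate(random.sample(survey_pool, 100))

print(f"true rate: {true_rate:.2f}, magazine-survey estimate: {biased_rate:.2f}")
```

    A model trained on the survey data would inherit this skew: the dataset simply is not representative of the entire population.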


    The way data is probed is another way to develop bias: when a group of people is using some system, they may have their favorite features and simply not use (or rarely use) other features. In this case, AI cannot learn about the functions that are not used with the same frequency.

    But there is another thing we have to consider in terms of bias: data comes from people. People lie. People spread stereotypes. This happened in Amazon’s recruitment when its AI recruiter turned out to be gender-biased. Since men dominated the workforce in technical departments, the system learned that male applicants were preferable and penalized the resumes that included the word “women’s”.

    A report from Reuters, published in October 2018, stated that Amazon scrapped its internal project that used AI to review resumes and make recommendations. The company’s rating tool used AI to give candidates scores from 1 to 5 stars. It could take hundreds of resumes, review them, and spit out the top 5 that you could hire. But here’s where it failed: by 2015, the company realized that the system did not grade the candidates for technical positions in a gender-neutral way.

    As Reuters writes:

    That is because Amazon’s computer models were trained to vet applicants by observing patterns in resumes submitted to the company over a 10-year period. Most came from men, a reflection of male dominance across the tech industry.

    As a result, the system learned that male candidates are preferable and it disrated female candidates: it penalized resumes that included the word “women’s” and it downgraded graduates of two all-women’s colleges. The company later edited the program to make it gender neutral. Amazon stated that the tool was never used by its recruiters to evaluate candidates.

    Chart: DOMINATED BY MEN - Top U.S. tech companies have yet to close the gender gap in hiring, a disparity most pronounced among technical staff such as software developers, where men far outnumber women. Amazon’s experimental recruiting engine followed the same pattern, learning to penalize resumes including the word “women’s” until the company discovered the problem. (Note: Amazon does not disclose the gender breakdown of its technical workforce. Source: latest data available from the companies since 2017. By Han Huang / Reuters Graphics)

    Dealing with model errors

    Artificial intelligence is not error-free. Human prejudices (or lies) seep into its algorithms and sometimes the results are biased. As mentioned above, there are various reasons why datasets are biased. Any issues like that can cause AI to produce inaccurate outcomes, e.g. predictions.

    “Bad reasoning” is another common cause of AI’s mistakes. As AI systems get more and more advanced, it can also get increasingly difficult to understand the processes inside the network. So when an AI system makes a mistake, it may be difficult to identify the exact place where something went wrong. And what if the decision is about an autonomous car making a sharp turn or running someone over? Luckily, scientists have developed Whitebox Testing14 for Deep Learning Systems. It tests the neural network with a large number of inputs and shows where its responses are wrong so they can be corrected.

    But are the mistakes made by AI always so dangerous? Certainly not. It all depends on the use of the system. If AI is used for cybersecurity, military applications or driving vehicles, more is at stake. If the system chooses a man over an equally skilled woman, it’s an ethical issue. But sometimes the mistakes are just silly - as in a 2015 Wired article15 describing an AI that was shown an image of black and yellow stripes and decided it was a school bus. It was 99% sure it was right. Only it wasn’t right at all.

    14 Cornell University: Automated Whitebox Testing for Deep Learning Systems

    To make sure that the errors produced by AI are not critical, we must ensure high quality of input and appropriate testing.
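    One practical form of “appropriate testing” is a suite of behavioural sanity checks run before deployment: obvious inputs must produce sensible outputs. A sketch with an invented stand-in model (in practice you would wrap your trained model’s predict function, and the thresholds would come from domain experts):

```python
# Stand-in "model": a churn-risk scorer with invented logic.
def churn_risk(days_since_last_login, open_support_tickets):
    return min(1.0, days_since_last_login / 100 + 0.1 * open_support_tickets)

def run_sanity_checks(model):
    """Behavioural tests: known-obvious inputs must yield sensible outputs."""
    checks = [
        ("active user scores low", model(1, 0) < 0.2),
        ("long-absent user scores high", model(120, 0) > 0.8),
        ("score stays within [0, 1]", 0.0 <= model(10_000, 50) <= 1.0),
        ("more tickets never lowers risk", model(30, 3) >= model(30, 0)),
    ]
    return [name for name, passed in checks if not passed]

print(run_sanity_checks(churn_risk))  # [] - all checks pass
```

    Checks like these won’t catch every subtle error, but they cheaply catch the “school bus” class of mistakes before the model reaches users.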

    People

    Lack of understanding of AI among non-technical employees

    AI implementation requires the management to have a deeper understanding of current AI technologies, their possibilities and limitations. Unfortunately, we’re surrounded by a plethora of myths concerning artificial intelligence, ranging from mundane things like the supposed need to hire an in-house data science team (who, you should know, only work for Facebook, Amazon, and Google, so how do you even compete) to sci-fi fantasies about smart robots ending humanity. The lack of AI know-how hinders AI adoption in many fields. Another common mistake caused by this lack of understanding is working towards impossible goals.

    How to solve this problem? Start with education. I know, it may sound discouraging, but I don’t mean you have to become a data scientist. Just have a look around your industry, watch some big players, see what use cases they’ve deployed. Learn about the current possibilities of artificial intelligence - you can do it yourself or ask an expert in the field to help you out. Once you have some knowledge, it’ll be easier for you to manage your expectations because you’ll know what AI can and cannot yet do for your business.

    15 Wired: Simple pictures that state-of-the-art AI still can't recognize

    AI models tend to be complex and hard to understand, they seem to be a sort of "black box", where data goes in and answers come out. People who are supposed to work with these models are often puzzled about how any of the results were reached.

    That's why it is important to make AI and the way it is used transparent and interpretable - both for the executives, who are going to make decisions based on its recommendations, and for any employee whose job will depend on the operations performed by AI. In these terms, education should be considered a crucial part of AI implementation.

    It is important to understand that AI is primarily not there to replace people but to assist or to augment them. We believe that the combined power of man and machine is better than either one on its own.

    Dr. Christian B. Westermann, Leader Data & Analytics, Partner PwC Switzerland

    Scarcity of field specialists

    In order to develop a successful AI solution, you need both technical knowledge and business understanding. Unfortunately, it’s often one or the other. CEOs and managers lack the technical know-how necessary for AI adoption, while many data scientists aren’t very interested in how the models they develop will be used in real life. The number of AI experts who know how to apply the tech to a given business problem is very limited. So is the number of good data scientists in general.

    Companies outside the FAMGA group (Facebook, Apple, Microsoft, Google, Amazon) are struggling to attract top talent. And even if they’re attempting to build an in-house team, they aren’t sure whether they’re getting the right people. You can’t really know whether they deliver top-quality solutions if you’re lacking the technical knowledge. Small and medium enterprises may give up on the idea of AI adoption because of their limited budgets. However, outsourcing a data science team is now an option as well.

    Business

    Lack of business alignment

    As shown on the chart from O’Reilly at the beginning of this chapter, a company culture that doesn’t recognize the need for AI and difficulties in identifying business use cases are among the top barriers to AI implementation. Identifying AI business cases requires managers to have a deep understanding of AI technologies, their possibilities and limitations. The lack of AI know-how may hinder adoption in many organizations.

    But there’s another problem here. Some companies jump on the AI bandwagon with too much of opt