What is Big Data? - AMA San Diego
Q2 Insights Presentation to San Diego AMA Art of Marketing 2015
Speaker: Kirsty Nunez
Big Data
January 16, 2015
What is Big Data?
The exact definition of Big Data is open to interpretation. Fuzzy thinking about Big Data goes something like this: It’s something big, and it also has to do with data.
Big Data
Any collection of data sets so large and complex that it becomes difficult to process them using traditional data processing applications
Large data sets that may be analyzed to reveal patterns, trends, and associations, especially relating to human behavior and interactions
Analysis is usually planned and may involve multiple sources of data (data sets)
What is Big Data?
By 2015, 4.4 million IT jobs will be created globally to support Big Data, with 1.9 million in the US
Source: http://www.ibmbigdatahub.com/enlarge-infographic/1642
Big Data is a Rapidly Growing Industry
Source: Big Data Universe v3. Matt Turck, Sutian Dong & FirstMark Capital, 2013
The Four V’s of Big Data
The four key characteristics of Big Data are:
Volume
Velocity
Variety
Veracity
Source: http://www.ibmbigdatahub.com/enlarge-infographic/1642
Volume: Scale of Data
40 Zettabytes (43 trillion Gigabytes) of data will be created by 2020, an increase of 300 times from 2005
It is estimated that 2.5 quintillion Bytes (2.3 billion Gigabytes) of data are created each day
Most companies in the US have at least 100 Terabytes (100,000 Gigabytes) of data stored
6 billion people have cell phones (world population: 7 billion)
Source: http://www.ibmbigdatahub.com/enlarge-infographic/1642
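The unit conversions on this slide can be checked with a few lines of arithmetic. The sketch below assumes binary units (1 gigabyte = 2^30 bytes, 1 zettabyte = 2^70 bytes); that is an assumption, chosen because it reproduces the 43 trillion figure. The constant and function names are illustrative only. Under the same assumption, 2.5 quintillion bytes per day works out to roughly 2.3 billion gigabytes.

```python
# Binary unit sizes in bytes (assumption: the slide's figures use powers of 1024).
GIGABYTE = 2 ** 30
ZETTABYTE = 2 ** 70


def to_gigabytes(num_bytes):
    """Convert a raw byte count to (binary) gigabytes."""
    return num_bytes / GIGABYTE


# 40 zettabytes projected by 2020.
projected_2020_gb = to_gigabytes(40 * ZETTABYTE)

# 2.5 quintillion bytes (2.5 x 10^18) created each day.
created_per_day_gb = to_gigabytes(2.5e18)

print(f"Projected by 2020: {projected_2020_gb / 1e12:.2f} trillion GB")
print(f"Created per day:   {created_per_day_gb / 1e9:.2f} billion GB")
```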
Velocity: Analysis of Streaming Data
The New York Stock Exchange captures 1 Terabyte of trade information during each trading session
Modern cars have close to 100 sensors that monitor items such as fuel level and tire pressure
By 2016, it is projected there will be 18.9 billion network connections (almost 2.5 connections per person on earth)
Source: http://www.ibmbigdatahub.com/enlarge-infographic/1642
Variety: Different Forms of Data
As of 2011, the global size of data in healthcare was estimated to be 150 Exabytes (161 billion Gigabytes)
By 2014, it is anticipated there will be 420 million wearable, wireless health monitors
400 million tweets are sent per day by about 200 million monthly active users
30 billion pieces of content are shared on Facebook every month
Source: http://www.ibmbigdatahub.com/enlarge-infographic/1642
Veracity: Uncertainty of Data
One in three business leaders do not trust the information they use to make decisions
Poor data quality costs the US economy around $3.1 trillion a year
27% of respondents in one survey were unsure of how much of their data was inaccurate
Source: http://www.ibmbigdatahub.com/enlarge-infographic/1642
Veracity
Big Data Veracity refers to the biases, noise, and abnormality in data. Is the data that is being stored and mined meaningful to the problem being analyzed?
Data Accuracy
Data Fidelity
Data Truthfulness
Source: http://www.ibmbigdatahub.com/enlarge-infographic/1642
Types of Big Data
Structured Data
Unstructured Data
Structured Data
A data model is necessary to create structured data; this includes defining what will be stored and how.
Data in a fixed field within a record or file
Relational Databases
Spreadsheets
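As a rough illustration of what a data model means in practice, the sketch below defines a fixed set of named, typed fields for one point-of-sale record and parses a raw row into it. The `Sale` class, its field names, and the sample row are all hypothetical and are not taken from the presentation.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class Sale:
    """One point-of-sale record: every field has a fixed name and type."""
    store_id: int
    sku: str
    quantity: int
    unit_price: float
    sold_on: date


def parse_sale(row):
    """Turn a raw row (a list of strings, as read from a file) into a typed Sale."""
    store_id, sku, quantity, unit_price, sold_on = row
    return Sale(
        store_id=int(store_id),
        sku=sku,
        quantity=int(quantity),
        unit_price=float(unit_price),
        sold_on=date.fromisoformat(sold_on),
    )


# Hypothetical raw row; because the structure is fixed, it can be queried directly.
sale = parse_sale(["12", "A-100", "3", "4.50", "2015-01-16"])
revenue = sale.quantity * sale.unit_price
print(sale, revenue)
```

Deciding the fields and their types up front is the "what will be stored and how" step; a relational table or spreadsheet column layout plays the same role.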
Structured Data
Structured data is generated by computers or machines and humans.
Machines: Sensor Data, Web Log Data, Point of Sale Data, Financial Data
Humans: Input Data, Click-stream Data, Gaming-Related Data
Unstructured Data
Unstructured data is generated by computers or machines and humans.
Machines: Satellite Images, Scientific Data, Photographs and Video, Radar or Sonar Data
Humans: Text Internal to a Company, Social Media Data, Mobile Data, Website Content
Unstructured Data
Unstructured data is growing faster than structured data, and it is predicted to account for 90% of all data created this decade.
Text Documents
Email
Video
Audio
Stock Ticker Data
Financial Transactions
Uses of Big Data
Customer Engagement
Customer Retention and Loyalty
Marketing Optimization
Profiling Consumers
Tailored Advertising
How to Gain a Competitive Edge
More Strategic, Actionable Insights
Identify Valuable Marketing Opportunities
Monitoring Google Trends
Clearly Define Your Audience
Create Real-Time Personalization to Buyers
Identify Specific Content that Moves Buyers Down the Sales Funnel
Tailored Pricing
Tracking and Extracting Meaning From Social Network Information
Retail Habits
Big Data Analysis Applications
Big Data Analytics
Social Media Analytics
Online Advertising
Display Marketing
Test Analytics
Retail Analytics
Customer Analytics
Forecasting
Pricing and Revenue Optimization
Predictive Modeling
Custom Insights
Custom Reporting
Custom Dashboards
Big Data Analysis
Data Sciences Involved in Big Data Analysis
Statistics
Econometrics
Machine Learning
Data Mining
Artificial Intelligence
Operations Research
Natural Language Processing
Big Data Analysis Road Map
Big Data
Structured Big Data: Predictive Analysis
Classification
Regression
Clustering
Unstructured Big Data: Tracking and Extraction of Meaning
Pattern Recognition / Clustering
Structured Data Analysis
Big Data is increasingly used in an attempt to predict consumer behavior. This is called Predictive Analytics. Predictive models exploit patterns in historical and transactional data to identify opportunities and risks.
A mortgage company uses its data to generate a list of good loan candidates
Amazon predicts customer product interests based on past behavior
Structured Data Analysis Techniques
Classification Regression Clustering
Structured Data Analysis: Classification
Classification is a data mining application where the variable of interest, the variable we want to predict, is categorical in nature.
Categorical data is used to distinguish between groups. Classification data mining techniques often take on descriptive and predictive aspects.
For example, we might want to find new categories of behavior that are strongly related to the main variable of interest.
Gender
Age Group
Location
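A minimal sketch of classification with a categorical target is shown below. It is not a method named in the presentation: it is a deliberately simple majority-vote lookup that predicts the most common outcome seen for each combination of gender, age group, and location, and falls back to the overall majority for unseen combinations. The training records and the "purchased" label are hypothetical.

```python
from collections import Counter, defaultdict

# Hypothetical records: (gender, age_group, location) -> purchased ("yes" / "no").
training = [
    (("F", "18-34", "urban"), "yes"),
    (("F", "18-34", "urban"), "yes"),
    (("M", "18-34", "urban"), "no"),
    (("M", "35-54", "rural"), "no"),
    (("F", "35-54", "rural"), "no"),
    (("F", "35-54", "urban"), "yes"),
    (("M", "35-54", "urban"), "no"),
]


def fit(records):
    """Count outcomes per feature combination and across all records."""
    by_profile = defaultdict(Counter)
    overall = Counter()
    for features, label in records:
        by_profile[features][label] += 1
        overall[label] += 1
    return by_profile, overall


def predict(model, features):
    """Return the majority label for this profile, or the overall majority if unseen."""
    by_profile, overall = model
    counts = by_profile.get(features) or overall
    return counts.most_common(1)[0][0]


model = fit(training)
print(predict(model, ("F", "18-34", "urban")))  # profile seen in training
print(predict(model, ("M", "55+", "rural")))    # unseen profile, uses overall majority
```

Real projects would normally use an established technique such as logistic regression or a decision tree; the point here is only that the predicted variable is a category, not a number.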
Structured Data Analysis: Regression
Regression analysis is a tool used to understand the effect of one variable on another variable and the relative strengths of those effects. An example would include:
Determining the effect on sales if prices were increased by 5%
The goals of a regression problem are similar to those of a classification project. We would like to find the best predictors related to the variable of interest.
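The price example can be sketched with ordinary least squares on a single predictor. The price and unit figures below are made up for illustration (and perfectly linear, so the fit is exact), and the function name is hypothetical; a real analysis would use observed data and check the fit before trusting the estimate.

```python
def fit_line(xs, ys):
    """Ordinary least squares for one predictor: returns (slope, intercept)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical history: price charged and units sold at that price.
prices = [8.0, 9.0, 10.0, 11.0, 12.0]
units_sold = [1200, 1100, 1000, 900, 800]

slope, intercept = fit_line(prices, units_sold)

# Estimated change in units sold if the current price rises by 5%.
current_price = 10.0
new_price = current_price * 1.05
change_in_units = slope * (new_price - current_price)

print(f"slope={slope:.1f} units per dollar, intercept={intercept:.1f}")
print(f"estimated change in units for a 5% price increase: {change_in_units:.1f}")
```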
Structured Data Analysis: Clustering
Clustering has quite a different goal than classification or regression. With clustering, a variable of interest does not exist. Instead, we attempt to sort the data into clusters. Examples include:
Cluster individuals for a marketing campaign
Cluster products purchased based on customer survey responses
The Netflix model clusters customers into movie categories and makes recommendations based on their movie watching history
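Sorting data into clusters without a variable of interest can be sketched with k-means, one common clustering technique (the presentation does not name a specific algorithm, so this choice is an assumption). The customer data, the two features, and the fixed starting centroids are hypothetical; fixed starting points are used only to keep the example deterministic.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def centroid(points):
    """Coordinate-wise mean of a non-empty list of points."""
    return tuple(sum(coords) / len(points) for coords in zip(*points))


def kmeans(points, centroids, iterations=10):
    """Repeat: assign each point to its nearest centroid, then recompute centroids."""
    clusters = []
    for _ in range(iterations):
        clusters = [[] for _ in centroids]
        for point in points:
            nearest = min(
                range(len(centroids)),
                key=lambda i: squared_distance(point, centroids[i]),
            )
            clusters[nearest].append(point)
        # Keep the old centroid if a cluster ends up empty.
        centroids = [
            centroid(members) if members else centroids[i]
            for i, members in enumerate(clusters)
        ]
    return centroids, clusters


# Hypothetical customers: (store visits per month, average spend per visit).
customers = [(1, 20), (2, 25), (1, 22), (9, 80), (10, 85), (8, 78)]
centers, groups = kmeans(customers, centroids=[customers[0], customers[3]])

for center, members in zip(centers, groups):
    print(center, members)
```

No label is supplied anywhere: the two groups (occasional low spenders and frequent high spenders) emerge only from how close the records are to one another.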
Unstructured Data Analysis
To make sense of unstructured data, different methodologies are employed to identify patterns or clusters in the data, including Data Text Mining, Text Analysis, Neural Network Analysis, and Social Network Analysis. For example, Social Network Analysis might be deployed to:
See who is talking about the brand
Determine who the major influencers or connectors are and what they are saying
Understand not only what is being said in the social media sphere but also identify the most efficient messengers
Produce social network maps or hyperlink maps
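One simple way to look for influencers is to count how often each account is mentioned by others (in-degree). The sketch below uses that count as a rough proxy for influence; this is an assumption, since real social network analysis typically combines several centrality measures. The account names and mention data are hypothetical.

```python
from collections import Counter

# Hypothetical "who mentioned whom" pairs taken from brand-related posts.
mentions = [
    ("ana", "brand_fan"),
    ("ben", "brand_fan"),
    ("cara", "brand_fan"),
    ("brand_fan", "dev"),
    ("ben", "cara"),
    ("dev", "ana"),
]


def in_degree(edges):
    """Count how many times each account is mentioned by someone else."""
    counts = Counter()
    for _source, target in edges:
        counts[target] += 1
    return counts


influence = in_degree(mentions)
top_account, top_mentions = influence.most_common(1)[0]

print(influence)
print(f"most mentioned: {top_account} ({top_mentions} mentions)")
```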
Big Data Analysis
Structured Data Unstructured Data
Classification
Regression
Pattern Recognition / Clustering
Other
Pattern Recognition / Clustering
Factor Analysis
Principal Components Analysis
Linear / Non-Linear Regressions
Logistic Regression
Neural Network Analysis
Clustering
Data Text Mining
Text Analysis
Neural Network Analysis
Social Network Analysis
A / B Testing
Time Series Analysis
Optimization
Structural Equation Models
Discrete Choice Models
One Approach to Big Data Analysis: Data Mining
Data Mining and Big Data
Data Mining is an analytic process designed to explore data in search of consistent patterns and/or relationships, and then to validate findings by applying them to new subsets of data.
Data Mining
To decipher Big Data, we have to data mine. Data mining is a powerful set of methodologies that, when successfully applied, will:
Increase business revenue
Cut costs
Support other actions to improve the bottom line
Ultimate Goal of Data Mining
Predictive data mining is the most common type of data mining and the one that has the most direct business applications.
Prediction
Big Data and Prediction
Technique: Predicts
Modeling: Customer actions and interactions
Statistical Techniques: Market size for a product or service
Analytics: Market share, products and pricing
Brand Perceptions: Brand Choice
Discrete Choice Conjoint Techniques: Best mix of products or services
Advanced Analytics and Modeling: Specific facets of a company’s customers
Stages of Data Mining
The process of data mining consists of three stages:
Exploration: Business Understanding, Data Understanding, Data Preparation
Model Building: Modeling, Validation, Predictive Data Mining
Deployment: Evaluation, Implementation
Exploration: Business Understanding
Outline Project Objectives and Requirements
Convert Objectives and Requirements into a Data Mining Problem Definition
Using the Problem Definition, a preliminary plan to achieve the business objectives is developed
Exploration: Data Understanding
Before working with any data set, the data mining expert must become familiar with the data by:
Identifying data quality problems
Detecting preliminary insights into the data
Exploring the possibility of interesting data subsets in which useful information may be hidden
Exploration: Data Preparation
Depending on the analytic problem, the process of data mining ranges from selecting straightforward predictors for a regression model to exploratory analyses that use a variety of graphical and statistical methods to identify the most relevant variables and determine the complexity and/or nature of the models.
A final data set is developed that will be used during modeling. Raw data may be manipulated multiple times to achieve the final data set. Techniques in this phase include:
Tabling
Recording
Attribute Selection
Transformation of Data
Cleaning of Data
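A small sketch of data preparation is shown below, covering three of the techniques listed: cleaning (dropping incomplete rows), transformation (converting text to numbers and normalizing city names), and attribute selection (keeping only the fields needed for modeling). The raw records, field names, and selection are hypothetical.

```python
# Hypothetical raw export: every value arrives as text, some are missing or untidy.
raw_records = [
    {"id": "1", "age": "34", "city": " San Diego ", "spend": "120.50", "notes": "n/a"},
    {"id": "2", "age": "", "city": "encinitas", "spend": "80.00", "notes": ""},
    {"id": "3", "age": "51", "city": "SAN DIEGO", "spend": "200.00", "notes": "vip"},
]

# Attribute selection: only these fields are carried into the modeling data set.
SELECTED = ("age", "city", "spend")


def prepare(records):
    """Clean, transform, and select attributes to build the final data set."""
    prepared = []
    for record in records:
        # Cleaning: skip rows where any selected attribute is blank.
        if any(record[name].strip() == "" for name in SELECTED):
            continue
        # Transformation: typed numbers and consistently formatted city names.
        prepared.append({
            "age": int(record["age"]),
            "city": record["city"].strip().title(),
            "spend": float(record["spend"]),
        })
    return prepared


final_data = prepare(raw_records)
for row in final_data:
    print(row)
```

In practice this step is often revisited several times, since later modeling choices can change which attributes and formats are needed.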
Model Building: Validation
Modeling techniques are dictated by the nature of the data. Typically, several modeling techniques will be employed. As some techniques have specific requirements on the form of the data, returning to the data preparation phase is often required.
The process of considering various models and choosing the best one based on predictive performance
Model Building: Predictive Data Mining
There are a variety of techniques developed to achieve validation, many of which are based on "competitive evaluation of models," which applies different models to the same data set and then compares performance to choose the best.
Bagging (Voting, Averaging)
Boosting
Stacking
Meta-Learning
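The idea of competitive evaluation can be sketched by fitting two candidate models on the same training subset and comparing their error on held-out records. This is a plain hold-out comparison, not bagging, boosting, or stacking. The two candidates (a constant-mean predictor and a one-variable least-squares line), the error measure (mean absolute error), and the data are assumptions made for illustration.

```python
def mean_model(train):
    """Candidate 1: always predict the average outcome seen in training."""
    average = sum(y for _, y in train) / len(train)
    return lambda x: average


def linear_model(train):
    """Candidate 2: least-squares line fitted to the training data."""
    n = len(train)
    mean_x = sum(x for x, _ in train) / n
    mean_y = sum(y for _, y in train) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in train)
    sxx = sum((x - mean_x) ** 2 for x, _ in train)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return lambda x: intercept + slope * x


def mean_abs_error(model, data):
    """Average absolute gap between predictions and actual outcomes."""
    return sum(abs(model(x) - y) for x, y in data) / len(data)


# Hypothetical (ad spend, sales) pairs; the last two are held out for validation.
data = [(1, 12), (2, 19), (3, 31), (4, 42), (5, 48), (6, 61), (7, 69), (8, 82)]
train, holdout = data[:6], data[6:]

candidates = {"mean": mean_model, "linear": linear_model}
scores = {name: mean_abs_error(build(train), holdout) for name, build in candidates.items()}
best = min(scores, key=scores.get)

print(scores)
print(f"best model on held-out data: {best}")
```

Scoring on records the models never saw during fitting is what lets the comparison say something about predictive performance rather than fit to past data.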
Deployment
The application of the model to new data in order to generate predictions
Evaluation and Implementation
Data mining and the resulting knowledge are powerful tools. This process can provide marketers with actionable knowledge to inform and drive marketing strategy, which in turn can have a significant impact on business profitability.
Before deploying a model it is critical to thoroughly evaluate and review the steps executed to be certain it addresses business objectives.
The knowledge gained from the modeling must be organized and presented in a way that management can understand.
Deployment can be as simple as generating a report or as complex as implementing a repeatable data mining process.
Big Data and Market Research
Marketers need to ensure their insights and results are valid and determine whether they are being applied in a valid manner. Whenever we need to gather information, there are essentially five questions we answer.
Who What When Where Why
Big Data and Market Research
Four components of the Five W’s are provided by structured Big Data and other data sources.
Who: The who question identifies the various players in a problem or solution.
What: The what question tries to ascertain what consumers are buying, trends, and services used.
When: The when question considers various time-based events and activities, such as when customers are buying products or services (e.g., day part, date range, or life stage).
Where: The where question addresses geographic and/or logistical aspects of a solution.
Big Data and Market Research
With the increasing prevalence and accessibility of Big Data, businesses are already provided the Who, What, When, and Where. But Big Data cannot provide the Why. This is where Market Research comes in. As long as humans continue to be inconsistent, impulsive, dynamic, and subtle, breakthroughs solely dependent on Big Data will be elusive.
The emotional, rational, and irrational drivers of customers cannot be explained by Big Data. And as the prevalence of Big Data increases, the number of questions that are raised will increase; these questions are best addressed by traditional market research.
Who What When Where Why
Qualitative Methods to Answer Why
Answering the why question is most effectively achieved by integrating quantitative and qualitative research methodologies, with qualitative research used to answer the why question and quantitative research used to verify and quantify the findings.
Focus Groups
In-Depth Interviews
Observation (Ethnography)
Social Networks
Guided Online Chats
Applied appropriately, these methods will result in a collection of textual, visual, and oral data that will need to be analyzed through textual analysis. This qualitative analysis provides insight into customers’ attitudes, behaviors, and thought processes.
Big Data and Market Research
Nothing beats knowing why people make the choices they do.
Big Data finds the patterns, market researchers test the hypotheses.
Big Data and Market Research
While Big Data is sometimes touted as the magic bullet for all market research questions, it is not the answer to every question or the source of every insight. Big Data has its place in the array of market research methodologies and is an ever-growing presence.
Before Big Data, primary research conducted by market researchers focused on what was happening.
Now, with that requirement increasingly solved by Big Data, market researchers can focus on why there are deviations from trends.
Thank You
Q2 Insights, Inc.
San Diego
2236 Encinitas Blvd., Suite F
Encinitas, CA 92024
Phone: 760-230-2950
Fax: 760-230-2951
New Orleans
1070-B West Causeway Approach
Mandeville, LA 70471
Phone: 985-867-9494