Dan Rossner - Demystifying Big Data. Real world applications of Big Data
Data-Ed: Demystifying Big Data
Copyright 2013 by Data Blueprint
Demystifying Big Data
Date: May 14, 2013
Time: 2:00 PM ET/11:00 AM PT
Presenter: Peter Aiken, Ph.D.
• “Every century, a new technology - steam power, electricity, atomic energy, or microprocessors - has swept away the old world with a vision of a new one. Today, we seem to be entering the era of Big Data” – Michael Coren
Get Social with Us!
Live Twitter Feed: @datablueprint @paiken #dataed
Like Us: www.facebook.com/datablueprint
Join the Group: Data Management & Business Intelligence
Presented by Peter Aiken, Ph.D.
Demystifying Big Data 2.0
Developing the Right Approach for Implementing Big Data Techniques
Peter Aiken, PhD
• 30+ years of experience in data management
• Multiple international awards & recognition
• Founder, Data Blueprint (datablueprint.com)
• Associate Professor of IS, VCU (vcu.edu)
• Past President, DAMA International (dama.org)
• 9 books and dozens of articles
• Experienced w/ 500+ data management practices in 20 countries
• Multi-year immersions with organizations as diverse as the US DoD, Nokia, Deutsche Bank, Wells Fargo, and the Commonwealth of Virginia
Outline
• Big Data Context: Why the Big Deal about Big Data?
• Big Data Challenges: Historical Perspective
• Big Data Challenges: Today
• Big Data Approach: Crawl, Walk, Run
• Design Principles: Foundational & Technical
• Take Aways and Q&A
Why the Big Deal about Big Data?
• We are at an inflection point: The sheer volume of data generated, stored, and mined for insights has become economically relevant to businesses, government, and consumers (McKinsey)
• We believe the same important principles still apply:
– What problem are you trying to solve for your business? Your solution needs to fit your problem
– Doing data for (big) data’s sake is not going to solve any problems
– Risk of spending a lot of money on chasing Big Data that will realize little to no returns - especially at this hype cycle stage
http://www.mckinsey.com/insights/business_technology/big_data_the_next_frontier_for_innovation?p=1
Myth #1: Everyone should invest in Big Data
Fact:
• Not every company will benefit from Big Data
• It depends on your size and your ability
– Local pizza shop vs. state-wide or national chain
Big Data can create significant financial value across sectors
• Some (not all) companies can take advantage of Big Data to create value if they want to compete
5 Ways in which Big Data creates Big Business Value
1. Information is transparent and usable at much higher frequency
2. Expose variability and boost performance
3. Narrow segmentation of customers and more precisely tailored products or services
4. Sophisticated analytics and improved decision-making
5. Improved development of the next generation of products and services
http://www.mckinsey.com/insights/business_technology/big_data_the_next_frontier_for_innovation?p=1
Myth #2: Big Data has a clear definition
Fact:
• The term is used so often and in so many contexts that its meaning has become vague and ambiguous
• Industry experts and scientists often disagree
http://articles.washingtonpost.com/2013-08-16/opinions/41416362_1_big-data-data-crunching-marketing-analytics
Defining Big Data
• Gartner: High-volume, high-velocity, and/or high-variety information assets that require new forms of processing to enable enhanced decision-making, insight discovery and process optimization.
• IBM: Datasets whose size is beyond the ability of typical database software tools to capture, store, manage, and analyze.
• NY Times: Shorthand for advancing trends in technology that open the door to a new approach to understanding the world and making decisions.
• McKinsey: Large pools of data that can be brought together and analyzed to discern patterns and make better decisions
Big Data Characteristics generally include:
1. Volume: The amount of data
2. Velocity: The speed of data going in and out
3. Variety: The range of data types & sources
4. Variability: Many options or variable interpretations confound analysis
Q: Would it be more useful to refer to "big data techniques"?
Big Data Gartner Hype Cycle
Some Big Data Limitations
• Data analysis struggles with social cognition
• Data struggles with context
• Data creates bigger haystacks
• Big data has trouble with big problems
• Data favors memes over masterpieces
• Data obscures values
David Brooks, New York Times: http://www.nytimes.com/2013/02/19/opinion/brooks-what-data-cant-do.html?_r=0
Business Information Market: $1.1 Trillion a Year
• Enterprises spend an average of $38 million on information/year
• Small and medium sized businesses on average spend $332,000
http://www.cio.com.au/article/429681/five_steps_how_better_manage_your_data
Big Data = Big Spending
• Enterprises are spending wildly on Big Data but don’t know if it’s worth it yet (Business Insider, 2012)
• Big Data Technology Spending Trend:
– 83% increase over the next 3 years (worldwide):
• 2012: $28 billion
• 2013: $34 billion
• 2016: $232 billion
• Caution:
– Don’t fall victim to SOS (Shiny Object Syndrome)
– A lot of money is being invested but is it generating the expected return?
– Gartner Hype Cycle suggests results are going to be disappointing
http://www.businessinsider.com/enterprise-big-data-spending-2012-11#ixzz2cdT8shhe
http://www.inc.com/kathleen-kim/big-data-spending-to-increase-for-it-industry.html
http://www.gartner.com/DisplayDocument?id=2195915&ref=clientFriendlyUrl
Myth #3: Big Data is just another IT project
Fact:
• Big Data is not your typical IT project
– Does not answer typical IT questions
– Trend analysis, agile, actionable, etc.
– Fundamentally different approach
• Big Data Projects are exploratory
• Big Data enables new capabilities
• Big Data can be a disruptive technology
• It might sound simple but that doesn’t mean it’s easy
• Beware of SOS (Shiny Object Syndrome)
Healthcare Example: Patient Data
• Clinical data:
– Diagnosis/prognosis/treatment
– Genetic data
• Patient demographic data
• Insurance data:
– Insurance provider
– Claims data
• Prescriptions & pharmacy information
• Physical fitness data
– Activity tracking through smartphone apps & social media
• Health history
• Medical research data
Retail Example: Loyalty Programs & Big Data
• Companies need to understand current wants and needs AND predict future tendencies
• Customer -> Repeat Customer -> Brand Advocate
• Customer loyalty programs & retention strategies
– Track what is being purchased and how often
– Coupons based on purchasing history
– Targeted communications, campaigns & special offers
– Social media for additional interactions
– Personalize consumer interactions
• Customer purchase history influences product placements
– Retailers rapidly respond to consumer demands
– Product placements, planogram optimization, etc.
http://www.forbes.com/sites/xerox/2013/09/27/big-data-boosts-customer-loyalty-no-really/
Take Aways-Big Data Context
• Technology continues to evolve at increasing speeds
• Big Data is here
– We have the potential to create insights
• Spend wisely & strategically:
– Big Data is not going to solve all your problems.
• Fact:
– Big Data is not for everyone
• Fact:
– Lack of a clear definition
• Hype Cycle:
– Current: Peak of Inflated Expectations
– Soon: Trough of Disillusionment
Myth #4: Big Data is new
Fact:
• The term originated in Silicon Valley in the 1990s
• The concept has been used previously
– 800 year old linguistic datasets
– Use in sciences in the 1600s
– Kepler, Sloan Digital Sky Survey, Statisticians’ view
• Much harder to leverage Big Data when you lack appropriate techniques
http://articles.washingtonpost.com/2013-08-16/opinions/41416362_1_big-data-data-crunching-marketing-analytics
Bills of Mortality
“The Human Face of Big Data”, Rick Smolan & Jennifer Erwitt
Early Database
Mortality Geocoding
Where is it happening?
When is it happening?
Why is it happening?
“The Human Face of Big Data”, Rick Smolan & Jennifer Erwitt
Big Data Characteristics & the Plague
1. Volume
– Plague data collection points
2. Velocity
– Speed at which disease registers are updated
3. Variety
– Who is collecting plague data points, how, and where?
4. Variability
– Different ways of recording disease patterns and using that data
– No social media yet but gossip existed
John Snow’s 1854 Cholera Map of London
Take Aways-Historic Big Data Challenges
• Fact: Big Data is not new
• Foundational data management challenges remain similar
• Bills of Mortality by John Graunt
– First true health data set
– World’s first pattern of data
– Foundation for probability industry, statistics, insurance
Myth #5: Big Data is innovative
Fact:
• Big Data techniques are innovative
• ROI and insights depend on the size of the business and the amount of data used and produced, e.g.
– Local pizza place vs. Papa John’s
– Retail
Data Footprints
• SQL Server
– 47,000,000,000,000 bytes
– Largest table: 34 billion records, 3.5 TBs
• Informix
– 1,800,000,000 queries/day
– 65,000,000 tables / 517,000 databases
• Teradata
– 117 billion records
– 23 TBs for one table
• DB2
– 29,838,518,078 daily queries
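To make the footprint figures above easier to compare, the following is a minimal arithmetic sketch. It assumes decimal units (1 TB = 10**12 bytes) and queries spread evenly across a 24-hour day; neither assumption is stated on the slide, and the function names are illustrative only.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def bytes_to_terabytes(n_bytes: int) -> float:
    """Convert bytes to decimal terabytes (assumes 1 TB = 10**12 bytes)."""
    return n_bytes / 10**12


def per_second(daily_count: int) -> float:
    """Average rate per second, assuming the load is even over the day."""
    return daily_count / SECONDS_PER_DAY


def average_record_bytes(table_bytes: float, n_records: float) -> float:
    """Average stored size of one record in a table."""
    return table_bytes / n_records


if __name__ == "__main__":
    # SQL Server footprint from the slide, expressed in terabytes.
    print(bytes_to_terabytes(47_000_000_000_000))
    # Informix and DB2 daily query counts as average queries per second.
    print(round(per_second(1_800_000_000)))
    print(round(per_second(29_838_518_078)))
    # Average bytes per record in the largest SQL Server table.
    print(round(average_record_bytes(3.5e12, 34e9)))
```

Real workloads peak well above the daily average, so the per-second figures are a lower bound on what such a platform has to sustain.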
Big Data Characteristics generally include:
1. Volume: The amount of data
2. Velocity: The speed of data going in and out
3. Variety: The range of data types & sources
4. Variability: Many options or variable interpretations confound analysis
Q: Would it be more useful to refer to "big data techniques"?
#1 VOLUME, The Amount of Data
2012 London Summer Games
• 60 GB of data/second
• 200,000 hours of big data will be generated testing systems
• 2,000 hours of media coverage daily
• 845 million Facebook users averaging 15 TB/day
• 13,000 tweets/second
• 4 billion watching
• 8.5 billion devices connected
#2 VELOCITY, The Speed of Data
http://www.youtube.com/watch?v=LrWfXn_mvK8
Nanex 1/2 Second Trading Data
May 2, 2013
Johnson & Johnson
The European Union last year approved a new rule mandating that all trades must exist for at least a half-second - in this instance 1,200 orders and 215 actual trades
#3 VARIETY, Range of Data Types & Sources
Increasingly individuals make use of data producing gadgets to perform services for them
#4 VARIABILITY, Many options or variable interpretations confound analysis
Historyflow-Wikipedia entry for the word “Islam”
Take Aways: Big Data Challenges Today
• Fact: Big Data techniques are innovative but “Big Data” is not
• Challenges are both foundational and technical, today as well as in the 1600s
• Technology continues to advance rapidly (4 Vs)
• Challenges associated with Big Data are not new:
– Well-known foundational data management issues
– Need to align data and business with rapidly changing environment
– Duplication, accessibility, availability
– Foundational business issues
Myth #6: Big Data provides all the Answers
Fact:
• Big Data does not mean the end of scientific theory
• Be careful or you’ll end up with spurious correlations
– Don’t just go fishing for correlations and hope they will explain the world
• To get to the WHY of things, you need ideas, hypotheses and theories
• Having more data does not substitute for thinking hard, recognizing anomalies and exploring deep truths
• You need the right approach
http://articles.washingtonpost.com/2013-08-16/opinions/41416362_1_big-data-data-crunching-marketing-analytics
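The warning about fishing for correlations can be illustrated with a small simulation. This is a sketch under stated assumptions, not an analysis from the presentation: all series are pure random noise, and the function names, series counts, and seed are hypothetical choices made for the example.

```python
import random


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


def strongest_chance_correlation(n_series, n_points, seed=0):
    """Largest |correlation| between one random target series and
    n_series unrelated random candidate series."""
    rng = random.Random(seed)
    target = [rng.gauss(0, 1) for _ in range(n_points)]
    best = 0.0
    for _ in range(n_series):
        candidate = [rng.gauss(0, 1) for _ in range(n_points)]
        best = max(best, abs(pearson(target, candidate)))
    return best


if __name__ == "__main__":
    # Every series is noise, so any "relationship" found is spurious.
    for n_series in (10, 100, 1000):
        print(n_series, strongest_chance_correlation(n_series, n_points=20))
```

The more unrelated variables are searched, the stronger the best accidental correlation becomes, which is why a hypothesis should come before the search.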
• Identify business opportunity
• How can data be leveraged in exploring
– External market place
• Analyze opportunities and threats
– Internal efficiencies
• Analyze strengths and weaknesses
Example: 2012 Olympic Summer Games
1. Volume: 845 million FB users averaging 15+ TB of data/day
2. Velocity: 60 GB of data per second
3. Variety: 8.5 billion devices connected
4. Variability: Sponsor data, athlete data, etc.
5. Vitality: Data Art project “Emoto”
6. Virtual: Social media
• Based on my 6 V analysis, do I need a Big Data solution or does my current BI solution address my business opportunity?
– Do the 6 Vs indicate general Big Data characteristics?
– What are the limitations of my current BI environment? (Technology constraint)
– What are my budgetary restrictions? (Financial constraint)
– What is my current Big Data knowledge base? (Knowledge constraint)
• MUST have both Foundational and Technical practice expertise
• Data Strategy
• Data Governance
• Data Architecture
• Data Education
• Data Quality
• Data Integration
• Data Platforms
• BI/Analytics
• Needs to be actionable
• Generally well understood by business
• Document what has been learned
• Perfect results are not necessary
• Reiterate and refine
• Iterative process to reach decision point
• Use as feedback for next exploration
Take Aways-Approach: Crawl, Walk, Run
• Crawl:
– Identify business opportunity and determine whether you truly need a Big Data solution
• Walk:
– Apply a combination of foundational and technical data management practices. Document your insights and make sure they are actionable
• Run:
– Recycle and explore. Staying agile allows you to be exploratory.
Foundational Practice: Data Strategy
• Your data strategy must align to your organizational business strategy and operating model
• As the market place becomes more data-driven, a data-focused business strategy is an imperative
• Must have data strategy before you have a Big Data strategy
Data Strategy Case Study
Enterprise Information Management Maturity
Data Strategy Considerations
• What are the questions that you cannot answer today?
• Is there a direct reliance on understanding customer behavior to drive revenue?
• Do you have information overload and are you trying to find the signal in the noise?
• Which is more important:
– Establishing value from current data assets/data reporting?
– Exploring Big Data opportunities?
Myth #7: You need Big Data for Insights
Fact:
• Distinction between Big Data and doing analytics
– Big Data is defined by the technology stack that you use
– Big Data is used for predictive and prescriptive analytics
• Use existing data for reporting, figure out bottlenecks and optimize current business model
• Understand how your data is structured, architected and stored
Foundational Practice: Data Architecture
• Common vocabulary expressing integrated requirements ensuring that data assets are stored, arranged, managed, and used in systems in support of organizational strategy [Aiken 2010]
• Most organizations have data assets that are not supportive of strategies
• Big question:
– How can organizations more effectively use their information architectures to support strategy implementation?
Data Architecture Considerations
• Does your current architecture for BI and analytics support Big Data?
• Are you getting enough value out of your current architecture?
• Can you easily integrate and share information across your organization?
• Do you struggle to extract the value from your data because it is too cumbersome to navigate and access?
• Are you confident your data is organized to meet the needs of your business?
Technical Practice: Data Integration
• A data-centric organization requires unified data
• Integrating data across organizational silos creates new insights
• It is also the biggest challenge
• Big Data techniques can be used to complement existing integration efforts
Allowing connections between RDBMS and NoSQL data is beneficial
Examples:
1. Invoices
2. Passports
3. Stock shelving
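As a rough illustration of connecting relational and document-style data, using the invoice example above, the sketch below joins customer rows from an in-memory SQLite table with invoice documents stored as JSON strings. The JSON list only stands in for a NoSQL document store; the table, field names, and sample values are all hypothetical.

```python
import json
import sqlite3


def build_customer_db():
    """Create a tiny relational customer table (illustrative data only)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE customers (customer_id INTEGER PRIMARY KEY, name TEXT)"
    )
    conn.executemany(
        "INSERT INTO customers VALUES (?, ?)",
        [(1, "Acme Pizza"), (2, "Main Street Books")],
    )
    return conn


# Invoice documents as a document store might return them: nested line
# items, and no guarantee that the customer exists on the relational side.
INVOICE_DOCS = [
    '{"invoice_id": "A-1", "customer_id": 1, "lines": [{"sku": "flour", "amount": 120.0}]}',
    '{"invoice_id": "A-2", "customer_id": 1, "lines": [{"sku": "boxes", "amount": 30.0}]}',
    '{"invoice_id": "B-1", "customer_id": 2, "lines": [{"sku": "shelf", "amount": 40.0}]}',
    '{"invoice_id": "C-1", "customer_id": 9, "lines": [{"sku": "misc", "amount": 15.0}]}',
]


def invoice_totals_by_customer(conn, raw_docs):
    """Total invoice amounts per customer name, joining on customer_id."""
    names = dict(conn.execute("SELECT customer_id, name FROM customers"))
    totals = {}
    for raw in raw_docs:
        doc = json.loads(raw)
        # Documents with no matching relational row are kept, not dropped.
        name = names.get(doc["customer_id"], "UNKNOWN")
        amount = sum(line["amount"] for line in doc.get("lines", []))
        totals[name] = totals.get(name, 0.0) + amount
    return totals


if __name__ == "__main__":
    print(invoice_totals_by_customer(build_customer_db(), INVOICE_DOCS))
```

Keeping unmatched documents under an explicit "UNKNOWN" key is one design choice among several; it surfaces integration gaps instead of hiding them.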
Integrating Data Vault 2.0 with Big Data
Data Integration Considerations
• The complexity of your data integration challenge depends on the questions you’re trying to answer
• Integration requirements for Big Data are dependent on the types of questions you’re asking:
– Integration here may be more fuzzy than discrete
– Integration is domain-based (based on time, customer concept, geographic distribution)
• Those requirements should evolve from your strategy
Technical Practice: Data Quality
• Quality is driven by fit for purpose considerations
• Big Data quality is different (BASE):
– Basically available
– Soft state
– Eventual consistency
• Directional accuracy is the goal
• Focus on your most important data assets and ensure your solutions address the root cause of any quality issues, so that your data is correct when it is first created
• Experience has shown that organizations can never get in front of their data quality issues if they only use the ‘find-and-fix’ approach
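One way to read "correct when it is first created" is to validate records at the point of entry instead of repairing them downstream. The sketch below shows that idea under assumed rules: the record fields, the five-digit postal code format, and the simple email pattern are hypothetical and would differ in a real system.

```python
import re
from dataclasses import dataclass

# Deliberately simple patterns for illustration, not full validators.
EMAIL_PATTERN = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")
POSTAL_PATTERN = re.compile(r"^\d{5}$")


@dataclass(frozen=True)
class CustomerRecord:
    customer_id: int
    email: str
    postal_code: str


def create_customer(raw: dict) -> CustomerRecord:
    """Build a record only if every field passes its check; otherwise
    reject it at creation time with all problems listed together."""
    errors = []
    customer_id = raw.get("customer_id")
    if not isinstance(customer_id, int) or customer_id <= 0:
        errors.append("customer_id must be a positive integer")
    # Normalize before validating so equivalent inputs are stored one way.
    email = str(raw.get("email", "")).strip().lower()
    if not EMAIL_PATTERN.match(email):
        errors.append("email is not well formed")
    postal_code = str(raw.get("postal_code", "")).strip()
    if not POSTAL_PATTERN.match(postal_code):
        errors.append("postal_code must be five digits")
    if errors:
        raise ValueError("; ".join(errors))
    return CustomerRecord(customer_id, email, postal_code)
```

Rejecting a bad record at the source puts the cost on the process that produced it, which is where the root cause can actually be fixed.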
Data Quality Considerations
• Big Data is trying to be predictive
• What are the questions you are trying to answer?
– What level of accuracy are you looking for?
– What confidence levels?
– Example: Do I need to know exactly what the customer is going to buy or do I just need to know the range of products he/she is going to choose from?
Myth #8: Bigger Data is Better
Fact:
• Better to have less data of good quality than more poor quality big data
• Analysis to reduce variables and increase manageability, otherwise Big Data = Quantity over Quality
• Beware of Shiny Object Syndrome
– What problem are we trying to solve?
– The solution needs to fit the problem
• Big Data may not be your answer, it may be your problem
• Investments in foundational and technical approaches result in better outcomes for Big Data
Technical Practice: Data Platforms
• Do you want to measure critical operational process performance?
• No one data platform can answer all your questions. This is commonly misunderstood and often leads to very expensive, bloated and ineffective data platforms.
• Understanding the questions that need to be asked and how to build the right data platform or how to optimize an existing one
The Big Data Landscape
Copyright Dave Feinleib, bigdatalandscape.com
Data Platforms Considerations
• Commonalities between most big data stacks with file storage, columnar store, querying engine, etc.
• Big data stack generally looks the same until you get into appliances
– Algorithms are built into the appliances themselves (e.g. Netezza, Teradata, etc.)
• Ask these questions:
– Do you want insights on your customer’s behavior?
– Do you need real-time customer transactional information?
– Do you need historical data or just access to the latest transactions?
– Where do you go to find the single version of the truth about your customers?
Take Aways-Design Principles: Foundational & Technical
• Foundational data management principles still apply
• Beware of SOS (Shiny Object Syndrome)
• You must have a data strategy before you can have a Big Data strategy
• Fact: You don’t need Big Data to gain insights
• Big Data integration requirements evolve from your strategy
• Fact: Bigger Data is not always better
Take Aways: In Summary
• Big data techniques are innovative but “Big Data” is not
• Big Data characteristics: 6 Vs
– Volume, Velocity, Variety, Variability, Vitality, Virtual
• Approach: Crawl-Walk-Run
• Big Data challenges require solutions that are based on foundational and technical data management practices
• Beware of SOS (Shiny Object Syndrome):
– Spend wisely and strategically
– Big Data is not going to solve all your problems
References
• The Human Face of Big Data, Rick Smolan & Jennifer Erwitt, First Edition (November 20, 2012)
• McKinsey: Big Data: The next frontier for innovation, competition and productivity (http://www.mckinsey.com/insights/business_technology/big_data_the_next_frontier_for_innovation?p=1)
• The Washington Post: Five Myths about Big Data (http://articles.washingtonpost.com/2013-08-16/opinions/41416362_1_big-data-data-crunching-marketing-analytics)
• Gartner: Gartner’s 2013 Hype Cycle for Emerging Technologies Maps Out Evolving Relationship Between Humans and Machines (http://www.gartner.com/newsroom/id/2575515)
• The New York Times | Opinion Pages: What Data Can’t Do (http://www.nytimes.com/2013/02/19/opinion/brooks-what-data-cant-do.html?_r=1&)
• CIO.com: Five Steps for How to Better Manage Your Data (http://www.cio.com.au/article/429681/five_steps_how_better_manage_your_data/)
• Business Insider: Enterprises Are Spending Wildly on ‘Big Data’ But Don’t Know If It’s Worth It Yet (http://www.businessinsider.com/enterprise-big-data-spending-2012-11#ixzz2cdT8shhe)
• Inc.com: Big Data, Big Money: IT Industry to Increase Spending (http://www.inc.com/kathleen-kim/big-data-spending-to-increase-for-it-industry.html)
• Forbes: Big Data Boosts Customer Loyalty. No, Really. (http://www.forbes.com/sites/xerox/2013/09/27/big-data-boosts-customer-loyalty-no-really/)
Questions?
It’s your turn! Use the chat feature or Twitter (#dataed) to submit
your questions to Peter now.
Upcoming Events
Data-Centric Strategy & Roadmap: February 11, 2014 @ 2:00 PM ET/11:00 AM PT
Emerging Trends in Data Jobs: March 13, 2014 @ 2:00 PM ET/11:00 AM PT
Sign up here: www.datablueprint.com/webinar-schedule or www.dataversity.net
10124 W. Broad Street, Suite C
Glen Allen, Virginia 23060
804.521.4056