What is big data - Architectures and Practical Use Cases

© 2013 IBM Corporation What is big data? Architectures and Practical use cases Tony Pearson – IBM Master Inventor and Senior Managing Consultant March 2013

description

Five ways to get started with big data using IBM's big data platform.

Transcript of What is big data - Architectures and Practical Use Cases

Page 1: What is big data - Architectures and Practical Use Cases

© 2013 IBM Corporation

What is big data? Architectures and Practical use cases

Tony Pearson – IBM Master Inventor and Senior Managing Consultant

March 2013

Page 2: What is big data - Architectures and Practical Use Cases

“At the World Economic Forum last month in Davos, Switzerland, Big Data was a marquee topic. A report by the forum, “Big Data, Big Impact,” declared data a new class of economic asset, like currency or gold.”

“Companies are being inundated with data – from information on customer-buying habits to supply-chain efficiency. But many managers struggle to make sense of the numbers.”

“Increasingly, businesses are applying analytics to social media such as Facebook and Twitter, as well as to product review websites, to try to “understand where customers are, what makes them tick and what they want”, says Deepak Advani, who heads IBM’s predictive analytics group.”

“Big Data has arrived at Seton Health Care Family, fortunately accompanied by an analytics tool that will help deal with the complexity of more than two million patient contacts a year…”

“Data is the new oil.” Clive Humby

“The Oscar Senti-meter – a tool developed by the L.A. Times, IBM and the USC Annenberg Innovation Lab – analyzes opinions about the Academy Awards race shared in millions of public messages on Twitter.”

“Data is the new Oil.” In its raw form, oil has little value. Once processed & refined, it helps power the world.

“…now Watson is being put to work digesting millions of pages of research, incorporating the best clinical practices and monitoring the outcomes to assist physicians in treating cancer patients.”

Page 3: What is big data - Architectures and Practical Use Cases

Big data is the analysis of information to identify trends, patterns and insights to make better business decisions

• Cost efficiently processing the growing Volume: 50x growth from 2010 to 2020, to 35 ZB

• Responding to the increasing Velocity: 30 billion RFID sensors and counting

• Collectively analyzing the broadening Variety: 80% of the world's data is unstructured

• Establishing the Veracity of big data sources: 1 in 3 business leaders don’t trust the information they use to make decisions

Page 4: What is big data - Architectures and Practical Use Cases

Where is all this data coming from?

Page 5: What is big data - Architectures and Practical Use Cases

Why is big data such a hot topic? Because technology makes it possible to analyze ALL available data

Cost effectively manage and analyze all available data, in its native form – unstructured, structured, streaming

Data sources shown: ERP, CRM, RFID, Website, Network Switches, Social Media, Billing

Page 6: What is big data - Architectures and Practical Use Cases

Imagine the Possibilities of Harnessing your Data Resources

• Retailer reduces time to run queries by 80% to optimize inventory

• Stock Exchange cuts queries from 26 hours to 2 minutes on 2 PB

• Government cuts acoustic analysis from hours to 70 milliseconds

• Utility avoids power failures by analyzing 10 PB of data in minutes

• Telco analyzes streaming network data to reduce hardware costs by 90%

• Hospital analyzes streaming vitals to intervene 24 hours earlier

Big data challenges exist in every business today

Page 7: What is big data - Architectures and Practical Use Cases

Impact every aspect of your business

Run Zero-latency Operations: Analyze all available operational data and react in real-time to optimize processes.

Reduce the cost of IT with new cost-effective technologies.

Know Everything about your Customer: Analyze all sources of data to know your customers as individuals, from channel interactions to social media.

Instant Awareness of Fraud and Risk: Develop better fraud/risk models by analyzing all available data, and detect fraud in real-time with streaming transaction analysis.

Innovate New Products at Speed and Scale: Capture all sources of feedback and analyze vast amounts of market and research data to drive innovation.

Exploit Instrumented Assets: Monitor assets from real-time data feeds to predict and prevent maintenance issues and develop new products & services.

Page 8: What is big data - Architectures and Practical Use Cases

In order to realize new opportunities, you need to think beyond traditional sources of data

Social Data

• Variety

• Highly unstructured

• Veracity

Machine Data

• Velocity

• Semi-structured

• Ingestion

Transactional & Application Data

• Volume

• Structured

• Throughput

Enterprise Content

• Variety

• Highly unstructured

• Volume

Page 9: What is big data - Architectures and Practical Use Cases

Leveraging big data requires multiple platform capabilities

• Manage & store huge volume of any data: Hadoop File System, MapReduce

• Manage streaming data: Stream Computing

• Analyze unstructured data: Text Analytics Engine

• Structure and control data: Data Warehousing

• Integrate and govern all data sources: Integration, Data Quality, Security, Lifecycle Management, MDM

• Understand and navigate federated big data sources: Federated Discovery and Navigation

Page 10: What is big data - Architectures and Practical Use Cases

Business-centric big data enables you to start with the most critical business pain and expand the foundation for future requirements

• Not just new technology: big data is a business strategy for capitalizing on information resources

• Getting started is crucial

• Success at each entry point is accelerated by products within the big data platform

• Build the foundation for future requirements by expanding further into the big data platform

[Figure: IBM big data platform, with labels including PureData Systems and InfoSphere Optim]

Page 11: What is big data - Architectures and Practical Use Cases

1 – Federated Discovery and Navigation

• Customer Need
– Understand existing data sources
– Expose the data within existing content management and file systems for new uses, without copying the data to a central location
– Search and navigate big data from federated sources

• Value Statement
– Get up and running quickly and discover and retrieve relevant big data
– Use big data sources in new information-centric applications

• Customer examples
– Procter and Gamble – Connect employees with a 360° view of big data sources

• Get started with: IBM Vivisimo Velocity
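
The idea of searching federated sources without copying the data to a central location can be illustrated with a minimal Python sketch. It is not how IBM Vivisimo Velocity is implemented; the names `Hit`, `make_source` and `federated_search`, the in-memory dictionaries standing in for repositories, and the term-count scoring are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Hit:
    source: str   # name of the repository that produced the hit
    doc_id: str   # identifier of the document inside that repository
    score: float  # naive relevance score (number of term occurrences)


def make_source(name: str, docs: Dict[str, str]) -> Callable[[str], List[Hit]]:
    """Wrap one repository in a search function that queries it in place.

    The dictionary is a stand-in (assumption) for a real content
    management system or file share.
    """
    def search(query: str) -> List[Hit]:
        terms = query.lower().split()
        hits = []
        for doc_id, text in docs.items():
            words = text.lower().split()
            score = sum(words.count(term) for term in terms)
            if score > 0:
                hits.append(Hit(name, doc_id, float(score)))
        return hits
    return search


def federated_search(query: str,
                     sources: List[Callable[[str], List[Hit]]]) -> List[Hit]:
    """Send the query to every source and merge the ranked results.

    Only the hits travel back; the documents stay where they are.
    """
    merged: List[Hit] = []
    for search in sources:
        merged.extend(search(query))
    return sorted(merged, key=lambda h: (-h.score, h.source, h.doc_id))


# Two hypothetical repositories that are never copied or merged.
file_share = make_source("file_share", {
    "q1.txt": "inventory report for retail stores",
    "memo.txt": "holiday schedule",
})
cms = make_source("cms", {
    "doc-7": "retail inventory inventory forecast",
})

results = federated_search("inventory", [file_share, cms])
for hit in results:
    print(hit.source, hit.doc_id, hit.score)
```

A real deployment would also have to handle per-source security, differing relevance scales and slow or unavailable sources, none of which this sketch attempts.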

Page 12: What is big data - Architectures and Practical Use Cases

2 – Analyze raw data

• Customer Need
– Ingest data as-is into Hadoop and derive insight from it
– Process large volumes of diverse data within Hadoop
– Combine insights with the data warehouse
– Low-cost ad-hoc analysis with Hadoop to test new hypotheses

• Value Statement
– Gain new insights from a variety and combination of data sources
– Overcome the prohibitively high cost of converting unstructured data sources to a structured format
– Extend the value of the data warehouse by bringing in new types of data and driving new types of analysis
– Experiment with analysis of different data combinations to modify the analytic models in the data warehouse

• Customer examples
– Financial Services Regulatory Org – managed additional data types and integrated with their existing data warehouse

• Get started with: InfoSphere BigInsights
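
To make "ingest data as-is and derive insight from it" more concrete, here is a minimal single-process Python sketch of the map, shuffle and reduce phases that Hadoop MapReduce distributes across a cluster. It does not use Hadoop or InfoSphere BigInsights; the function names and the sample log lines are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


def map_phase(line: str) -> Iterator[Tuple[str, int]]:
    """Map: emit a (word, 1) pair for every word in one raw input line."""
    for word in line.lower().split():
        yield word, 1


def shuffle(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Shuffle: group all emitted values by key."""
    groups: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key: str, values: List[int]) -> Tuple[str, int]:
    """Reduce: collapse the values for one key into a single count."""
    return key, sum(values)


def word_count(lines: Iterable[str]) -> Dict[str, int]:
    """Run the three phases in sequence over unparsed text lines."""
    pairs = (pair for line in lines for pair in map_phase(line))
    return dict(reduce_phase(k, v) for k, v in shuffle(pairs).items())


# Raw lines are used as-is, with no schema defined up front.
raw_lines = [
    "error disk full",
    "warning disk slow",
    "error network down",
]
counts = word_count(raw_lines)
print(counts["error"], counts["disk"], counts["network"])
```

On a real cluster the map and reduce functions run in parallel on many nodes and the shuffle moves data between them; the sketch only shows the shape of the computation.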

Page 13: What is big data - Architectures and Practical Use Cases

3 – Analyze streaming data

• Customer Need
– Harness and process streaming data sources
– Select valuable data and insights to be stored for further processing
– Quickly process and analyze perishable data, and take timely action

• Value Statement
– Significantly reduced processing time and cost – process and then store what’s valuable
– React in real-time to capture opportunities before they expire

• Customer examples
– Ufone – Telco Call Detail Record (CDR) analytics for customer churn prevention

• Get started with: InfoSphere Streams

[Figure: Streaming Data Sources feed Streams Computing, which produces ACTION]
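
The "process and then store what's valuable" pattern can be sketched in plain Python with a generator that inspects each record as it arrives and emits only the alerts worth keeping. This is not InfoSphere Streams code; the record fields (`caller`, `dropped`), the window size and the threshold are assumptions chosen for illustration.

```python
from collections import deque
from typing import Deque, Dict, Iterable, Iterator


def dropped_call_alerts(records: Iterable[dict],
                        window: int = 3,
                        threshold: int = 2) -> Iterator[dict]:
    """Yield an alert when a caller has `threshold` or more dropped calls
    within their last `window` call detail records.

    Records are processed one at a time and then discarded; only a small
    per-caller window is kept in memory.
    """
    recent: Dict[str, Deque[bool]] = {}
    for rec in records:
        history = recent.setdefault(rec["caller"], deque(maxlen=window))
        history.append(rec["dropped"])
        dropped = sum(history)  # True counts as 1, False as 0
        if dropped >= threshold:
            yield {"caller": rec["caller"], "dropped_in_window": dropped}


# Hypothetical call detail records arriving in order.
cdrs = [
    {"caller": "A", "dropped": True},
    {"caller": "B", "dropped": False},
    {"caller": "A", "dropped": False},
    {"caller": "A", "dropped": True},
    {"caller": "B", "dropped": False},
]

# Only the alerts are stored; the raw stream is never written out.
stored = list(dropped_call_alerts(cdrs))
print(stored)
```

Here five records go in and one alert comes out, which is the cost argument of the slide in miniature: the decision is made while the data is in flight, and storage holds only the result.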

Page 14: What is big data - Architectures and Practical Use Cases

4 – Simplify your data warehouse

• Customer Need
– Business users are hampered by the poor performance of analytics of a general-purpose enterprise warehouse – queries take hours to run
– Enterprise data warehouse is encumbered by too much data for too many purposes
– Need to ingest huge volumes of structured data and run multiple concurrent deep analytic queries against it
– IT needs to reduce the cost of maintaining the data warehouse

• Value Statement
– Speed and Simplicity for deep analytics
– 100s to 1000s of users/second for operational analytics

• Customer examples
– Catalina Marketing – executing 10x the amount of predictive workloads with the same staff

• Get started with: IBM PureData Systems

Page 15: What is big data - Architectures and Practical Use Cases

5 – Reduce costs with Archive and Storage Tiering

• Customer Need
– Reduce the overall cost to maintain data in the warehouse – often it's seldom used and kept ‘just in case’
– Lower costs as data grows within the data warehouse
– Reduce expensive infrastructure used for processing and transformations

• Value Statement
– Support existing and new workloads on the most cost effective alternative, while preserving existing access and queries
– Lower storage costs
– Reduce processing costs by pushing processing onto commodity hardware and parallel processing

• Customer examples
– Financial Services Firm – archive production data to archive files on less expensive disk and tape storage, while maintaining access to data

• Get started with: IBM InfoSphere Optim, InfoSphere Information Server, Master Data Management (MDM)
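
A storage tiering policy of the kind described above can be reduced to a small rule that maps how recently data was used to where it should live. The sketch below is only an illustration: the tier names, the 90-day and 365-day thresholds and the function `choose_tier` are assumptions, not settings from InfoSphere Optim.

```python
from datetime import date


def choose_tier(last_accessed: date,
                today: date,
                warm_after_days: int = 90,
                cold_after_days: int = 365) -> str:
    """Pick a storage tier from the age of the last access.

    Recently used data stays in the warehouse, older data moves to
    less expensive disk, and the oldest data moves to tape.
    """
    age_days = (today - last_accessed).days
    if age_days >= cold_after_days:
        return "tape"
    if age_days >= warm_after_days:
        return "low_cost_disk"
    return "warehouse"


today = date(2013, 3, 1)
print(choose_tier(date(2013, 2, 1), today))   # accessed 28 days ago
print(choose_tier(date(2012, 10, 1), today))  # accessed 151 days ago
print(choose_tier(date(2011, 1, 1), today))   # accessed over two years ago
```

In practice the policy would usually consider more than access age (for example retention rules or data size), and archived data would remain queryable, as the customer example on this slide stresses.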

Page 16: What is big data - Architectures and Practical Use Cases

Entry points are accelerated by products within the big data platform

[Figure: IBM big data platform. Analytic Applications: BI / Reporting, Exploration / Visualization, Functional App, Industry App, Predictive Analytics, Content Analytics. Platform components: Visualization & Discovery, Application Development, Systems Management, Accelerators, Hadoop System, Stream Computing, Data Warehouse, Information Integration & Governance]

1 – Federated Discovery and Navigation: IBM Vivisimo

2 – Analyze raw data with Hadoop: InfoSphere BigInsights

3 – Analyze streaming data: InfoSphere Streams

4 – Simplify your data warehouse: IBM Warehouse Solutions and PureData systems

5 – Reduce costs through archive and storage tiering: InfoSphere Optim, Information Server, MDM

Page 17: What is big data - Architectures and Practical Use Cases

There are many use cases for a big data platform

Innovate New Products at Speed and Scale
• Social Media - Product/brand Sentiment analysis
• Brand strategy
• Market analysis
• RFID tracking & analysis
• Transaction analysis to create insight-based product/service offerings

Know Everything About your Customers
• Social media customer sentiment analysis
• Promotion optimization
• Segmentation
• Customer profitability
• Click-stream analysis
• CDR processing
• Multi-channel interaction analysis
• Loyalty program analytics
• Churn prediction

Run Zero Latency Operations
• Smart Grid/meter management
• Distribution load forecasting
• Sales reporting
• Inventory & merchandising optimization
• Options trading
• ICU patient monitoring
• Disease surveillance
• Transportation network optimization
• Store performance
• Environmental analysis
• Experimental research

Instant Awareness of Risk and Fraud
• Multimodal surveillance
• Cyber security
• Fraud modeling & detection
• Risk modeling & management
• Regulatory reporting

Exploit Instrumented Assets
• Network analytics
• Asset management and predictive issue resolution
• Website analytics
• IT log analysis

Page 18: What is big data - Architectures and Practical Use Cases

Achieve Breakthrough Outcomes with big data capabilities

Analyze any big data type
• Enterprise Content
• Transactional / Application Data
• Machine Data
• Social Media Data

With unique capabilities
• Visualization & Discovery
• Hadoop
• Data warehousing
• Stream Computing
• Text Analytics
• Integration & governance

To achieve breakthrough outcomes
• Know Everything about your Customers
• Run Zero-latency Operations
• Innovate new products at Speed and Scale
• Instant Awareness of Fraud and Risk
• Exploit Instrumented Assets

Page 19: What is big data - Architectures and Practical Use Cases

Example Configuration for big data

IBM PureData System for Analytics

Page 20: What is big data - Architectures and Practical Use Cases

The IBM big data platform advantage

[Figure: IBM big data platform. Analytic Applications: BI / Reporting, Exploration / Visualization, Functional App, Industry App, Predictive Analytics, Content Analytics. Platform components: Visualization & Discovery, Application Development, Systems Management, Accelerators, Hadoop System, Stream Computing, Data Warehouse, Information Integration & Governance]

• The platform provides benefit as you move from an entry point to a second and third project

• Shared components and integration between systems lower deployment costs

• Key points of leverage
– Reuse text analytics across Streams and Hadoop
– HDFS connectors between Streams and Information Integration
– Common integration, metadata and governance across all engines
– Accelerators built across multiple engines – common analytics, models, and visualization

Page 21: What is big data - Architectures and Practical Use Cases

Page 22: What is big data - Architectures and Practical Use Cases

IBM Redbooks Available!

• IBM Redbooks available on deployment of big data using IBM System x and InfoSphere software

• Reference Architecture for different sizes

Page 23: What is big data - Architectures and Practical Use Cases

About the Speaker

Mr. Tony Pearson

Master Inventor,

Senior Managing Consultant

IBM System Storage

Tony Pearson is a Master Inventor and Senior Managing Consultant for the IBM System Storage™ product line. Tony joined IBM Corporation in 1986 in Tucson, Arizona, USA, and has lived there ever since. In his current role, Tony presents briefings on storage topics covering the entire System Storage product line, Tivoli storage software products, and topics related to Cloud Computing. He interacts with clients, speaks at conferences and events, and leads client workshops to help clients with strategic planning for IBM’s integrated set of storage management software, hardware, and virtualization products.

Tony writes the “Inside System Storage” blog, which is read by hundreds of clients, IBM sales reps and IBM Business Partners every week. This blog was rated one of the top 10 blogs for the IT storage industry by “Networking World” magazine, and the #1 most read IBM blog on IBM’s developerWorks. The blog has been published in a series of books, Inside System Storage: Volume I through V.

Over the past years, Tony has worked in development, marketing and customer care positions for various storage hardware and software products. Tony has a Bachelor of Science degree in Software Engineering, and a Master of Science degree in Electrical Engineering, both from the University of Arizona. Tony holds 19 IBM patents for inventions on storage hardware and software products.

9000 S. Rita Road, Bldg 9032, Room 1238, Tucson, AZ 85744

+1 520-799-4309 (Office)

[email protected]

Page 24: What is big data - Architectures and Practical Use Cases

Additional Resources

Email: [email protected]

Twitter: http://twitter.com/az990tony

Blog: http://ibm.co/brAeZ0

Books: http://www.lulu.com/spotlight/990_tony

IBM Expert Network: http://www.slideshare.net/az990tony

Page 25: What is big data - Architectures and Practical Use Cases

Trademarks and disclaimers

Adobe, the Adobe logo, PostScript, and the PostScript logo are either registered trademarks or trademarks of Adobe Systems Incorporated in the United States, and/or other countries. IT Infrastructure Library is a registered trademark of the Central Computer and Telecommunications Agency which is now part of the Office of Government Commerce. Intel, Intel logo, Intel Inside, Intel Inside logo, Intel Centrino, Intel Centrino logo, Celeron, Intel Xeon, Intel SpeedStep, Itanium, and Pentium are trademarks or registered trademarks of Intel Corporation or its subsidiaries in the United States and other countries. Linux is a registered trademark of Linus Torvalds in the United States, other countries, or both. Microsoft, Windows, Windows NT, and the Windows logo are trademarks of Microsoft Corporation in the United States, other countries, or both. ITIL is a registered trademark, and a registered community trademark of the Office of Government Commerce, and is registered in the U.S. Patent and Trademark Office. UNIX is a registered trademark of The Open Group in the United States and other countries. Java and all Java-based trademarks and logos are trademarks or registered trademarks of Oracle and/or its affiliates. Cell Broadband Engine is a trademark of Sony Computer Entertainment, Inc. in the United States, other countries, or both and is used under license therefrom. Linear Tape-Open, LTO, the LTO Logo, Ultrium, and the Ultrium logo are trademarks of HP, IBM Corp. and Quantum in the U.S. and other countries.

Other product and service names might be trademarks of IBM or other companies. Information is provided "AS IS" without warranty of any kind.

The customer examples described are presented as illustrations of how those customers have used IBM products and the results they may have achieved. Actual environmental costs and performance characteristics may vary by customer.

Information concerning non-IBM products was obtained from a supplier of these products, published announcement material, or other publicly available sources and does not constitute an endorsement of such products by IBM. Sources for non-IBM list prices and performance numbers are taken from publicly available information, including vendor announcements and vendor worldwide homepages. IBM has not tested these products and cannot confirm the accuracy of performance, capability, or any other claims related to non-IBM products. Questions on the capability of non-IBM products should be addressed to the supplier of those products.

All statements regarding IBM future direction and intent are subject to change or withdrawal without notice, and represent goals and objectives only.

Some information addresses anticipated future capabilities. Such information is not intended as a definitive statement of a commitment to specific levels of performance, function or delivery schedules with respect to any future products. Such commitments are only made in IBM product announcements. The information is presented here to communicate IBM's current investment and development activities as a good faith effort to help with our customers' future planning.

Performance is based on measurements and projections using standard IBM benchmarks in a controlled environment. The actual throughput or performance that any user will experience will vary depending upon considerations such as the amount of multiprogramming in the user's job stream, the I/O configuration, the storage configuration, and the workload processed. Therefore, no assurance can be given that an individual user will achieve throughput or performance improvements equivalent to the ratios stated here.

Prices are suggested U.S. list prices and are subject to change without notice. Starting price may not include a hard drive, operating system or other features. Contact your IBM representative or Business Partner for the most current pricing in your geography.

Photographs shown may be engineering prototypes. Changes may be incorporated in production models.

© IBM Corporation 2013. All rights reserved.

References in this document to IBM products or services do not imply that IBM intends to make them available in every country.

Trademarks of International Business Machines Corporation in the United States, other countries, or both can be found on the World Wide Web at http://www.ibm.com/legal/copytrade.shtml. ZSP03490-USEN-00