Big Data in Action – Real-World Solution Showcase
Grab some coffee and enjoy the pre-show banter before the top of the hour!
The Briefing Room
Big Data in Action: Real-World Solution Showcase
Twitter Tag: #briefr
Mission
• Reveal the essential characteristics of enterprise software, good and bad
• Provide a forum for detailed analysis of today’s innovative technologies
• Give vendors a chance to explain their product to savvy analysts
• Allow audience members to pose serious questions... and get answers!
Topics
This Month: BIG DATA
March: CLOUD
April: BIG DATA
2014 Editorial Calendar at www.insideanalysis.com/webcasts/the-briefing-room
Big Data
Analysts: Lindy Ryan and John O’Brien
Lindy Ryan is the Research Director for Radiant Advisors’ Data Discovery and Visualization practice and leads research and analyst activities at the confluence of data discovery, visualization, and data science from a business-needs perspective. She also retains the role of Editor in Chief of RediscoveringBI Magazine. As Radiant Advisors’ Editor in Chief for three years, Lindy participated in in-depth discussions and analysis with industry thought leaders and vendors while maturing her position and perspectives in the BI industry.
John O’Brien is Principal and CEO of Radiant Advisors. With over 25 years of experience delivering value through data warehousing and BI programs, John brings a unique perspective that comes from combining his roles as a practitioner, consultant, and vendor in the BI industry. His knowledge of designing, building, and growing enterprise BI systems and teams brings real-world insights to each role and phase within a BI program. Today, through Radiant Advisors, John provides research and advisory services that guide companies in meeting the demands of next-generation information management, architecture, and emerging technologies.
IBM
• IBM offers a full suite of Big Data solutions, including InfoSphere Streams, InfoSphere BigInsights and InfoSphere Data Explorer
• IBM also offers a series of products designed to leverage the power of Hadoop
• Stream Integration is a Premier Business Partner with IBM and focuses its consultancy exclusively on IBM products
Guests:
Eric Poulin, VP of Business Analytics, Stream Integration
Paul Flach, VP of Enterprise Analytics, Stream Integration
Agenda
• Overview of Stream Integration
• Big Data Performance for Analytics
• Modular Analytics
Company Overview
Copyright © 2014, Stream Integration Inc. All rights reserved.
• Award-Winning Information Lifecycle Consultancy
• Founded in 2000
• IBM Premier Partner
• Exclusively focused on IBM Information Management, Big Data and Analytics
• Offices in North America, Caribbean, and Europe
• Development and Support Centers in India and China
[Diagram: Linking Data to the Business Requirements. Labels include traditional sources, streaming information, external information sources, content, structured data, transactional & collaborative applications, and business analytics applications; Information Server, InfoSphere MDM, Streams, BigInsights, and PureData/Netezza; Analyze, Integrate, Manage, and Govern; Quality, Lifecycle Management, and Security & Privacy. Lifecycle: Design ★ Deploy ★ Operate ★ Manage ★ Extend.]
Performance for the Future of Analytics
Paul Flach, Stream Integration
Capabilities Required for Hadoop-Style Workloads
• Runtime
• Cluster and Workload Management
• Visualization & Discovery
• Data Ingest
• Analytics Engines
• File System
• Data Store
• Application Support and Development Tooling
• Security
Big SQL provides native SQL for Hadoop
• ANSI SQL-92+ support
Next-gen Big SQL will provide the first MPP query engine for Hadoop
[Diagram: Big SQL architecture. Host 1 holds the head node / catalog coordinator node with local file systems for temps and catalog tables. Hosts 2 through n each hold a Hadoop data node running MapReduce and an MPP runtime over user data in HDFS and local temp files. The head node distributes SQL sub-sections across the cluster network, with direct Hadoop data access via the distributed file system. Other labels: query optimizer, Big Acceleration, sync, and common SQL across BigInsights, DB2, Netezza, Oracle, and Teradata.]
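The head-node/data-node layout above follows the scatter-gather pattern common to MPP SQL engines: the coordinator sends SQL sub-sections to workers co-located with the data, each worker aggregates only its local rows, and the coordinator merges the partial results. The Python below is a minimal sketch of that idea for a query shaped like `SELECT region, SUM(amount), COUNT(*) FROM sales GROUP BY region`. It is a hypothetical model, not Big SQL's engine or API: the in-memory partitions, the sample rows, and the names `partial_aggregate`, `merge_partials`, and `run_query` are all assumptions for illustration.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor

# Hypothetical row shape: (region, amount). Each inner list of `partitions`
# stands in for the rows stored on one data node.


def partial_aggregate(rows):
    """Worker step: run the 'SQL sub-section' over local rows only."""
    sums = defaultdict(float)
    counts = defaultdict(int)
    for region, amount in rows:
        sums[region] += amount
        counts[region] += 1
    return {region: (sums[region], counts[region]) for region in sums}


def merge_partials(partials):
    """Coordinator step: combine per-node (sum, count) pairs by group key."""
    totals = {}
    for partial in partials:
        for region, (s, c) in partial.items():
            ts, tc = totals.get(region, (0.0, 0))
            totals[region] = (ts + s, tc + c)
    return totals


def run_query(partitions):
    """Scatter the sub-query to all partitions in parallel, then gather and merge."""
    with ThreadPoolExecutor() as pool:
        partials = list(pool.map(partial_aggregate, partitions))
    return merge_partials(partials)


if __name__ == "__main__":
    partitions = [
        [("east", 10.0), ("west", 5.0)],   # hypothetical data node 1
        [("east", 2.5), ("north", 7.0)],   # hypothetical data node 2
        [("west", 1.0), ("west", 4.0)],    # hypothetical data node 3
    ]
    # Prints: east 12.5 2 / north 7.0 1 / west 10.0 3 (one line each)
    for region, (total, count) in sorted(run_query(partitions).items()):
        print(region, total, count)
```

Each partial carries a (sum, count) pair rather than an average so the merge stays correct when nodes hold different numbers of rows; an `AVG(amount)` would be computed as sum divided by count only after the merge.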
BigSheets provides business users with access to data without programming
• Spreadsheet-style interface
• Data visualization & graphs
Watson Explorer included in BigInsights
• Faceted search, navigation & discovery
Analytics Accelerators provide the ability to extract insights more quickly
• Text
• Social media
• Machine data
App Store reduces development effort and enables reusability
Combine Hadoop Apps
Open Source Hadoop Components
[Diagram: open source Hadoop components by capability area (visualization & discovery, data ingest, cluster optimization and management, runtime, analytics engines, file system, data store, application support and development tooling, security). Components shown: MapReduce, HDFS, HBase (data store), Pig, Hive, Nutch, ZooKeeper, Sqoop, HCatalog, Flume, Avro, Lucene, Oozie, Derby.]
BigInsights Enterprise Edition Components
[Diagram: BigInsights Enterprise Edition components, color-coded IBM vs. open source. Alongside the open source stack (MapReduce, HDFS, HBase, Pig, Hive, ZooKeeper, Sqoop, HCatalog, Flume, Avro, Lucene, Oozie, Derby, Nutch), labels include Big SQL, BigSheets, Jaql, SystemML, R, Text Processing Engine and Extractor Library (AQL+HIL), Adaptive MapReduce, GPFS-FPO, splittable text compression, high availability, integrated installer, admin console, enhanced monitoring, app infrastructure, Eclipse, dashboard/visualization, Data Explorer, JDBC, Guardium, PAM, LDAP, private firewall, Gnip, BoardReader, Streams, Netezza, DB2, DataStage, and Teradata.]
Modular Analytics
Platform Analytic Modules
[Diagram: platform analytic modules. Solutions: GIS engine, forecasting engine, routing engine, workforce engine, inventory engine, plus a core engine and cloud computing. Data platforms: IMDB, column-store, BigInsights, Streams, PureData. Data and workload types: in-flight data, self-structured data, frequently requested summaries, low-entropy data, mixed workload requests.]
Thank you!
Perceptions & Questions
Analysts: Lindy Ryan and John O’Brien
© Copyright 2014 Radiant Advisors. All Rights Reserved v1.10.000
BIG DATA IN ACTION
Real-World Solution Showcase with Stream Integration Inside Analysis – The Briefing Room, February 25, 2014
Lindy Ryan | Research Director, Data Discovery & Visualization @lindy_ryan [email protected]
John O’Brien | Principal Analyst, Modern Data Platforms @obrienjw [email protected]
MODERN DATA PLATFORM Big Data in Action: Real-World Solutions
[Diagram: modern data platform with three classes. Flexibility class: Apache Hadoop, HDFS, MapReduce, Pig/Hive, R programs, Hive SQL (discovery, scalable, programs). Reference class: enterprise data warehouses, master reference data (stable, context, SQL). Optimized class: highly optimized for analytics (in-memory, MOLAP, MPP, columnar) and highly specialized for analytics (graphs, document stores, text analytics), discovery and analytics oriented. Inputs: operational systems, big data, streams.]
Extending SQL Access to Big Data and Hadoop via Hive and other HDFS SQL engines
SQL-ON-HADOOP
[Diagram: SQL-on-Hadoop stacks. Apache Hadoop v1: Pig and Hive-QL on MapReduce, with HCatalog, over Hadoop HDFS. Apache Hadoop v2: Pig and Hive-QL on MapReduce and YARN, with HCatalog, over Hadoop HDFS. Hadoop distributions and 3rd party: Pig and Hive on MapReduce and YARN alongside MPP engines (Impala, HAWQ, InfiniDB, Presto), with HCatalog, over Hadoop HDFS.]
Not all SQL-on-Hadoop is the same:
1. SQL capabilities (SQL-92? SQL-2003 analytic functions? SQL-2011? UDFs?)
2. Scalability (not always the same as Hadoop scalability)
3. Speed (flat-out response-time performance without caching)
File types: ORCFILE, SEQPART, Parquet
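The first criterion, SQL capabilities, is easiest to see in a query. A SQL-92 `GROUP BY` collapses rows into one per group, while a SQL-2003 analytic (window) function such as `RANK() OVER (PARTITION BY ...)` keeps every row and adds a per-group calculation. The sketch below uses Python's built-in `sqlite3` module only as a stand-in engine (window functions require SQLite 3.25 or later); it is not a SQL-on-Hadoop engine, and the `sales` table and its rows are hypothetical. Running both query styles against a candidate engine is one quick check of which SQL level it supports.

```python
import sqlite3

# sqlite3 is only a stand-in engine here; window functions need SQLite >= 3.25.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE sales (region TEXT, rep TEXT, amount REAL);  -- hypothetical table
    INSERT INTO sales VALUES
        ('east', 'ann', 300), ('east', 'bob', 500),
        ('west', 'cid', 200), ('west', 'dee', 400), ('west', 'eve', 100);
    """
)

# SQL-92 style aggregate: rows collapse to one per region.
SQL92_QUERY = """
    SELECT region, SUM(amount) AS total
    FROM sales
    GROUP BY region
    ORDER BY region
"""

# SQL-2003 analytic (window) function: every row survives, ranked within its region.
SQL2003_QUERY = """
    SELECT region, rep, amount,
           RANK() OVER (PARTITION BY region ORDER BY amount DESC) AS rnk
    FROM sales
    ORDER BY region, amount DESC
"""

aggregate_rows = conn.execute(SQL92_QUERY).fetchall()
window_rows = conn.execute(SQL2003_QUERY).fetchall()

print(aggregate_rows)  # [('east', 800.0), ('west', 700.0)]
for row in window_rows:
    print(row)  # e.g. ('east', 'bob', 500.0, 1)
conn.close()
```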
TRADITIONAL FORMS OF DISCOVERY
Spreadsheets
• Most popular business “analytic” tool
• Having access to the data is the value
• Analysts can slice and dice data for insights
Basic Visualizations
• Provide visual representations of data
• Provide insights beyond plain text data
• Simplify complex information & highlight trends
ANALYTIC FORMS OF DISCOVERY
Multi-Faceted “Search Mode”
• Discovery within structured & unstructured data
• Mine through various forms of data at once
• Google-like search to iterate and deep dive
Advanced Visualizations
• Visualize clusters of data and correlations
• Discover analytic models iteratively with data
• Visual cues and UX informed by cognitive science
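The multi-faceted “search mode” described above pairs a keyword query with facet filters and shows per-facet counts so the analyst can keep narrowing across structured and unstructured records at once. A minimal in-memory sketch of that interaction in Python follows; the record fields (`source`, `type`, `text`), the sample records, and the function names are hypothetical, and the linear scan stands in for the inverted indexes that production search engines typically use.

```python
import re
from collections import Counter

# Hypothetical mixed records: structured CRM rows next to unstructured text.
SAMPLE_DOCS = [
    {"source": "email", "type": "unstructured", "text": "Customer complaint about late delivery"},
    {"source": "crm", "type": "structured", "text": "Delivery record for order 1001"},
    {"source": "social", "type": "unstructured", "text": "Late delivery again, very unhappy"},
    {"source": "crm", "type": "structured", "text": "Invoice record for order 1002"},
]


def tokenize(text):
    """Lower-case word tokens; a stand-in for a real search engine's analyzer."""
    return set(re.findall(r"\w+", text.lower()))


def faceted_search(docs, query, filters, facet_fields):
    """Return docs matching every query term and every facet filter,
    plus value counts for each facet field over the matching docs."""
    terms = tokenize(query)
    hits = [
        d for d in docs
        if terms <= tokenize(d["text"])
        and all(d.get(field) == value for field, value in filters.items())
    ]
    facets = {f: Counter(d[f] for d in hits if f in d) for f in facet_fields}
    return hits, facets


if __name__ == "__main__":
    # Step 1: broad keyword search, then inspect the facet counts.
    hits, facets = faceted_search(SAMPLE_DOCS, "delivery", {}, ["source", "type"])
    print(len(hits), dict(facets["type"]))  # 3 {'unstructured': 2, 'structured': 1}
    # Step 2: drill down by picking a facet value and refining the query.
    hits, facets = faceted_search(SAMPLE_DOCS, "late delivery", {"type": "unstructured"}, ["source"])
    print([d["source"] for d in hits])  # ['email', 'social']
```

Each round of search returns fresh facet counts computed only over the current hits, which is what lets the user iterate and deep dive rather than run one fixed query.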
THANK YOU!
For more information
www.RadiantAdvisors.com
Twitter: @RadiantAdvisors #ModernBI #RediscoveringBI
RSS: feed://radiantadvisors.com/feed/
Email: [email protected]
LinkedIn: www.linkedin.com/company/radiant-advisors
Subscribe: Rediscovering BI quarterly e-magazine
www.radiantadvisors.com/rediscoveringbi
ANALYST QUESTIONS
1. How are you handling the performance or SQL capabilities in Hive with Big SQL?
2. How do users define schema for Big SQL?
3. Can you explain user roles, security, and metadata in the App Store? Who is the store administrator?
Upcoming Topics
www.insideanalysis.com
2014 Editorial Calendar at www.insideanalysis.com/webcasts/the-briefing-room
This Month: BIG DATA
March: CLOUD
April: BIG DATA
THANK YOU for your ATTENTION!