Left Brain, Right Brain: How to Unify Enterprise Analytics
The Briefing Room
Twitter Tag: #briefr
Mission
• Reveal the essential characteristics of enterprise software, good and bad
• Provide a forum for detailed analysis of today’s innovative technologies
• Give vendors a chance to explain their product to savvy analysts
• Allow audience members to pose serious questions... and get answers!
JANUARY: Big Data
February: Analytics
March: Open Source
April: Intelligence
Big Data
New Sources, New Insights, New Challenges
Analyst: Robin Bloor
Robin Bloor is Chief Analyst at The Bloor Group
Teradata Aster
• Teradata is known for its data analytics solutions with a focus on integrated data warehousing, big data analytics and business applications
• It offers a broad suite of technology platforms and solutions; data management applications; and data mining capabilities
• Teradata Aster is its MapReduce platform to handle big data analytics on multi-structured data
Steve Wooledge
Steve is Senior Director of Product Marketing for Teradata Aster and has 10 years of industry experience.
Steve Wooledge – Sr. Director, Product Marketing, Teradata Aster January 2013
Bringing Big Data into the Light: Teradata Big Analytics Appliance
Confidential and proprietary. Copyright © 2012 Teradata Corporation.
TOPICS
WHAT IS DIFFERENT ABOUT BIG DATA ANALYTICS?
MAKING BIG ANALYTICS & DISCOVERY FAST AND EASY
TERADATA ASTER BIG ANALYTICS APPLIANCE
What is Different about Big Analytics and Discovery?
The Lytro and Big Data
“Interactive, Living Pictures”
See Your Business in High-Definition: Big Analytics & Discovery Unlocks Hidden Value
Classic BI: Structured & Repeatable Analysis
• “Capture only what’s needed”
• Business determines what questions to ask
• IT structures the data to answer those questions
Big Data Analytics: Multi-structured & Iterative Analysis
• “Capture in case it’s needed”
• IT delivers a platform for storing, refining, and analyzing all data sources
• Business explores data for questions worth answering
Iterative Analytics Accelerates Discovery
• Analytical Idea
• Zero-ETL Data Load/Integration
• SQL and non-SQL Analysis
• Evaluate Results
• Operationalize or Move On
Operational DB or EDW
5x Faster Discovery Process with Aster: Hours vs. Days
Need for a Unified Data Architecture for New Insights: Enabling Any User for Any Data Type from Data Capture to Analysis
Tools: Java, C/C++, Python, R, SAS, SQL, Excel, BI, Visualization
Stages: Capture, Store and Refine; Discover and Explore; Reporting and Execution in the Enterprise
Data sources: Audio/Video, Images, Docs, Text, Web & Social, Machine Logs, CRM, SCM, ERP
Big Data Comes with BIG HEADACHES
“Even free software like Hadoop is causing companies to spend more money… Many CIOs believe data is inexpensive because storage has become inexpensive. But data is inherently messy (it can be wrong, it can be duplicative, and it can be irrelevant), which means it requires handling, which is where the real expenses come in.”
Source: The Wall Street Journal. “CIOs’ Big Problem with Big Data”. Aug 2012
“Through 2015, 85% of Fortune 500 organizations will be unable to exploit big data for competitive advantage.”
Source: Gartner. “Information Innovation: Innovation Key Initiative Overview”. April 2012
UNIFIED DATA ARCHITECTURE
Data sources: AUDIO & VIDEO, IMAGES, TEXT, WEB & SOCIAL, MACHINE LOGS, CRM, SCM, ERP
Platforms: DISCOVERY PLATFORM; CAPTURE | STORE | REFINE; INTEGRATED DATA WAREHOUSE
Roles: Big Data Analytics; Big Data Management
Tools: LANGUAGES, MATH & STATS, DATA MINING, BUSINESS INTELLIGENCE, APPLICATIONS
Users: Engineers, Data Scientists, Business Analysts, Front-Line Workers, Customers / Partners, Quants, Operational Systems, Executives
TERADATA UNIFIED DATA ARCHITECTURE
Data sources: AUDIO & VIDEO, IMAGES, TEXT, WEB & SOCIAL, MACHINE LOGS, CRM, SCM, ERP
Platforms: DISCOVERY PLATFORM; CAPTURE | STORE | REFINE; INTEGRATED DATA WAREHOUSE
Tools: LANGUAGES, MATH & STATS, DATA MINING, BUSINESS INTELLIGENCE, APPLICATIONS
Management: VIEWPOINT, SUPPORT
Users: Engineers, Data Scientists, Business Analysts, Front-Line Workers, Customers / Partners, Quants, Operational Systems, Executives
Connectors: Aster Connector for Hadoop, Teradata Connector for Hadoop, Aster Teradata Connector, SQL-H, Aster Loader, Teradata Loader
Shift from a Single Platform to an Ecosystem
“Big Data requirements are solved by a range of platforms including analytical databases, discovery platforms and NoSQL solutions beyond Hadoop.”
Source: “Big Data Comes of Age”. EMA and 9sight Consulting. Nov 2012.
How Does Big Analytics and Discovery Add Business Value?
Customer Behavior Analysis
Tools: BI Tools, Database Tools, Monitoring Tools
Data sources:
• STORE VISION PLATFORM DATA
• CALL CENTER DATA
• EMAIL CORRESPONDENCE DATA
• BRANCH TELLER DATA
• ONLINE BANKING DATA
• CUSTOMER PROFILE DATA
• CUSTOMER SURVEY DATA
Events Preceding Account Closure
Interactive Analytics: Reducing the “Noise” to find the “Signal”
SELECT *
FROM npath (
  ON (
    SELECT …
    WHERE u.event_description IN (
      SELECT aper.event
      FROM attrition_paths_event_rank aper
      ORDER BY aper.count DESC
      LIMIT 10)
  )
  …
  PATTERN ('(OTHER|EVENT){1,20}$')
  SYMBOLS (…)
  RESULT (…)
  )
) n;
Events Preceding Account Closure
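The PATTERN clause in the nPath query can be read as "the last 1 to 20 rows of a customer's event sequence, each row being either a top-ranked EVENT or OTHER". The following is a rough Python analogy of that idea, not Aster's implementation. The sample events, the `TOP_EVENTS` set, and the helper names `to_symbols` and `closing_path` are hypothetical, and the SYMBOLS and RESULT clauses elided on the slide are not reproduced.

```python
import re

# Hypothetical, time-ordered events for one customer (not from the slide).
events = ["login", "fee_complaint", "balance_check",
          "fee_reversal_request", "account_closure"]

# Assumed stand-in for the top-10 events the slide selects from
# attrition_paths_event_rank.
TOP_EVENTS = {"fee_complaint", "fee_reversal_request", "account_closure"}


def to_symbols(events, top_events):
    """Map each row to one letter: E for a top event, O for anything else."""
    return "".join("E" if e in top_events else "O" for e in events)


# Same shape as PATTERN '(OTHER|EVENT){1,20}$': up to 20 rows at the end.
TAIL = re.compile(r"[OE]{1,20}$")


def closing_path(events, top_events=TOP_EVENTS):
    """Return the top events found in the last 1-20 rows of the sequence."""
    match = TAIL.search(to_symbols(events, top_events))
    if match is None:
        return []
    tail = events[match.start():match.end()]
    # Drop the "noise" (OTHER rows) and keep the "signal" (EVENT rows).
    return [e for e in tail if e in top_events]


print(closing_path(events))
```

Encoding each row as a single character lets an ordinary regular expression stand in for the row-pattern matching that nPath performs over ordered rows.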
How Do We Make Big Analytics & Discovery Possible?
Key Requirements of a Discovery Platform
1. Highly Efficient & Performant Big Data Platform That Allows Quick Iterations
2. Hybrid Capabilities That Support SQL, Statistics, and New MapReduce Analytics
3. Significant Out-of-the-Box Analytical Functions That Minimize Development
Democratize Big Data & Maximize Enterprise Adoption
Teradata Aster Big Analytics Appliance: First Deeply Integrated SQL, MapReduce and Hadoop Appliance
UNIQUE FEATURES
1. Integrated, modular Aster Database and 100% Open-Source Hortonworks HDP
2. First and only ANSI SQL & HCatalog integration via SQL-H™
3. Industry’s only ANSI-standard SQL & MapReduce integration via SQL-MapReduce®
4. Industry’s most manageable & supportable Apache Hadoop appliance via Teradata Viewpoint™ & TVI™
5. Most complete MapReduce App Portfolio with 70+ pre-built MapReduce functions
6. Fully engineered and supported by Teradata, with Level-4 support by Hortonworks world-class Hadoop team
Benefits
• Leverage existing investments in standard BI, ETL tools & people with SQL skills
• Industry’s highest performance platform for Big Analytics
• Lowest TCO (technology + people), highest ROI, and fastest time to value
Teradata Aster Analytics Portfolio: The App Store of Big Data
• PATH ANALYSIS: Discover Patterns in Rows of Sequential Data
• TEXT ANALYSIS: Derive Patterns and Extract Features in Textual Data
• STATISTICAL ANALYSIS: High-Performance Processing of Common Statistical Calculations
• SEGMENTATION: Discover Natural Groupings of Data Points
• MARKETING ANALYTICS: Analyze Customer Interactions to Optimize Marketing Decisions
• DATA TRANSFORMATION: Transform Data for More Advanced Analysis
Unified Big Data Analytics Architecture: Integrated Analytics and Navigation
Inputs: BI Tools, SQL, ETL; Multi-Structured Data; Unstructured Data
Platforms: TERADATA IDW; BIG ANALYTICS APPLIANCE (Discovery Platform)
Example data: Revenue; Social Media (Facebook, Twitter); Sentiments; Behavior
Flow: Iterative Information Discovery; Operationalized Analytics; Best Decision Possible
Teradata Aster Big Analytics Appliance: Solution Value Add
Interfaces: SQL BI Tools; Analytic SQL Apps; Hadoop Tools
• Processing, storage, and networking designed for Big Data workloads
• 40 GB/s InfiniBand network
• Pre-tuned HDFS and MapReduce parameters for Big Data workloads
• Store and manage data in Apache Hadoop or Aster Database
• Analytics Library w/ 70+ functions
• SQL interface to MapReduce and Hadoop
• Supports standard BI and ETL tools
• Use Hadoop tools like Hive and Pig
• Single vendor for lowest TCO
• Common system management tools
Stack: SQL; SQL-MapReduce; SQL-H (NEW); Hive, Pig, …; Aster MapReduce Portfolio of Functions; Aster Database; InfiniBand (40 GB/s) Interconnect Fabric; Big Analytics Appliance Hardware; Common Management, Troubleshooting, and Support
Several components in the stack are marked NEW, including SQL-H.
ESG Benchmark Report Summary: Third Party Validation of Aster and Hadoop “Fit”
FULL REPORT AVAILABLE AT www.asterdata.com/esg
Scope
• Identical hardware for Aster and Hadoop
• Clickstream, sentiment, and traditional retail data
• Compare “time to insight” and “time to develop”
RESULTS
• Discovery cycle time (development + execution time): Hadoop MapReduce 32 hours; Aster SQL-MapReduce 6 hours
• Discovery Process: Aster 5x Faster
• Analytics: Aster 35x Faster (range: 4–416x)
• Development: Aster 3x Faster
• Loading: Hadoop 1.8x Faster
• Transforms: Hadoop 1.3x Faster
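The headline "5x" figure follows from the two cycle times on the slide, reading 32 hours as the Hadoop MapReduce result and 6 hours as the Aster SQL-MapReduce result. A minimal sketch of that arithmetic, with `speedup` as a hypothetical helper name:

```python
def speedup(baseline, improved):
    """How many times faster `improved` is than `baseline` (same units)."""
    return baseline / improved


# Discovery cycle time in hours: Hadoop MapReduce vs. Aster SQL-MapReduce.
discovery = speedup(32, 6)
print(f"Discovery cycle: {discovery:.1f}x faster")
```

The ratio is a little over 5, which the slide rounds down to "5x Faster".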
Comparing Advanced Analytic Development and Execution
Example: Determine Spikes In Hourly Pageviews
Apache Hadoop
Development Time: 4 hours. Execution Time: 149 seconds
• Write Java MR job to group records by pagename and find all pages <100 pageviews/hr
• Sort by the yy/mm/dd and hour fields
• Java reduce phase to place all same-keyed records into temporary arrays
• Compute counts for low/high/low hourly page views
• Create custom partitioner
• Create custom grouping comparator
• Create custom key comparator
• Execute each Mapper and Reducer
• Multiple passes of data
• Save output to flat files, making it unstructured and preventing use of DB interfaces (e.g. ODBC/JDBC)
• No relational semantics
• Retrieve results with other tools (e.g., SSH/FTP)
Teradata Aster
Development Time: 1 hour (4x faster). Execution Time: 3 seconds (50x faster)
• Use Aster nPath
• Input parameters in SQL as regular expressions
• Single pass of the data
• SQL handles group-by, counts, sorts
• MapReduce performs regular pattern matching over a sequence of rows
• Outputs written to relational table
• Use SQL or BI tools to visualize results
“By using SQL-MapReduce, Aster takes fewer steps to develop analytics”
“This is also why the execution time in Aster is much faster.”
“Rather than using MapReduce processing for each step in the analysis, SQL is used in place of a Map (or Reduce) phase and MapReduce is used only in steps that cannot be expressed in SQL.”
“Map or Reduce requires data shuffling and produces higher latency than SQL”
Source: Enterprise Strategy Group, Lab Validation Report, September 2012
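To make the benchmark task concrete, here is a small Python sketch of the low/high/low check on hourly pageviews. It is neither the Hadoop job nor the nPath query from the report. The sample rows, the reading of 100 pageviews/hr as the low/high cut-off, and the name `find_spikes` are assumptions, and every page is assumed to have one row per consecutive hour.

```python
from itertools import groupby

# Hypothetical rows: (pagename, hour_index, pageviews), in any order.
rows = [
    ("home", 0, 40), ("home", 1, 250), ("home", 2, 35), ("home", 3, 60),
    ("about", 0, 120), ("about", 1, 130), ("about", 2, 20),
]

THRESHOLD = 100  # assumed cut-off between a "low" and a "high" hour


def find_spikes(rows, threshold=THRESHOLD):
    """Return (pagename, hour) for each high hour between two low hours."""
    spikes = []
    # Group by page and order by hour, as the SQL group-by and sort would.
    ordered = sorted(rows, key=lambda r: (r[0], r[1]))
    for page, group in groupby(ordered, key=lambda r: r[0]):
        series = list(group)
        # Slide a three-hour window over the page's series: low, high, low.
        for prev, cur, nxt in zip(series, series[1:], series[2:]):
            if prev[2] < threshold <= cur[2] and nxt[2] < threshold:
                spikes.append((page, cur[1]))
    return spikes


print(find_spikes(rows))
```

After the sort, the check is a single pass over each page's rows, which is the property the slide attributes to the nPath approach.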
Teradata Aster Big Analytics Appliance—Key Innovations
Aster SQL-H™: A Business User’s Bridge to Analyze Hadoop Data
Aster SQL-H Gives Analysts and Data Scientists a Better Way to Analyze Data Stored in Hadoop
• Allow standard ANSI SQL access to Hadoop data
• Leverage existing BI tools and enable self-service
• Enable 50+ prebuilt SQL-MapReduce Apps and IDE
Diagram labels: Hadoop Layer: HDFS; Pig; Hive; Hadoop MR; HCatalog; Aster: SQL-H (NEW); Data; Data Filtering
Hortonworks Data Platform: Enterprise-Ready Hadoop
The ONLY 100% open source data platform for Hadoop
• Tightly aligned with core Apache code lines
• All code committed back to open source
• Engineered integration with Teradata Viewpoint and Ambari
• HCatalog: centralized metadata services for easy data sharing
• Dependable full stack high availability
• Capacity scheduler for better multi-tenancy
• Intuitive graphical data integration tools
Teradata Viewpoint Integration: Easier, Faster, and Better System Management
Common Management Console for Aster, Teradata and Apache Hadoop
Aster-Specific Portlets
• Aster Node Monitoring
• Aster Completed Processes
Trend/Visualization Portlets
• Capacity Heat Map
• Metrics Graph
• Metrics Analysis
Query Portlets
• Query Monitor
Admin Portlets
• Teradata System
• Roles Manager
Other Portlets
• System Health
• Canary queries
• Aster Alerting
Teradata Vital Infrastructure (TVI): Integrated hardware & software solution for systems management
PROACTIVE RELIABILITY, AVAILABILITY, AND MANAGEABILITY
1U server virtualizes system and cabinet management software
Server Management VMS
• Cabinet Management Interface Controller (CMIC)
• Service Work Station (SWS)
• Automatically installed on base/first cabinet
VMS allows full rack solutions without additional cabinet for traditional SWS
Eliminates need for expansion racks, reducing customers’ floor space & energy costs
Supports Teradata hardware and Aster/Hadoop software
TVI Support for Aster and Hadoop
62–70% of Incidents Discovered through TVI
How Can You Get Started? Aster Express
Making it easy to try Aster Big Analytics Solutions: Aster Express, Aster Live, Aster Big Analytics Appliance
Aster Express Tutorials Make it Easy to Start www.asterdata.com/asterexpress
Teradata Aster Big Analytics Appliance Summary Bring Big Data to Life with Big Analytics & Discovery
INDUSTRY’S FIRST UNIFIED BIG ANALYTICS APPLIANCE
UNIFIED INTERFACES FOR ITERATIVE SQL AND MAPREDUCE ANALYTICS
TERADATA-TRUSTED RELIABILITY, AVAILABILITY & MANAGEABILITY
EASY TO DEPLOY, MANAGE & USE
Get Started Now! asterdata.com/AsterExpress
When to Use Which? The best approach by workload and data type
Processing as a Function of Schema Requirements and Stage of Data Pipeline
Schema | Low Cost Storage and Fast Loading | Data Pre-Processing, Refining, Cleansing | “Simple math at scale” (Score, filter, sort, avg., count...) | Joins, Unions, Aggregates | Analytics (Iterative and data mining) | Reporting
Stable Schema | Teradata / Hadoop | Teradata | Teradata | Teradata | Teradata | Teradata
Evolving Schema | Hadoop | Aster / Hadoop | Aster / Hadoop | Aster | Aster (SQL + MapReduce Analytics) | Aster
Format, No Schema | Hadoop | Hadoop | Hadoop | Aster | Aster (MapReduce Analytics) | Aster
Example workloads
• Stable Schema: Financial Analysis, Ad-Hoc/OLAP; Enterprise-Wide BI and Reporting; Spatial/Temporal; Active Execution
• Evolving Schema: Interactive Data Discovery; Web Clickstream, Set-Top Box Analysis; CDRs, Sensor Logs, JSON
• Format, No Schema: Social Feeds, Text, Image Processing; Audio/Video Storage and Refining; Storage and Batch Transformations
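The "When to Use Which?" grid is essentially a lookup from schema type and pipeline stage to a recommended platform. The sketch below encodes it as a Python table, reading the flattened slide as three rows of six cells; that reading is an assumption. The short keys in `STAGES` and `PLATFORM` and the function name `recommend` are hypothetical labels.

```python
# Pipeline stages, left to right across the slide (short keys are assumed).
STAGES = (
    "storage_loading",   # Low Cost Storage and Fast Loading
    "preprocessing",     # Data Pre-Processing, Refining, Cleansing
    "simple_math",       # "Simple math at scale"
    "joins_aggregates",  # Joins, Unions, Aggregates
    "analytics",         # Analytics (Iterative and data mining)
    "reporting",         # Reporting
)

# One row per schema type, one cell per stage.
PLATFORM = {
    "stable": ("Teradata/Hadoop", "Teradata", "Teradata",
               "Teradata", "Teradata", "Teradata"),
    "evolving": ("Hadoop", "Aster/Hadoop", "Aster/Hadoop",
                 "Aster", "Aster", "Aster"),
    "no_schema": ("Hadoop", "Hadoop", "Hadoop",
                  "Aster", "Aster", "Aster"),
}


def recommend(schema, stage):
    """Look up the platform the grid suggests for a schema type and stage."""
    return PLATFORM[schema][STAGES.index(stage)]


print(recommend("evolving", "preprocessing"))
```

Keeping the grid as data rather than branching logic makes it easy to compare against the slide cell by cell.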
Ease of Development and Reuse: Analytic Foundation: 70+ out-of-the-box modules
Modules: Business-ready SQL-MapReduce Functions
Path Analysis: Discover patterns in rows of sequential data
• nPath: complex sequential analysis for time series analysis and behavioral pattern analysis
• Sessionization: identifies sessions from time series data in a single pass over the data
• Attribution: operator to help ad networks and websites to distribute “credit”
Statistical Analysis: High-performance processing of common statistical calculations
• Histogram: function to provide capability of generating
• Decision Trees: Native implementation of parallel random forests.
• Approximate percentiles and distinct counts: calculate percentiles and counts within specific variance
• Correlation: calculation that characterizes the strength of the relation between different columns
• Regression: performs linear or logistic regression between an output variable and a set of input variables
• Averages: calculate moving, weighted, exponential or volume-weighted averages over a window of data
Relational Analysis: Discover important relationships among data
• Graph analysis: finds shortest path from a distinct node to all other nodes in a graph
• Tokenization: splits strings into individual words to assist text processing
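As an illustration of what the Sessionization function in the Path Analysis list does, here is a minimal Python sketch. It is not the Aster function. The 30-minute timeout, the sample clicks, and the name `sessionize` are assumptions chosen for the example.

```python
def sessionize(clicks, timeout=1800):
    """Assign a per-user session number to each click.

    clicks: iterable of (user_id, timestamp_in_seconds).
    A new session starts when the gap since the user's previous click
    exceeds `timeout` seconds (assumed here to be 30 minutes).
    """
    last_seen = {}   # user_id -> timestamp of the previous click
    session_no = {}  # user_id -> current session number
    out = []
    # Order by user and time, then walk the rows once.
    for user, ts in sorted(clicks, key=lambda c: (c[0], c[1])):
        if user not in last_seen or ts - last_seen[user] > timeout:
            session_no[user] = session_no.get(user, -1) + 1
        last_seen[user] = ts
        out.append((user, ts, session_no[user]))
    return out


# Hypothetical clickstream rows.
clicks = [("u1", 0), ("u1", 600), ("u1", 4000), ("u2", 100)]
for row in sessionize(clicks):
    print(row)
```

Once the rows are ordered by user and time, the session boundaries fall out of one pass that compares each click with the previous one.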
Ease of Development and Reuse: Analytic Foundation: 50+ out-of-the-box modules
Modules: SQL-MapReduce Analytic Functions
Text Analysis: Derive patterns in textual data
• Text Processing: counts occurrences of words, identifies roots, & tracks relative positions of words & multi-word phrases
• Text Partition: analyzes text data over multiple rows
• Levenshtein Distance: computes the distance between two words
Cluster Analysis: Discover natural groupings of data points
• k-Means: clusters data into a specified number of groupings
• Canopy: partitions data into overlapping subsets within which k-means is performed
• Minhash: buckets highly-dimensional items for cluster analysis
• Basket analysis: creates configurable groupings of related items from transaction records in single pass
• Collaborative Filter: predicts the interests of a user by collecting interest information from many users
Data Transformation: Transform data for more advanced analysis
• Unpack: extracts nested data for further analysis
• Pack: compresses multi-column data into a single column
• Antiselect: returns all columns except for specified column
• Multicase: case statement that supports row match for multiple cases
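Of the text functions on this slide, Levenshtein Distance is the easiest to show in a few lines. The sketch below is the standard dynamic-programming edit distance in Python, offered only to illustrate the metric. It says nothing about how the Aster function is implemented, and the name `levenshtein` is a hypothetical label.

```python
def levenshtein(a, b):
    """Minimum number of single-character insertions, deletions, and
    substitutions needed to turn string `a` into string `b`."""
    # prev[j] holds the distance between the processed prefix of `a`
    # and the first j characters of `b`.
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        cur = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            cur.append(min(prev[j] + 1,          # delete from a
                           cur[j - 1] + 1,       # insert into a
                           prev[j - 1] + cost))  # substitute or match
        prev = cur
    return prev[-1]


print(levenshtein("kitten", "sitting"))
```

Only two rows of the distance table are kept at a time, so memory grows with the length of the second word rather than with the product of both lengths.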
Analyst: Robin Bloor
Perceptions & Questions
The Bloor Group
Big Data Is About Analytics
DATA AIN’T WHAT IT USED TO BE
• Machine generated data (logs)
• Web data
• Social media data
• Public data services
• Supply chain data
• Real-time data flows
THE ANALOGY OF STRIP-MINING IS RELEVANT BECAUSE THE SCALE OF DATA ANALYTICS HAS EXPANDED DRAMATICALLY
The Data Analytics Issue
What Hadoop Is NOT
• A MULTIUSER HIGHLY TUNED ENGINE
• AN ANALYTICS PLATFORM
• A SOLUTION
But it IS:
• A USEFUL, FLEXIBLE AND VERY ECONOMIC DATA STORE – WITH PLUG-INS
About Data Analytics
It is all about TIME TO INSIGHT – as long as that is followed by action
Fast time to insight requires FLEXIBLE management of high performance data flows - for the benefit of the data analyst
The data analyst needs to be able to MARSHAL the data
Then maybe, just maybe, he will deserve the title of DATA SCIENTIST
Clearly the Teradata Aster Big Analytics Appliance is a powerful data flow engine, so:
• How does Aster Data achieve its performance lift with MapReduce?
• How is it most usually deployed?
• Can it do data cleansing in flight?
• Can it perform analytic tasks?
• Why an appliance? What is gained and what is sacrificed?
• Which sectors/businesses do you expect to be able to make best use of this technology?
• Which companies/products do you regard as competitors (either direct or near)?
• Which companies/products do you partner with?
• How does the appliance fit in the cloud?
Upcoming Topics
This month: Big Data
February: Analytics
March: Open Source
April: Intelligence
www.insideanalysis.com
Thank You for Your Attention