Impala Unlocks Interactive BI on Hadoop

Impala unlocks Interactive BI on Hadoop with MicroStrategy
Justin Erickson | Cloudera | Senior Product Manager
Jochen Demuth | MicroStrategy | Director, Partner Engineering
May 2013


©2013 Cloudera, Inc. All Rights Reserved.


Agenda

• Why Impala?
• Architectural Overview
• Real-World Use Cases
• Interactive Analytics with MicroStrategy
• Taking Big Data Out of Isolation


Why Hadoop?

• Scalability
  • Scales simply by adding nodes
  • Local processing to avoid network bottlenecks

• Flexibility
  • All kinds of data (blobs, documents, records, etc.)
  • In all forms (structured, semi-structured, unstructured)
  • Store anything, then later analyze what you need

• Efficiency
  • Cost efficiency (<$1k/TB) on commodity hardware
  • Unified storage, metadata, security (no duplication or synchronization)


What’s Impala?

• Interactive SQL
  • Typically 5-65x faster than Hive (observed up to 100x faster)
  • Responses in seconds instead of minutes (sometimes sub-second)

• Nearly ANSI-92 standard SQL queries with Hive SQL
  • Compatible SQL interface for existing Hadoop/CDH applications
  • Based on industry standard SQL

• Natively on Hadoop/HBase storage and metadata
  • Flexibility, scale, and cost advantages of Hadoop
  • No duplication/synchronization of data and metadata
  • Local processing to avoid network bottlenecks

• Separate runtime from MapReduce
  • MapReduce is designed for, and great at, batch processing
  • Impala is purpose-built for low-latency SQL queries on Hadoop
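
As a rough illustration of what the "typically 5-65x faster than Hive" figure could mean in practice, here is a minimal arithmetic sketch. Only the 5x and 65x factors come from the slide; the function name and the 10-minute Hive query are hypothetical, chosen purely for illustration.

```python
def impala_latency_range(hive_seconds, min_speedup=5.0, max_speedup=65.0):
    """Return (fastest, slowest) expected latency in seconds.

    The speedup bounds default to the "typically 5-65x" range quoted
    on the slide; hive_seconds is an assumed Hive query duration.
    """
    fastest = hive_seconds / max_speedup
    slowest = hive_seconds / min_speedup
    return fastest, slowest


# Hypothetical example: a Hive query that takes 10 minutes (600 s).
fastest, slowest = impala_latency_range(600.0)
print(f"{fastest:.1f}s to {slowest:.1f}s")
```

Under these assumptions, a query that runs for minutes in Hive would land somewhere between several seconds and a couple of minutes, which is consistent with the slide's "seconds instead of minutes" claim.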


Benefits of Impala

• More & Faster Value from “Big Data”
  • BI tools were impractical on Hadoop before Impala
  • Move from 10s of Hadoop users per cluster to 100s of SQL users
  • No delays from data migration

• Flexibility
  • Query across existing data
  • Select best-fit file formats (Parquet, Avro, etc.)
  • Run multiple frameworks on the same data at the same time

• Cost Efficiency
  • Reduce movement, duplicate storage & compute
  • 10% to 1% of the cost of an analytic DBMS

• Full Fidelity Analysis
  • No loss from aggregations or fixed schemas


Impala and Hive

Shares Everything Client-Facing
• Metadata (table definitions)
• ODBC/JDBC drivers
• SQL syntax (Hive SQL)
• Flexible file formats
• Machine pool
• Hue GUI

But Built for Different Purposes
• Hive: runs on MapReduce and ideal for batch processing
• Impala: native MPP query engine ideal for interactive SQL

[Diagram: Hive (Hive SQL syntax + MapReduce compute framework, for batch processing) and Impala (SQL syntax + compute framework, for interactive SQL) sit on shared metadata, resource management, integration, and storage layers. Storage: HDFS (TEXT, RCFILE, PARQUET, AVRO, etc.) and HBase (records).]


Not All SQL on Hadoop is Created Equal

• Batch MapReduce: Make MapReduce faster. Slow, still batch.
• Remote Query: Pull data from HDFS over the network to the DW compute layer. Slow, expensive.
• Siloed DBMS: Load data into a proprietary database file. Rigid, siloed data, slow ETL.
• Impala: Native MPP query engine that’s integrated into Hadoop. Fast, flexible, cost-effective.


Our Design Strategy

An Integrated Part of the Hadoop System

[Diagram: engines for batch processing (MapReduce, Hive & Pig), interactive SQL (Impala), machine learning (Mahout, DataFu), and others sit on shared metadata, resource management, integration, and storage layers. Storage: HDFS (TEXT, RCFILE, PARQUET, AVRO, etc.) and HBase (records).]

• One pool of data
• One metadata model
• One security framework
• One set of system resources


Impala Use Cases

Cost-effective, ad hoc query environment that offloads the data warehouse for:

• Interactive BI/analytics on more data
• Asking new questions
• Query-able archive with full fidelity
• Data processing with tight SLAs


Global Financial Services Company

Saving 90% on incremental EDW spend & improving performance by 5x

Offload data warehouse for query-able archive

Store decades of data cost-effectively

Process & analyze on the same system

Improve capabilities through interactive query on more data


Six3 Systems

Boosting performance by 20X for mission-critical, real-time cyber security

Analyze unstructured data with flexibility & real-time response

Integrate with existing desktop & BI tools

Deploy in minutes with Cloudera Manager


Expedia

Implementing self-service BI on big data, reducing data latency by 50%

Offload data warehouse for archiving, ETL & analytics

Unify IT environment

Continuously ingest & analyze at scale

Drive greater usability & adoption of big data stack

CONFIDENTIAL: The Information Contained In This Presentation Is Confidential And Proprietary To MicroStrategy. The Recipient Of This Document Agrees That They Will Not Disclose Its Contents To Any Third Party Or Otherwise Use This Presentation For Any Purpose Other Than An Evaluation Of MicroStrategy's Business Or Its Offerings. Reproduction or Distribution Is Prohibited.

About MicroStrategy
Innovator and Leader In Interactive BI

Company

• Top independent analytics software platform vendor

• 20+ years old, publicly traded

• Approximately $600M revenue in 2012. No debt, $200M+ cash in the bank

• Global presence with operations in 23 countries

Technology

• Long-time market leader and innovator in analytics

• Unique unitary architecture, known for high performance and scalability

• Revolutionary Cloud-based analytics services

• Innovations in mobile commerce and identity

Analysts

• Leader for six consecutive years in Gartner’s BI Magic Quadrant

• Leader in Forrester BI Self Service Wave

• #1 Ranked Mobile BI Vendor by Gartner & Dresner Advisory

• Top ranking BI vendor in the BI Scorecard

Customers

• Millions of business users

• Thousands of mission-critical applications

• Nearly 4,000 customer institutions globally across all industries and government

• Customers range from Global 500 giants like Chevron and Carrefour to cutting edge technology innovators like eBay and LinkedIn


Our Customers Are Leaders In All Industries
Supporting the Most Demanding, Mission Critical BI Applications

• Retail: 4 of the Top 5 Global Retailers
• Financial Services: 6 of the Top 10 Financial Service Companies
• Communications: 8 of the Top 10 Communications & Media Companies
• Manufacturing: 5 of the Top 10 Automotive Companies
• Pharmaceuticals: 7 of the Top 10 Healthcare & Life Sciences Companies
• Government: Federal, State, and Local Government Institutions
• Consumer Packaged Goods: Leading Consumer Packaged Goods Companies
• Other Major Companies: Innovators and Leaders Worldwide


MicroStrategy Analytics Platform
Comprehensive Analytics Suite for Big Data

Intuitive Interface
• Interactive dashboards
• Visual Analytics
• Build once, deploy anywhere

Data Federation
• Virtual model spanning multiple data sources

High Performance
• Push-down analytics
• In-memory cube acceleration

Flexible, Reliable, and Easy to Manage
• High-efficiency object reuse
• Powerful SDK
• Comprehensive admin tools

[Diagram: the MicroStrategy Analytics Platform delivers reports, dashboards, statements, and visual discovery through Web, Mobile, Portals, and Office™, drawing on data from across the enterprise: data marts, relational databases, multidimensional sources, and Cloudera Impala for Hadoop.]


MicroStrategy Visual Insight
Interactive Analysis, Drag-and-Drop to Build Intuitive Dashboards in Minutes

MicroStrategy Visual Insight
• Stunning visualizations
• On-screen filtering
• Speed-of-thought in-memory database

Common Use Cases
• Interactive data exploration and root cause analysis
• Dashboard creation
• Self-service BI

[Diagram: Visual Insight draws on data from across the enterprise: data marts, relational databases, multidimensional sources, and Cloudera Impala for Hadoop.]


Combine Data from Multiple Federated Sources
Take Big Data Out of Isolation

Put Big Data analysis in context by combining information from federated data sources in a single dashboard.

[Diagram: a single dashboard combining data from several source types.]
• User / Departmental Data
• Data Warehouse Appliances
• Hadoop Databases
• Relational Databases
• Multidimensional Databases
• Columnar Databases

Bring All Relevant Data to Decision Makers, No Matter Where It Resides


Build Once, Deploy Anywhere
Makes Big Data Accessible to a Wider Business Audience

1. Build once
2. Deploy via any media
  • Web: Browsers, Portals, Enterprise Applications
  • Email: Email, PDF, Office Documents
  • Mobile: Android, iOS, BlackBerry


From Big Data to Business Value
MicroStrategy Delivers Insights on Big Data Faster

• Any and All Data
  • Relational, Multidimensional, Hadoop-based (Cloudera Impala for Hadoop)
  • Structured, Semi-Structured, Unstructured

• Comprehensive Analytics
  • Benchmarking
  • Projections
  • Trend Analysis
  • Data Summarization
  • Relationship Analysis

• World’s Most Intuitive Interface

• Shortened time-to-value for the data scientist
• Enables self-service for the business user

• Submit questions in the Q&A panel

• Watch this webinar on-demand at http://cloudera.com

• Follow Cloudera at @Cloudera

• Follow MicroStrategy at @microstrategy

• Thank you for attending!

Learn more about the Cloudera MicroStrategy partnership

http://cloudera.com/Microstrategy.html

Download Impala http://cloudera.com/downloads

Learn more about Impala at http://cloudera.com/impala