MapR 5.2: Getting More Value from the MapR Converged Data Platform


Transcript of MapR 5.2: Getting More Value from the MapR Converged Data Platform

Page 1: MapR 5.2: Getting More Value from the MapR Converged Data Platform

© 2016 MapR Technologies

MapR 5.2: Getting More Value from the MapR Converged Data Platform

Page 2: MapR 5.2: Getting More Value from the MapR Converged Data Platform

Today’s Presenters

Nitin Bandugula

Director, Professional Services

Ankur Desai

Sr. Manager, Platform and Products

Page 3: MapR 5.2: Getting More Value from the MapR Converged Data Platform

Today’s Agenda

• Top reasons to upgrade to MapR 5.2

• Latest ecosystem support in MapR 5.2

• The 5.2 Step Up program

• Q&A

Page 4: MapR 5.2: Getting More Value from the MapR Converged Data Platform

5 Reasons to Step Up to MapR 5.2

1. New monitoring and management capabilities with the Spyglass Initiative

2. New platform services in the MapR Converged Data Platform, including real-time streaming

3. MapR Ecosystem Pack to accelerate project updates

4. Latest Ecosystem updates

5. End-of-maintenance for prior releases

Page 5: MapR 5.2: Getting More Value from the MapR Converged Data Platform

MapR Converged Data Platform (architecture diagram)

• Open Source Engines & Tools, Commercial Engines & Applications, Custom Apps, Search and Others, Cloud and Managed Services

• Data Processing

• Unified Management and Monitoring

• APIs: HDFS API, POSIX, NFS, HBase API, JSON API, Kafka API

• Web-Scale Storage (MapR-FS), Database (MapR-DB), Event Streaming (MapR Streams)

• Enterprise-Grade Platform Services: Real Time, Unified Security, Multi-tenancy, Disaster Recovery, Global Namespace, High Availability

Page 6: MapR 5.2: Getting More Value from the MapR Converged Data Platform

Project Spyglass

Page 7: MapR 5.2: Getting More Value from the MapR Converged Data Platform

MapR Vision: Maximizing User/Operator Productivity

• Deep Visibility

• Easy Management

• Full Control

Page 8: MapR 5.2: Getting More Value from the MapR Converged Data Platform

The MapR Spyglass Initiative

• New approach for increasing user and administrator productivity

– Comprehensive, open, extensible

• Simplifies the management of growing big data deployments

• Starts with 5.2 release

– Phase 1 – MapR Monitoring

– Initial focus on operational visibility

• Helps community innovate faster

– Extensive use of open source visualization and dashboarding tools

Page 9: MapR 5.2: Getting More Value from the MapR Converged Data Platform

Spyglass Initiative Phase 1 - MapR Monitoring

Empower administrators with cluster monitoring capabilities, including metric and log collection from nodes, services, and jobs, with dashboards to display information in a useful way.

• Converged

• Customizable

• Extensible

Page 10: MapR 5.2: Getting More Value from the MapR Converged Data Platform

MapR Monitoring Architecture (diagram)

• Stages: Collection, Aggregation & Storage, Visualization

• Data Sources: Node Environmentals (CPU, Mem, I/O); Service Daemons (YARN, Drill, Hive, etc.)

• Collection components: Log Shippers, Metrics Collectors

• Other labels on the diagram: MapR Control System, Alerting, Future

Page 11: MapR 5.2: Getting More Value from the MapR Converged Data Platform

Project Spyglass – Monitoring All You Care About

Node/Infrastructure Monitoring

• Global aggregate (Average, Min, Max) charts (e.g. CPU, disk utilization)

• Per-node charts (e.g. I/O throughput by disk)

• MFS reads/writes and throughput

• DB puts, gets, scans and cache metrics
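
As an illustration of what a global aggregate chart summarizes, here is a minimal Python sketch that reduces per-node CPU readings to an average, minimum, and maximum. The node names and values are made up for the example, and the function name is hypothetical; this is not how MapR Monitoring itself computes or stores metrics.

```python
from statistics import mean

# Hypothetical per-node CPU utilization samples (percent); values are illustrative only.
cpu_by_node = {"node-a": 42.0, "node-b": 77.5, "node-c": 18.5}

def global_aggregates(samples):
    """Reduce per-node readings to the three cluster-wide aggregates."""
    values = list(samples.values())
    return {"avg": mean(values), "min": min(values), "max": max(values)}

aggregates = global_aggregates(cpu_by_node)
print(aggregates)
```

A per-node chart would plot each entry of `cpu_by_node` separately, while the global chart plots only the three reduced values over time.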

Page 12: MapR 5.2: Getting More Value from the MapR Converged Data Platform

Project Spyglass – Monitoring All You Care About

Cluster Space Utilization Monitoring

• Cluster wide storage utilization

• Storage Utilization Trend

• Utilization per volume and per accountable entity (data, volume, snapshot and total size)

Page 13: MapR 5.2: Getting More Value from the MapR Converged Data Platform

Project Spyglass – Monitoring All You Care About

YARN/MR Application Monitoring

• Global YARN trend graphs

• Containers - Pending, Active

• vCores & RAM - Allocated & Used

• Per Queue charts - containers, vCores, RAM

Page 14: MapR 5.2: Getting More Value from the MapR Converged Data Platform

Project Spyglass – Monitoring All You Care About

Service Daemon Monitoring

• Per-service charts (e.g. CPU usage by type, memory)

• Centralized, searchable logs

• MapR core and ecosystem services (includes YARN, Drill and Spark)

Page 15: MapR 5.2: Getting More Value from the MapR Converged Data Platform

Customizable Dashboards for Visualizing Metrics

Log Analytics

Page 16: MapR 5.2: Getting More Value from the MapR Converged Data Platform

Destination to Learn and Collaborate

Blog about topics and ideas

Share code snippets and dashboards

View demos, tutorials, and videos

Engage in use case discussion/development

Page 17: MapR 5.2: Getting More Value from the MapR Converged Data Platform

The Exchange

• Dashboards are defined with JSON and are easy to export and import in Grafana and Kibana

• Extend/Integrate using REST API
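
Because dashboards are plain JSON, sharing one amounts to serializing a definition and parsing it back on the other side. The sketch below shows that round trip in Python. The dashboard structure is a simplified, hypothetical one (the `title`, `panels`, and `type` keys are assumptions for illustration); real Grafana and Kibana exports carry many more fields.

```python
import json

# Hypothetical, simplified dashboard definition. Real Grafana and Kibana
# exports carry many more fields; these keys are assumptions for illustration.
dashboard = {
    "title": "Cluster CPU Overview",
    "panels": [
        {"title": "CPU utilization (avg)", "type": "graph"},
        {"title": "Disk utilization (max)", "type": "graph"},
    ],
}

def export_dashboard(definition):
    """Serialize a dashboard definition to a JSON string for sharing."""
    return json.dumps(definition, indent=2, sort_keys=True)

def import_dashboard(text):
    """Parse a shared JSON string back into a dashboard definition."""
    return json.loads(text)

exported = export_dashboard(dashboard)
restored = import_dashboard(exported)
print(restored["title"], len(restored["panels"]))
```

The exported string is what would be posted to a sharing site or sent through a REST call; the importing side only needs a JSON parser.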

Page 18: MapR 5.2: Getting More Value from the MapR Converged Data Platform

Dashboards can be viewed on mobile devices.

Page 19: MapR 5.2: Getting More Value from the MapR Converged Data Platform

Summary

● Data collection and storage infrastructure (packaged and supported)

○ Collection/storage of metrics & logs across node, storage, services

● Visualization dashboard (Driven via community)

○ Sample dashboards for Grafana & Kibana

5.2 - Spyglass 1.0 GA

CUSTOMIZABLE, shareable and mobile-ready dashboards

CONVERGED monitoring with deep search

EXTENSIBLE and easy to integrate with REST API

Page 20: MapR 5.2: Getting More Value from the MapR Converged Data Platform

MapR Streams

Page 21: MapR 5.2: Getting More Value from the MapR Converged Data Platform

MapR Streams: Enabling Continuous Data Processing

To enable continuous, globally scalable streaming of event data, allowing developers to create real-time applications that their business can depend on.

• Converged

• Continuous

• Global

Page 22: MapR 5.2: Getting More Value from the MapR Converged Data Platform

MapR Streams: Publish-subscribe Event Streaming System for Big Data

Producers publish billions of messages/sec to a topic in a stream.

Guaranteed, immediate delivery to all consumers.

Standard real-time API (Kafka). Integrates with Spark Streaming, Storm, Apex, and Flink.

Direct data access (OJAI API) from analytics frameworks.

[Diagram labels: Producers, Stream, Topic, Consumers, Batch analytics, Replication, Remote sites and consumers]
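
The publish-subscribe model described on this slide can be pictured with a small in-memory toy: producers append messages to a topic, and each consumer keeps its own read position, so every consumer eventually sees every message. This Python sketch is only a conceptual model. The `ToyStream` class and the topic and consumer names are hypothetical, and it does not use the MapR Streams or Kafka client APIs.

```python
from collections import defaultdict

class ToyStream:
    """In-memory stand-in for a stream holding several topics (illustration only)."""

    def __init__(self):
        self._topics = defaultdict(list)   # topic name -> ordered list of messages
        self._offsets = defaultdict(int)   # (consumer, topic) -> next index to read

    def publish(self, topic, message):
        """Append a message to a topic and return its offset."""
        self._topics[topic].append(message)
        return len(self._topics[topic]) - 1

    def poll(self, consumer, topic):
        """Return messages this consumer has not yet seen, then advance its offset."""
        start = self._offsets[(consumer, topic)]
        messages = self._topics[topic][start:]
        self._offsets[(consumer, topic)] = start + len(messages)
        return messages

stream = ToyStream()
stream.publish("sensor-readings", "t=1 temp=20.5")
stream.publish("sensor-readings", "t=2 temp=20.7")

first_for_dashboard = stream.poll("dashboard", "sensor-readings")
stream.publish("sensor-readings", "t=3 temp=21.0")
second_for_dashboard = stream.poll("dashboard", "sensor-readings")
all_for_batch = stream.poll("batch-job", "sensor-readings")
```

Here the "dashboard" consumer reads continuously and picks up only the new message on its second poll, while the "batch-job" consumer arrives late and still receives the full history, which mirrors the real-time and batch consumers in the diagram.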

Page 23: MapR 5.2: Getting More Value from the MapR Converged Data Platform

MapR Streams: Building Faster and Simpler Apps

Simpler and Faster Architecture

• Converged platform with file storage and database reduces data movement, data latency, hardware cost, and administration cost

• Event streaming and stream processing in the same cluster enables faster processing

• Unified security framework with files and database tables reduces administration cost around setting up and enforcing security policies

• Multi-tenant: topic isolation, quotas, and data placement control allow multiple isolated streaming applications to run on the same cluster, reducing hardware cost and data movement

Page 24: MapR 5.2: Getting More Value from the MapR Converged Data Platform

MapR Streams: Building Faster and Simpler Apps

Global

• Global data and metadata replication enables easier and more reliable disaster recovery

• Active/active replication allows for cross-datacenter producer & consumer failover to ensure business continuity

• One unified view of all data created and distributed across the globe

Page 25: MapR 5.2: Getting More Value from the MapR Converged Data Platform

MapR Streams: Building Faster and Simpler Apps

Scalable

• Ingest more events to enable faster insights

• Hold on to events longer to enable deeper insights

• Develop an app once and apply it to short- and long-term data (e.g. run analysis on 15 days of data AND 1 year of data using the same application)

Page 26: MapR 5.2: Getting More Value from the MapR Converged Data Platform

MapR Ecosystem Pack

Page 27: MapR 5.2: Getting More Value from the MapR Converged Data Platform

MapR Ecosystem Pack: Accelerate Project Updates

Industry-leading decoupling model of platform from open source projects

With MapR Ecosystem Pack (MEP), customers get:

• Continued quick updates of fast-changing projects

• Continued decoupling of projects from platform to allow updates based on customer’s timeframes

• Monthly access to bug fixes

• Quarterly MEP version updates with complete interoperability across all projects

• Improved version upgrade experience for all platform and project updates

Page 28: MapR 5.2: Getting More Value from the MapR Converged Data Platform

Ecosystem Updates

Page 29: MapR 5.2: Getting More Value from the MapR Converged Data Platform

5.2 Ecosystem Support

These are the only component version changes in MEP 1.0 from 5.2 release date and all of these have been out for 5.1 already.

            Eco on 5.1 today                                    MEP 1.0 on 5.2
Component   Released with 5.1   Subsequently released for 5.1
Drill       1.4                 1.6                             1.6
Spark       1.5.2               1.6.1                           1.6.1 (2.0 in dev preview)
Impala      2.2.0               2.5                             2.5
Storm       0.10.0              0.10.1                          0.10.1
Mahout      0.11.2              0.12.2                          0.12.2

Page 30: MapR 5.2: Getting More Value from the MapR Converged Data Platform

Converging SQL and JSON with Apache Drill 1.6

• Flexible and operational analytics on NoSQL

– MapR-DB plugin allows analysts to perform SQL queries directly on JSON data in MapR-DB tables

– Pushdown capabilities provide optimal interactive experience

• Enhanced query performance

– Provides better query performance via partition pruning, metadata caching and other optimizations

– Delivers up to 10-60X performance gains in query planning compared to the previous releases of Drill

• Better memory management

– Delivers greater stability and scale, which enables customers to run not only larger but also more SQL workloads on a MapR cluster

• Improved integration with visualization tools like Tableau

– Introduces client impersonation for end-to-end security from the visualization tool to data in Hadoop.

– Enhanced SQL Window functions

Page 31: MapR 5.2: Getting More Value from the MapR Converged Data Platform

What’s New in Spark 2.0?

• Structured Streaming with Spark SQL

– The ability to perform interactive queries against live streaming data.

– Output can now be aggregated in a stream for continuous applications.

– Pre-computation of analytics in a continuous fashion can occur as the data is generated

• Whole Stage Code-gen

– Provided by the second-generation Tungsten engine.

– Eliminates many virtual function calls by collapsing a query stage into a single function that is compiled to bytecode at runtime.

• DataFrame APIs

– Runs on the same engine as SparkSQL.

– Allows access to data from a variety of different data sources.

– Can run database-like operations or allow for passing in custom code.

Page 32: MapR 5.2: Getting More Value from the MapR Converged Data Platform

End-of-Maintenance for 4.x and Continuing Quality Improvements

Page 33: MapR 5.2: Getting More Value from the MapR Converged Data Platform

End-of-Maintenance for Prior Releases

• 3.x end-of-maintenance this past February

• 4.x end-of-maintenance coming up in January 2017

http://maprdocs.mapr.com/home/#InteropMatrix/r_release_dates.html

Page 34: MapR 5.2: Getting More Value from the MapR Converged Data Platform

Continuing Quality Improvements

Plus several hundred community bug fixes across all ecosystem components along with Hadoop 2.7 Critical and Blocker fixes

OS upgrades for RHEL, CentOS, Ubuntu and SUSE

Java 1.8 support

Plus strategic partner certifications

Release   Customer Reported Fixes (Cumulative)
4.0.1     52
4.0.2     135 (83 new)
4.1       187 (52 new)
5.0       248 (61 new)
5.1       361 (113 new)
5.2       454 (93 new)
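
The cumulative column is a running sum of the per-release "new" counts. The short Python sketch below recomputes it from the figures in the table, treating the 4.0.1 count of 52 as the starting point (an assumption, since the slide gives no "new" figure for that release).

```python
from itertools import accumulate

# Per-release counts of new customer-reported fixes, taken from the table above.
# 4.0.1 is the starting point, so its cumulative total equals its own count.
new_fixes = {"4.0.1": 52, "4.0.2": 83, "4.1": 52, "5.0": 61, "5.1": 113, "5.2": 93}

# Running sum in release order gives the cumulative column.
cumulative = dict(zip(new_fixes, accumulate(new_fixes.values())))
print(cumulative)
```

The recomputed totals match the slide's cumulative figures for every release.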

Page 35: MapR 5.2: Getting More Value from the MapR Converged Data Platform

Step-up Program for 5.2

Page 36: MapR 5.2: Getting More Value from the MapR Converged Data Platform

Professional Services

(Diagram: four service areas, each with its ENGAGEMENTS and the SKILLS involved, grouped under Data Engineering and Data Science)

Core Platform Services

• Engagements: Installation, Migrations, SLA Plans, Best Practices, Performance Tuning

• Skills: IT/Infrastructure, Converged Platform, Linux, Networking, Data Center, Storage, Operations

Big Data Workflows

• Engagements: Hive/Pig/Spark, Oozie/Sqoop, Flume, MapR-DB/HBase, Data Pipeline, MapR Streams

• Skills: BI / DBA, BI / ETL / Reporting, Scripting / Java, Hadoop MR, Eco Projects (HBase, Hive, …)

Solution Design

• Engagements: HBase/MapR-DB, Map/Reduce, Application Development, Integration Development

• Skills: Java, Hadoop Developer, Architectural Design

Advanced Analytics

• Engagements: Use case Discovery, Use case Modeling, POC, Workshops

• Skills: Modeler / Analyst, PhD, Statistics/Math, MatLab / R / SAS, Scripting / Java, BI / ETL / Reporting

Page 37: MapR 5.2: Getting More Value from the MapR Converged Data Platform

MapR 5.2 Upgrade Process Documentation

• MapR Documentation is available to help you upgrade:

maprdocs.mapr.com/home/UpgradeGuide/Upgrade-Guide.html

• The documentation walks you through the following steps:

– Planning the Upgrade: Determine the upgrade method

– Preparing to Upgrade: Prepare the running cluster for upgrade

– Upgrading the Cluster: With or Without the MapR installer

– Finishing the Upgrade: Complete the post-upgrade steps

– Upgrading MapR Clients: Perform steps to upgrade the MapR client

Page 38: MapR 5.2: Getting More Value from the MapR Converged Data Platform

MapR 5.2 Step-Up Program with MapR PS

• MapR Professional Services

– Experience from 100s of engagements

– Deep technical expertise in the Hadoop ecosystem

• MapR PS team will help you upgrade from 3.x or 4.x to 5.2 within a few weeks

• Service Includes

– Environment Assessment – admin nodes, jobs, latencies etc.

– Cluster Health Check

– Suggest the best upgrade path: manual, installer, etc.

– Upgrade the cluster to the latest version of the platform

– Upgrade the cluster to the latest eco-system packages

– Post-Upgrade Check

– Evaluate existing workflow and make recommendations on how to leverage YARN framework

• Provide a generic example of how YARN implementation is done

Page 39: MapR 5.2: Getting More Value from the MapR Converged Data Platform

Step-Up Program Details

# Nodes          Upgrade Package                    PS Engagement
< 25 nodes       Core + Hive, Pig & Drill upgrade   1 week
25 - 75 nodes    Core + Hive, Pig & Drill upgrade   2 weeks
75 - 200 nodes   Core + Hive, Pig & Drill upgrade   3 weeks
> 200 nodes      Custom Scoping                     Custom

Add-on Options

1   HBase Upgrade                  1 additional week
2   Remaining Ecosystem Upgrade    1 additional week
3   Cluster preparation for YARN   1 additional week
4   App migration to YARN (MRv2)   Custom

• Applicable for both 3.x and 4.x upgrades

• Up to 2 applications will be recompiled

• During the cluster health checks, reorganization of the cluster services (Zookeeper, CLDB, etc.) will be evaluated based on best practices
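
To make the sizing tiers concrete, here is a rough Python sketch that turns a node count and a list of add-ons into an estimated engagement length. It is a reading aid, not an official quoting tool: the function and key names are hypothetical, and because the slide's ranges overlap at 75 and 200 nodes, the sketch assumes those boundary values fall into the lower tier.

```python
# Add-on durations in weeks, from the add-on options table; None means custom scoping.
ADD_ON_WEEKS = {
    "hbase_upgrade": 1,
    "remaining_ecosystem_upgrade": 1,
    "yarn_cluster_preparation": 1,
    "yarn_app_migration": None,
}

def base_weeks(nodes):
    """Base engagement length in weeks, or None when custom scoping applies.

    Assumption: the slide's ranges overlap at 75 and 200 nodes, so this sketch
    places exactly 75 nodes in the 2-week tier and exactly 200 in the 3-week tier.
    """
    if nodes < 25:
        return 1
    if nodes <= 75:
        return 2
    if nodes <= 200:
        return 3
    return None

def estimate_weeks(nodes, add_ons=()):
    """Total weeks for the base package plus add-ons; None if any part is custom."""
    parts = [base_weeks(nodes)] + [ADD_ON_WEEKS[name] for name in add_ons]
    if any(part is None for part in parts):
        return None
    return sum(parts)

print(estimate_weeks(60, ["hbase_upgrade"]))
```

For example, a 60-node cluster with the HBase add-on comes out at 3 weeks under these assumptions, while anything involving custom scoping returns `None` rather than a number.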

Page 40: MapR 5.2: Getting More Value from the MapR Converged Data Platform

Q & A

Engage with us!

• Upgrade documentation

– maprdocs.mapr.com/home/UpgradeGuide/Upgrade-Guide.html

• Try MapR Streams and MapR-DB on-prem, cloud, or VM sandbox

– mapr.com/download

• Get community support from experts

– community.mapr.com