MapR 5.2: Getting More Value from the MapR Converged Data Platform
© 2016 MapR Technologies 1 © 2016 MapR Technologies
MapR 5.2: Getting More Value from the
MapR Converged Data Platform
Today’s Presenters
Nitin Bandugula
Director, Professional Services
Ankur Desai
Sr. Manager, Platform and Products
Today’s Agenda
• Top reasons to upgrade to MapR 5.2
• Latest ecosystem support in MapR 5.2
• The 5.2 Step Up program
• Q&A
5 Reasons to Step Up to MapR 5.2
1. New monitoring and management capabilities with the Spyglass Initiative
2. New platform services in the MapR Converged Data Platform, including real-time streaming
3. MapR Ecosystem Pack to accelerate project updates
4. Latest Ecosystem updates
5. End-of-maintenance for prior releases
MapR Converged Data Platform
[Architecture diagram; recoverable labels listed below]
• Engines, tools, and applications: Open Source Engines & Tools; Commercial Engines & Applications; Custom Apps; Cloud and Managed Services; Search and Others
• Enterprise-Grade Platform Services: Real Time; Unified Security; Multi-tenancy; Disaster Recovery; Global Namespace; High Availability
• Platform components: Web-Scale Storage (MapR-FS); Database (MapR-DB); Event Streaming (MapR Streams)
• APIs: HDFS API; POSIX, NFS; HBase API; JSON API; Kafka API
• Side labels: Data; Processing; Unified Management and Monitoring
Project Spyglass
MapR Vision: Maximizing User/Operator Productivity
• Deep Visibility
• Easy Management
• Full Control
The MapR Spyglass Initiative
• New approach for increasing user and administrator productivity
– Comprehensive, open, extensible
• Simplifies the management of growing big data deployments
• Starts with 5.2 release
– Phase 1 – MapR Monitoring
– Initial focus on operational visibility
• Helps community innovate faster
– Extensive use of open source visualization and dashboarding tools
Spyglass Initiative Phase 1 - MapR Monitoring
Empower administrators with cluster monitoring capabilities, including metric and log collection from nodes, services, and jobs, with dashboards to display information in a useful way.
Converged
Customizable
Extensible
MapR Monitoring Architecture
[Architecture diagram; recoverable labels listed below]
• Stages: Collection; Aggregation & Storage; Visualization
• Data sources: Node Environmentals (CPU, Mem, I/O); Service Daemons (YARN, Drill, Hive, etc.); Future
• Components: Log Shippers; Metrics Collectors; Alerting; MapR Control System; …
Project Spyglass – Monitoring All You Care About
Node/Infrastructure Monitoring
• Global aggregate (average, min, max) charts (e.g., CPU, disk utilization)
• Per-node charts (e.g., I/O throughput by disk)
• MFS reads/writes and throughput
• DB puts, gets, scans, and cache metrics
Cluster Space Utilization Monitoring
• Cluster-wide storage utilization
• Storage utilization trend
• Utilization per volume and per accountable entity (data, volume, snapshot, and total size)
YARN/MR Application Monitoring
• Global YARN trend graphs
• Containers: pending, active
• vCores & RAM: allocated and used
• Per-queue charts: containers, vCores, RAM
Service Daemon Monitoring
• Per-service charts (CPU usage by type, memory)
• Centralized, searchable logs
• MapR core and ecosystem services (includes YARN, Drill, and Spark)
Customizable Dashboards for Visualizing Metrics
Log Analytics
Destination to Learn and Collaborate
Blog about topics and ideas
Share code snippets and dashboards
View demos, tutorials, and videos
Engage in use case discussion/development
The Exchange
• Dashboards are defined with JSON and are easy to export and import in Grafana and Kibana
• Extend/Integrate using REST API
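Because dashboards are defined with JSON, sharing one amounts to serializing a definition to text and parsing it back. The sketch below illustrates that round trip only. The field names (`title`, `panels`, `metric`, `aggregate`) and the helper functions are hypothetical and are not the actual Grafana or Kibana schema or API.

```python
import json

# Hypothetical dashboard definition; the field names are illustrative only
# and do not follow the real Grafana or Kibana dashboard schema.
dashboard = {
    "title": "Cluster CPU Overview",
    "panels": [
        {"title": "CPU utilization (avg)", "metric": "cpu.percent", "aggregate": "avg"},
        {"title": "Disk utilization (max)", "metric": "disk.percent", "aggregate": "max"},
    ],
}


def export_dashboard(definition):
    """Serialize a dashboard definition to a JSON string for sharing."""
    return json.dumps(definition, indent=2, sort_keys=True)


def import_dashboard(text):
    """Parse a shared JSON string back into a dashboard definition."""
    definition = json.loads(text)
    # Minimal sanity check on the hypothetical required fields.
    if "title" not in definition or "panels" not in definition:
        raise ValueError("dashboard JSON must contain 'title' and 'panels'")
    return definition


exported = export_dashboard(dashboard)
restored = import_dashboard(exported)
```

In practice the exported text would come from the dashboard tool itself; the point here is only that a JSON definition survives export and import unchanged.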
Dashboards can be viewed on mobile devices.
Summary
● Data collection and storage infrastructure (packaged and supported)
○ Collection/storage of metrics & logs across node, storage, services
● Visualization dashboard (Driven via community)
○ Sample dashboards for Grafana & Kibana
5.2 - Spyglass 1.0 GA
CUSTOMIZABLE, shareable and mobile-ready dashboards
CONVERGED monitoring with deep search
EXTENSIBLE and easy to integrate with REST API
MapR Streams
MapR Streams: Enabling Continuous Data Processing
To enable continuous, globally scalable streaming of event data, allowing developers to create real-time applications that their business can depend on.
Converged
Continuous
Global
MapR Streams: Publish-subscribe Event Streaming System for Big Data
• Producers publish billions of messages/sec to a topic in a stream.
• Guaranteed, immediate delivery to all consumers.
• Standard real-time API (Kafka).
• Integrates with Spark Streaming, Storm, Apex, and Flink.
• Direct data access (OJAI API) from analytics frameworks.
[Diagram labels: Producers; Stream; Topic; Consumers; Replication; Remote sites and consumers; Batch analytics]
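The publish-subscribe model described on this slide can be illustrated with a small in-memory sketch. This is not the MapR Streams or Kafka client API. The `Stream` class and its `publish` and `poll` methods are hypothetical names, used only to show how a topic keeps an ordered log and how each consumer reads every message independently by tracking its own offset.

```python
from collections import defaultdict


class Stream:
    """Toy in-memory stand-in for a stream holding several topics (hypothetical)."""

    def __init__(self):
        self._topics = defaultdict(list)   # topic name -> ordered list of messages
        self._offsets = defaultdict(int)   # (consumer, topic) -> next index to read

    def publish(self, topic, message):
        """Append a message to the topic's log and return its offset."""
        log = self._topics[topic]
        log.append(message)
        return len(log) - 1

    def poll(self, consumer, topic):
        """Return all messages this consumer has not yet read from the topic."""
        key = (consumer, topic)
        start = self._offsets[key]
        log = self._topics[topic]
        self._offsets[key] = len(log)
        return log[start:]


stream = Stream()
stream.publish("sensor-events", {"id": 1, "temp": 21.5})
stream.publish("sensor-events", {"id": 2, "temp": 22.0})

# Two independent consumers each receive every message on the topic.
realtime = stream.poll("realtime-app", "sensor-events")
batch = stream.poll("batch-analytics", "sensor-events")
```

Because offsets are tracked per consumer, a real-time application and a batch job can read the same topic at different paces without interfering with each other.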
MapR Streams: Building Faster and Simpler Apps
Simpler and Faster Architecture
• Converged platform with file storage and database reduces data movement, data latency, hardware cost, and administration cost
• Event streaming and stream processing in the same cluster enables faster processing
• Unified security framework with files and database tables reduces the administration cost of setting up and enforcing security policies
• Multi-tenancy (topic isolation, quotas, data placement control) allows multiple isolated streaming applications to run on the same cluster, reducing hardware cost and data movement
MapR Streams: Building Faster and Simpler Apps
Global
• Global data and metadata replication enables easier, more reliable disaster recovery
• Active/active replication allows for cross-datacenter producer & consumer failover to ensure business continuity
• One unified view of all data created and distributed across the globe
MapR Streams: Building Faster and Simpler Apps
Scalable
• Ingest more events to enable faster insights
• Hold on to events longer to enable deeper insights
• Develop an app once and apply it to short- and long-term data (e.g., run analysis on 15 days of data and on 1 year of data using the same application)
MapR Ecosystem Pack
MapR Ecosystem Pack: Accelerate Project Updates
Industry-leading decoupling model of platform from open source projects
With MapR Ecosystem Pack (MEP), customers get:
• Continued quick updates of fast-changing projects
• Continued decoupling of projects from platform to allow updates based on customer’s timeframes
• Monthly access to bug fixes
• Quarterly MEP version updates with complete interoperability across all projects
• Improved version upgrade experience for all platform and project updates
Ecosystem Updates
5.2 Ecosystem Support
These are the only component version changes in MEP 1.0 from the 5.2 release date, and all of them have already been released for 5.1.

           Eco on 5.1 today                                  MEP 1.0 on 5.2
Component  Released with 5.1  Subsequently released for 5.1
Drill      1.4                1.6                            1.6
Spark      1.5.2              1.6.1                          1.6.1 (2.0 in dev preview)
Impala     2.2.0              2.5                            2.5
Storm      0.10.0             0.10.1                         0.10.1
Mahout     0.11.2             0.12.2                         0.12.2
Converging SQL and JSON with Apache Drill 1.6
• Flexible and operational analytics on NoSQL
– MapR-DB plugin allows analysts to perform SQL queries directly on JSON data in MapR-DB tables
– Pushdown capabilities provide optimal interactive experience
• Enhanced query performance
– Provides better query performance via partition pruning, metadata caching and other optimizations
– Delivers up to 10-60X performance gains in query planning compared to previous releases of Drill
• Better memory management
– Delivers greater stability and scale, which enables customers to run not only larger but also more SQL workloads on a MapR cluster
• Improved integration with visualization tools like Tableau
– Introduces client impersonation for end-to-end security from the visualization tool to data in Hadoop.
– Enhanced SQL Window functions
What’s New in Spark 2.0?
• Structured Streaming with Spark SQL
– The ability to perform interactive queries against live streaming data.
– Output can now be aggregated in a stream for continuous applications.
– Pre-computation of analytics in a continuous fashion can occur as the data is generated.
• Whole-Stage Code Generation
– Provided by the second-generation Tungsten engine.
– Eliminates the need for multiple JVM calls by flattening SQL queries into a single function evaluated as bytecode at runtime.
• DataFrame APIs
– Run on the same engine as Spark SQL.
– Allow access to data from a variety of different data sources.
– Can run database-like operations or allow for passing in custom code.
End-of-Maintenance for 4.x and Continuing Quality Improvements
End-of-Maintenance for Prior Releases
• 3.x end-of-maintenance this past February
• 4.x end-of-maintenance coming up in January 2017
http://maprdocs.mapr.com/home/#InteropMatrix/r_release_dates.html
Continuing Quality Improvements
Release  Customer-Reported Fixes (Cumulative)
4.0.1    52
4.0.2    135 (83 new)
4.1      187 (52 new)
5.0      248 (61 new)
5.1      361 (113 new)
5.2      454 (93 new)

• Plus several hundred community bug fixes across all ecosystem components, along with Hadoop 2.7 Critical and Blocker fixes
• OS upgrades for RHEL, CentOS, Ubuntu and SUSE
• Java 1.8 support
• Plus strategic partner certifications
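The "new" figures in the fixes table are the differences between consecutive cumulative totals. A minimal sketch of that arithmetic follows, using the totals from the slide. The `new_fixes` helper is a hypothetical name, and treating the first listed release as starting from a baseline of zero is an assumption.

```python
# Cumulative customer-reported fixes per release, as listed on the slide.
cumulative = [
    ("4.0.1", 52),
    ("4.0.2", 135),
    ("4.1", 187),
    ("5.0", 248),
    ("5.1", 361),
    ("5.2", 454),
]


def new_fixes(rows):
    """Return (release, new fixes) pairs as differences of consecutive totals."""
    result = []
    previous = 0  # assumption: the first listed release starts from zero
    for release, total in rows:
        result.append((release, total - previous))
        previous = total
    return result


per_release = new_fixes(cumulative)
for release, count in per_release:
    print(f"{release}: {count} new")
```

The differences match the slide's parenthesized values (83, 52, 61, 113, 93), and they sum back to the 5.2 cumulative total of 454.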
Step-up Program for 5.2
Professional Services
[Diagram: four engagement columns, each paired with the skills involved]
Tracks: Data Engineering; Data Science
• Core Platform Services
– Engagements: Installation; Migrations; SLA Plans; Best Practices; Performance Tuning
– Skills: IT/Infrastructure; Converged Platform; Linux; Networking; Data Center; Storage; Operations
• Big Data Workflows
– Engagements: Hive/Pig/Spark; Oozie/Sqoop; Flume; MapR-DB/HBase; Data Pipeline; MapR Streams
– Skills: BI / DBA; BI / ETL / Reporting; Scripting / Java; Hadoop MR; Eco Projects (HBase, Hive, …)
• Solution Design
– Engagements: HBase/MapR-DB; Map/Reduce; Application Development; Integration Development
– Skills: Java; Hadoop Developer; Architectural Design
• Advanced Analytics
– Engagements: Use case Discovery; Use case Modeling; POC; Workshops
– Skills: Modeler / Analyst; PhD; Statistics/Math; MatLab / R / SAS; Scripting / Java; BI / ETL / Reporting
MapR 5.2 Upgrade Process Documentation
• MapR Documentation is available to help you upgrade:
maprdocs.mapr.com/home/UpgradeGuide/Upgrade-Guide.html
• The documentation walks you through the following steps:
– Planning the Upgrade: Determine the upgrade method
– Preparing to Upgrade: Prepare the running cluster for upgrade
– Upgrading the Cluster: With or Without the MapR installer
– Finishing the Upgrade: Complete the post-upgrade steps
– Upgrading MapR Clients: Perform steps to upgrade the MapR client
MapR 5.2 Step-Up Program with MapR PS
• MapR Professional Services
– Experience from 100s of engagements
– Deep technical expertise in the Hadoop ecosystem
• MapR PS team will help you upgrade from 3.x or 4.x to 5.2 within a few weeks
• Service Includes
– Environment Assessment: admin nodes, jobs, latencies, etc.
– Cluster Health Check
– Suggest the best upgrade path (manual, installer, etc.)
– Upgrade the cluster to the latest version of the platform
– Upgrade the cluster to the latest ecosystem packages
– Post-Upgrade Check
– Evaluate existing workflow and make recommendations on how to leverage the YARN framework
– Provide a generic example of how a YARN implementation is done
Step-Up Program Details
# Nodes         Upgrade Package                   PS Engagement
< 25 nodes      Core + Hive, Pig & Drill upgrade  1 week
25 - 75 nodes   Core + Hive, Pig & Drill upgrade  2 weeks
75 - 200 nodes  Core + Hive, Pig & Drill upgrade  3 weeks
> 200 nodes     Custom Scoping                    Custom

Add-on Options
1  HBase Upgrade                 1 additional week
2  Remaining Ecosystem Upgrade   1 additional week
3  Cluster preparation for YARN  1 additional week
4  App migration to YARN (MRv2)  Custom

• Applicable for both 3.x and 4.x upgrades
• Up to 2 applications will be recompiled
• During the cluster health checks, reorganization of the cluster services (Zookeeper, CLDB, etc.) will be evaluated based on best practices
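The sizing table above can be read as a simple lookup. The following sketch encodes it under stated assumptions: the slide's ranges overlap at exactly 75 and 200 nodes, so the boundary handling here is a guess, and add-on weeks are assumed to add linearly to the base engagement. The function and add-on names are hypothetical and not part of any MapR tooling.

```python
def base_engagement_weeks(nodes):
    """Return the base PS engagement in weeks, or None for custom scoping.

    Assumption: exactly 75 nodes falls in the 2-week tier and exactly
    200 nodes in the 3-week tier; the slide does not say which tier applies.
    """
    if nodes < 1:
        raise ValueError("node count must be positive")
    if nodes < 25:
        return 1
    if nodes <= 75:
        return 2
    if nodes <= 200:
        return 3
    return None  # more than 200 nodes: custom scoping


# Add-ons listed as "1 additional week" each; app migration to YARN is
# custom-scoped on the slide and therefore not included here.
ONE_WEEK_ADDONS = {"hbase", "remaining_ecosystem", "yarn_preparation"}


def total_weeks(nodes, addons=()):
    """Estimate total weeks for the base package plus fixed-length add-ons."""
    base = base_engagement_weeks(nodes)
    if base is None:
        return None
    unknown = set(addons) - ONE_WEEK_ADDONS
    if unknown:
        raise ValueError(f"unknown or custom-scoped add-ons: {sorted(unknown)}")
    return base + len(set(addons))
```

For example, `total_weeks(50, ["hbase"])` gives 3 under these assumptions: a 2-week base engagement plus one additional week for the HBase upgrade.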
Q & A
Engage with us!
• Upgrade documentation
– maprdocs.mapr.com/home/UpgradeGuide/Upgrade-Guide.html
• Try MapR Streams and MapR-DB on-prem, cloud, or VM sandbox
– mapr.com/download
• Get community support from experts
– community.mapr.com