HADOOP Monitoring and Diagnostics: Challenges and Lessons Learned

Matthew [email protected]

About this Talk

• Building monitoring and diagnostic tools for Hadoop

• How we think about Hadoop monitoring and diagnostics

• Interesting problems we have

• A few things we've learned in the process

What is Hadoop?

• Platform for distributed processing and storage of petabytes of data on clusters of commodity hardware

• Operating system for the cluster

• Services that interact and are composable
  • HDFS, MapReduce, HBase, Pig, Hive, ZK, etc...

• Open source
  • Different Apache projects, different communities

Managing the Complexity

• Hadoop distributions, e.g. Cloudera's CDH
  • Packaged services, well tested

• Existing tools
  • Ganglia, Nagios, Chef, Puppet, etc.

• Management tools for Hadoop
  • Cloudera Manager
  • Deployment, configuration, reporting, monitoring, diagnosis
  • Used by operators at Fortune 50 companies

Thinking about Hadoop

Hadoop: services with many hosts, rather than hosts with many services

• Tools should be service-oriented

• Most general existing management tools are host-oriented

Monitoring

• Provide insight into the operation of the system

• Challenges:
  • Knowing what to collect
  • Collecting, storing efficiently at scale
  • Deciding how to present data

Hadoop Monitoring Data (1)

• Operators care about
  • Resource and scheduling information
  • Performance and health metrics
  • Important log events

• Data comes from
  • Metrics exposed via JMX (metrics/metrics2)
  • Logs (Hadoop services, OS)
  • Operating system (/proc, syscalls, etc.)

Hadoop Monitoring Data (2)

• Choosing what to collect
  • Not all! Some are just confusing
    • e.g. DN corrupt replicas vs. blocks with corrupt replicas
  • We’re filtering for users
  • Add more when we see customer problems

• But…
  • Interfaces change between versions
  • Just messy

Example Metric Data, HDFS

• I/O metrics, read/write bytes, counts

• Blocks, replicas, corruptions

• FS info, volume failures, usage/capacity

• NameNode info, time since checkpoint, transactions since checkpoint, num DNs failed

• Many more...

Hadoop Monitoring, What to show

• Building an intuitive user interface is hard
  • Especially for a complex system like Hadoop

• Need service-oriented view
  • Pre-baked visualizations (charts, heatmaps, etc.)
  • Generic data visualization capabilities

• Experts know exactly what they want to see
  • e.g. chart number of corrupt DN block replicas by rack

Diagnostics

• Inform operators when something is wrong
  • E.g. datanode has too many corrupt blocks

• Hard problem
  • No single solution
  • Need multiple tools for diagnosis

• Really don't want to be wrong
  • Operators lose faith in the tool

Health Checks

• Set of rule-based checks for specific problems
  • Simple, stateless, based on metric data
  • Well targeted, catch real problems
  • Easy to get 'right'

• Learn from real customer problems
  • Add checks when customers hit hard-to-diagnose problems
  • E.g. customer saw slow HBase reads
    • Hard to find! bad switch → packet frame errors
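
A minimal sketch of what a simple, stateless, rule-based check over metric data could look like. The metric names, thresholds, and the `HealthCheck` / `run_checks` helpers are all hypothetical, chosen only for illustration; they are not taken from any real tool.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass(frozen=True)
class HealthCheck:
    """One rule: a metric name plus a predicate that says when it is bad."""
    name: str
    metric: str
    is_bad: Callable[[float], bool]
    message: str


def run_checks(checks: List[HealthCheck], metrics: Dict[str, float]) -> List[str]:
    """Evaluate every check against one metrics snapshot.

    Stateless: the result depends only on the snapshot passed in.
    """
    failures = []
    for check in checks:
        value = metrics.get(check.metric)
        if value is None:
            continue  # metric not reported in this snapshot; skipped here
        if check.is_bad(value):
            failures.append(f"{check.name}: {check.message} ({check.metric}={value})")
    return failures


# Hypothetical rules with illustrative thresholds.
CHECKS = [
    HealthCheck("hdfs_missing_blocks", "missing_blocks",
                lambda v: v > 0, "HDFS has missing blocks"),
    HealthCheck("nn_checkpoint_age", "seconds_since_checkpoint",
                lambda v: v > 3600, "NameNode checkpoint is stale"),
    HealthCheck("nic_frame_errors", "rx_frame_errors_per_min",
                lambda v: v > 10, "NIC is reporting frame errors"),
]

snapshot = {"missing_blocks": 0, "seconds_since_checkpoint": 7200}
failures = run_checks(CHECKS, snapshot)
```

Each rule is a single predicate over a single metric, which is what makes checks like this easy to get right and cheap to add after a customer incident.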

Health Checks, Examples

• HDFS missing blocks, corrupt replicas

• DataNode connectivity, volume failures

• NameNode checkpoint age, safe mode

• GC duration, number of file descriptors, etc...

• Canary-based checks
  • e.g. can write a file to HDFS, can perform basic HBase operations

• Many more...

Health Checks (2)

• Not good for performance, context-aware issues

• Have to build manually, time consuming

• Can take these further
  • Add more knowledge about root cause
  • Taking actions in some cases

Anomaly Detection

• Simple statistics, e.g. std deviation

• More clever machine learning algorithms
  • Local outliers in high-dimensional 'metric space'
  • Streaming algorithm seems feasible

• Identify what's abnormal for a particular cluster

• Must use carefully – outlier != problem
  • Measure of 'potential interestingness'
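
As a sketch of the "simple statistics" end of this spectrum, the snippet below scores each entity by how many standard deviations its metric value sits from the mean of its peers. The function names, the threshold of 2.0, and the sample values are assumptions made for illustration. The score is only a measure of potential interestingness, not a verdict that something is broken.

```python
from statistics import mean, pstdev
from typing import Dict, List


def outlier_scores(values: Dict[str, float]) -> Dict[str, float]:
    """Return |value - mean| / stddev for each entity (0.0 if all values agree)."""
    mu = mean(values.values())
    sigma = pstdev(values.values())
    if sigma == 0:
        return {entity: 0.0 for entity in values}
    return {entity: abs(v - mu) / sigma for entity, v in values.items()}


def flag_outliers(values: Dict[str, float], threshold: float = 2.0) -> List[str]:
    """Entities whose score exceeds the (illustrative) threshold."""
    scores = outlier_scores(values)
    return sorted(entity for entity, score in scores.items() if score > threshold)


# Hypothetical per-DataNode read latencies in ms: nine similar peers, one slow node.
latency_ms = {f"dn{i}": 10.0 for i in range(1, 10)}
latency_ms["dn10"] = 100.0

scores = outlier_scores(latency_ms)
flagged = flag_outliers(latency_ms)
```

Comparing an entity against its own peers is one way to capture "abnormal for a particular cluster" without hard-coding absolute thresholds.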

Other Diagnostic Tools and Challenges

• Anomaly detection via log data

• Need data across services
  • E.g. slow HBase reads caused by HDFS latency

• Better instrumentation in platform
  • E.g. Dapper-like tracing through the stack (HBASE-6449)
  • Future work to extend to HDFS

https://issues.apache.org/jira/browse/HBASE-6449

Challenges: Hadoop Fault Tolerance

• Hadoop is built to tolerate failures
  • E.g. HDFS replication

• Not clear when to report a problem
  • E.g. 1 failed DN may not be concerning enough

Challenges in Diagnostics (2)

• Entities interact
  • E.g. health of HDFS depends on health of DNs, NNs, etc…
  • Relations describe graph of computation to evaluate health

• Evaluating cluster/service/host health becomes challenging
  • Data arrives from different sources at different times
  • When to evaluate health? Every minute? When data changes?

• Complete failures >> partial failures
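
One way to picture health as a computation over related entities is a rollup from components to the service. The sketch below is an assumption-laden illustration: the `Health` levels, the "worst of" rule for NameNodes, and the 5% / 20% DataNode failure fractions are invented for the example and are not described in the talk.

```python
from enum import IntEnum
from typing import List


class Health(IntEnum):
    GOOD = 0
    CONCERNING = 1
    BAD = 2


def rollup_critical(children: List[Health]) -> Health:
    """Critical components: the parent is as unhealthy as its worst child."""
    return max(children, default=Health.GOOD)


def rollup_tolerant(children: List[Health],
                    concerning_frac: float, bad_frac: float) -> Health:
    """Fault-tolerant components: only the fraction of failed children matters."""
    if not children:
        return Health.GOOD
    failed = sum(1 for h in children if h == Health.BAD) / len(children)
    if failed >= bad_frac:
        return Health.BAD
    if failed >= concerning_frac:
        return Health.CONCERNING
    return Health.GOOD


def hdfs_health(namenodes: List[Health], datanodes: List[Health]) -> Health:
    """HDFS health from its NameNodes (critical) and DataNodes (tolerant)."""
    return max(rollup_critical(namenodes),
               rollup_tolerant(datanodes, concerning_frac=0.05, bad_frac=0.20))


one_dn_down = [Health.BAD] + [Health.GOOD] * 99
ten_dns_down = [Health.BAD] * 10 + [Health.GOOD] * 90
```

With these illustrative thresholds, one failed DataNode out of 100 leaves HDFS healthy, ten make it concerning, and a failed NameNode makes it bad regardless of the DataNodes.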

Challenges Operating at Scale (1)

• Building a distributed system to monitor a distributed system

• Collect metrics for lots of 'things' (entities)
  • DataNodes, NameNodes, TaskTrackers, JobTrackers, RegionServers, Regions, etc.
  • Hosts, disks, NICs, data directories, etc.

• Aggregate many metrics too
  • e.g. aggregate DN metrics → HDFS-wide metrics, aggregate region metrics → table metrics
  • Cluster-wide, service-wide, rack-wide, etc.

• Becomes a big data problem
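
The aggregation step can be sketched as a group-by over per-entity samples. The sample layout (`entity`, `service`, `rack`, `metric`, `value`) and the `aggregate` helper are assumptions for illustration; summation is used here, though a real system would likely need other aggregates (min, max, average) as well.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def aggregate(samples: List[dict], group_by: str) -> Dict[Tuple[str, str], float]:
    """Sum each metric across all entities sharing the same `group_by` value."""
    totals: Dict[Tuple[str, str], float] = defaultdict(float)
    for sample in samples:
        totals[(sample[group_by], sample["metric"])] += sample["value"]
    return dict(totals)


# Hypothetical per-DataNode samples for one collection interval.
samples = [
    {"entity": "dn1", "service": "hdfs", "rack": "r1", "metric": "bytes_read", "value": 100.0},
    {"entity": "dn2", "service": "hdfs", "rack": "r1", "metric": "bytes_read", "value": 250.0},
    {"entity": "dn3", "service": "hdfs", "rack": "r2", "metric": "bytes_read", "value": 50.0},
]

service_wide = aggregate(samples, "service")  # DN metrics -> HDFS-wide metrics
rack_wide = aggregate(samples, "rack")        # DN metrics -> rack-wide metrics
```

Every extra grouping dimension multiplies the number of derived streams, which is part of why the volume grows so quickly.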

Challenges Operating at Scale (2)

• At 1000 nodes...
  • Hundreds of thousands of entities
  • Millions of metrics written per minute
  • Increase polling? Every 30 sec? 10 sec?

• Simple RDBMS is OK for a while...
  • Shard, partition, etc.

Storage for Monitoring Data

• Hadoop (HBase) + OpenTSDB is great
  • But we don't eat our own tail...

• Can use other TS databases

• Modify HBase, make 'embedded' version
  • Just a single node, just a single Region

• Or use LevelDB
  • Fast key-value store from Google, open source

http://opentsdb.net/

LevelDB (or HBase), an example

• Data model, simplified
  • Have time series for many entities: tsId
    • e.g. DNs, Regions, hosts, disks, etc.
  • Have many metric streams: metricId
    • e.g. DN bytes read, JVM gc count, etc.

• LevelDB, fast key-value store
  • Key: byte array of “<tsId><metricId><timestamp>”
  • Value: data
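
A sketch of how such a key might be laid out, assuming fixed-width big-endian fields (8-byte tsId, 4-byte metricId, 8-byte timestamp); the widths are an assumption, not something the slides specify. Big-endian packing keeps byte-wise key order equal to numeric order, so all points of one stream sort together by time. A sorted Python list stands in for the ordered key-value store here; no LevelDB API is used.

```python
import struct
from bisect import bisect_left
from typing import List, Tuple

# Assumed layout: <tsId: uint64><metricId: uint32><timestamp: uint64>, big-endian.
KEY = struct.Struct(">QIQ")


def encode_key(ts_id: int, metric_id: int, timestamp: int) -> bytes:
    return KEY.pack(ts_id, metric_id, timestamp)


def decode_key(key: bytes) -> Tuple[int, int, int]:
    return KEY.unpack(key)


def scan(sorted_keys: List[bytes], ts_id: int, metric_id: int,
         start: int, end: int) -> List[bytes]:
    """Keys for one stream with start <= timestamp < end, via two binary searches."""
    lo = bisect_left(sorted_keys, encode_key(ts_id, metric_id, start))
    hi = bisect_left(sorted_keys, encode_key(ts_id, metric_id, end))
    return sorted_keys[lo:hi]


# Two streams interleaved on insert; sorting groups them by (tsId, metricId, time).
keys = sorted([
    encode_key(7, 1, 300), encode_key(2, 1, 100), encode_key(7, 1, 100),
    encode_key(7, 2, 100), encode_key(7, 1, 200),
])
window = [decode_key(k) for k in scan(keys, ts_id=7, metric_id=1, start=100, end=300)]
```

Putting the timestamp last means a time-range query for one stream is a contiguous range scan over the keys.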

LevelDB example (2)

• Can write many data points per row
  • Timestamp in key is timestamp base
  • Write each data point time delta before value
  • E.g. value: “<delta1><val1><delta2><val2>...” or “<delta1><delta2>...<val1><val2>...”
  • Will compress well

• Very similar to what OpenTSDB does
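
The interleaved “<delta1><val1><delta2><val2>...” layout could be sketched as follows. The one-hour row width, the 2-byte delta, the 8-byte float value, and the choice to measure each delta from the row's timestamp base are all assumptions made for the example.

```python
import struct
from typing import List, Tuple

ROW_SECONDS = 3600              # assumed row width: one hour of points per row
POINT = struct.Struct(">Hd")    # assumed: 2-byte delta from base, 8-byte float value


def row_base(timestamp: int) -> int:
    """Timestamp base stored in the key: the start of the row's time window."""
    return timestamp - timestamp % ROW_SECONDS


def encode_row(points: List[Tuple[int, float]]) -> Tuple[int, bytes]:
    """Pack (timestamp, value) pairs from one row as <delta><val><delta><val>..."""
    base = row_base(points[0][0])
    chunks = []
    for timestamp, value in points:
        if row_base(timestamp) != base:
            raise ValueError("all points must fall in the same row window")
        chunks.append(POINT.pack(timestamp - base, value))
    return base, b"".join(chunks)


def decode_row(base: int, body: bytes) -> List[Tuple[int, float]]:
    """Recover absolute timestamps by adding each delta back to the base."""
    return [(base + delta, value) for delta, value in POINT.iter_unpack(body)]


points = [(7200, 1.5), (7260, 2.5), (7320, 4.0)]
base, body = encode_row(points)
restored = decode_row(base, body)
```

Storing a small delta per point instead of a full timestamp shrinks each entry, and the regular deltas are the kind of repetitive data a compressor handles well.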

QUESTIONS?
