HADOOP Monitoring and Diagnostics: Challenges and Lessons Learned
HADOOP MONITORING AND DIAGNOSTICS: CHALLENGES AND LESSONS LEARNED
Matthew [email protected]
About this Talk
• Building monitoring and diagnostic tools for Hadoop
• How we think about Hadoop monitoring and diagnostics
• Interesting problems we have
• A few things we've learned in the process
What is Hadoop?
• Platform for distributed processing and storage of petabytes of data on clusters of commodity hardware
• Operating system for the cluster
• Services that interact and are composable
• HDFS, MapReduce, HBase, Pig, Hive, ZK, etc...
• Open source
• Different Apache projects, different communities
Managing the Complexity
• Hadoop distributions, e.g. Cloudera's CDH
• Packaged services, well tested
• Existing tools
• Ganglia, Nagios, Chef, Puppet, etc.
• Management tools for Hadoop
• Cloudera Manager
• Deployment, configuration, reporting, monitoring, diagnosis
• Used by operators @ Fortune 50 companies
Thinking about Hadoop
Hadoop: services with many hosts
rather than: hosts with many services
• Tools should be service-oriented
• Most general existing management tools are host-oriented
Monitoring
• Provide insight into the operation of the system
• Challenges:
• Knowing what to collect
• Collecting, storing efficiently at scale
• Deciding how to present data
Hadoop Monitoring Data (1)
• Operators care about
• Resource and scheduling information
• Performance and health metrics
• Important log events
• Come from
• Metrics exposed via JMX (metrics/metrics2)
• Logs (Hadoop services, OS)
• Operating system (/proc, syscalls, etc.)
Hadoop Monitoring Data (2)
• Choosing what to collect
• Not all! Some are just confusing
• e.g. DN corrupt replicas vs. blocks with corrupt replicas
• We’re filtering for users
• Add more when we see customer problems
• But…
• Interfaces change between versions
• Just messy
Example Metric Data, HDFS
• I/O metrics, read/write bytes, counts
• Blocks, replicas, corruptions
• FS info, volume failures, usage/capacity
• NameNode info, time since checkpoint, transactions since checkpoint, num DNs failed
• Many more...
Hadoop Monitoring, What to show
• Building an intuitive user interface is hard
• Especially for a complex system like Hadoop
• Need service-oriented view
• Pre-baked visualizations (charts, heatmaps, etc.)
• Generic data visualization capabilities
• Experts know exactly what they want to see
• e.g. chart number of corrupt DN block replicas by rack
Diagnostics
• Inform operators when something is wrong
• E.g. datanode has too many corrupt blocks
• Hard problem
• No single solution
• Need multiple tools for diagnosis
• Really don't want to be wrong
• Operators lose faith in the tool
Health Checks
• Set of rule-based checks for specific problems
• Simple, stateless, based on metric data
• Well targeted, catch real problems
• Easy to get 'right'
• Learn from real customer problems
• Add checks when customers hit hard-to-diagnose problems
• E.g. customer saw slow HBase reads
• Hard to find! bad switch → packet frame errors
Health Checks, Examples
• HDFS missing blocks, corrupt replicas
• DataNode connectivity, volume failures
• NameNode checkpoint age, safe mode
• GC duration, number of file descriptors, etc...
• Canary-based checks
• e.g. can write a file to HDFS, can perform basic HBase operations
• Many more...
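A rule-based, stateless check of the kind listed above can be sketched in a few lines of Python. The metric name, the thresholds, and the three-level result are assumptions made for illustration; they are not taken from Cloudera Manager.

```python
from enum import Enum


class Health(Enum):
    GOOD = "good"
    CONCERNING = "concerning"
    BAD = "bad"


def check_checkpoint_age(metrics, warn_secs=3600, crit_secs=3 * 3600):
    """Stateless rule: judge NameNode health from a single metric sample.

    `metrics` is a plain dict. The key name and both thresholds are
    hypothetical values chosen for this sketch.
    """
    age = metrics["seconds_since_checkpoint"]
    if age >= crit_secs:
        return Health.BAD
    if age >= warn_secs:
        return Health.CONCERNING
    return Health.GOOD
```

Because the rule looks only at the current sample, it needs no history and can be re-evaluated whenever new metric data arrives.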
Health Checks (2)
• Not good for performance, context-aware issues
• Have to build manually, time consuming
• Can take these further
• Add more knowledge about root cause
• Taking actions in some cases
Anomaly Detection
• Simple statistics, e.g. std deviation
• More clever machine learning algorithms
• Local outliers in high-dimensional 'metric space'
• Streaming algorithm seems feasible
• Identify what's abnormal for a particular cluster
• Must use carefully – outlier != problem
• Measure of 'potential interestingness'
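The "simple statistics" end of this spectrum can be sketched as a z-score over one metric across entities. The function names, the sample values, and the threshold of 2 standard deviations are assumptions for illustration. The score is only a measure of interestingness, not a verdict that something is broken.

```python
from statistics import mean, pstdev


def interestingness(values):
    """Score each entity by its distance from the mean, in std deviations."""
    mu = mean(values.values())
    sigma = pstdev(values.values())
    if sigma == 0:
        # All entities report the same value: nothing stands out.
        return {name: 0.0 for name in values}
    return {name: abs(v - mu) / sigma for name, v in values.items()}


def outliers(values, threshold=2.0):
    """Names of entities whose score exceeds the (hypothetical) threshold."""
    scores = interestingness(values)
    return sorted(name for name, score in scores.items() if score > threshold)


# Hypothetical per-DataNode read latencies in milliseconds.
latency_ms = {"dn1": 10, "dn2": 11, "dn3": 9, "dn4": 10, "dn5": 10, "dn6": 50}
flagged = outliers(latency_ms)
```

Here `flagged` contains only `dn6`. Whether that host actually has a problem still has to be established by other means.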
Other Diagnostic Tools and Challenges
• Anomaly detection via log data
• Need data across services
• E.g. slow HBase reads caused by HDFS latency
• Better instrumentation in platform
• E.g. Dapper-like tracing through the stack (HBASE-6449)
• Future work to extend to HDFS
Challenges: Hadoop Fault Tolerance
• Hadoop is built to tolerate failures
• E.g. HDFS replication
• Not clear when to report a problem
• E.g. 1 failed DN may not be concerning enough
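One possible way to account for fault tolerance is to judge the fraction of failed DataNodes rather than any single failure. The sketch below assumes that approach. The function name and both cut-offs are hypothetical, and choosing them well is exactly the hard part described above.

```python
def datanode_failure_health(failed, total, warn_fraction=0.05, crit_fraction=0.20):
    """Rate HDFS health by the share of failed DataNodes.

    Both fractions are illustrative assumptions, not recommended values.
    """
    if total <= 0:
        raise ValueError("total must be positive")
    fraction = failed / total
    if fraction >= crit_fraction:
        return "bad"
    if fraction >= warn_fraction:
        return "concerning"
    return "good"
```

With these cut-offs, one failed DataNode out of 100 is reported as good, while one failed DataNode out of 3 is reported as bad.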
Challenges in Diagnostics (2)
• Entities interact
• E.g. health of HDFS depends on health of DNs, NNs, etc…
• Relations describe graph of computation to evaluate health
• Evaluating cluster/service/host health becomes challenging
• Data arrives from different sources at different times
• When to evaluate health? Every minute? When data changes?
• Complete failures >> partial failures
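The dependency graph can be sketched as a recursive roll-up in which an entity's health is the worst health among its dependencies. The severity ordering, the "unknown" state for data that has not arrived yet, and the worst-of rule are all assumptions made for this sketch. A real evaluator would need a more tolerant rule, given that Hadoop is built to survive some failures.

```python
# Hypothetical severity ordering; "unknown" covers data that has not arrived.
SEVERITY = {"good": 0, "unknown": 1, "concerning": 2, "bad": 3}


def rollup(entity, children, reported):
    """Return the worst health among `entity` and everything it depends on.

    `children` maps an entity to the entities it depends on.
    `reported` maps an entity to its most recently reported health.
    """
    deps = children.get(entity, [])
    results = [rollup(dep, children, reported) for dep in deps]
    # Leaves (and entities with their own report) contribute their own state.
    if entity in reported or not deps:
        results.append(reported.get(entity, "unknown"))
    return max(results, key=lambda h: SEVERITY[h])


children = {"hdfs": ["nn1", "dn1", "dn2"]}
late = rollup("hdfs", children, {"nn1": "good", "dn1": "good"})
failed = rollup("hdfs", children, {"nn1": "good", "dn1": "good", "dn2": "bad"})
```

In this example `late` is "unknown" because dn2 has not reported, which illustrates why the timing of evaluation matters.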
Challenges Operating at Scale (1)
• Building a distributed system to monitor a distributed system
• Collect metrics for lots of 'things' (entities)
• DataNodes, NameNodes, TaskTrackers, JobTrackers, RegionServers, Regions, etc.
• Hosts, disks, NICs, data directories, etc.
• Aggregate many metrics too
• e.g. aggregate DN metrics → HDFS-wide metrics, aggregate region metrics → table metrics
• Cluster-wide, service-wide, rack-wide, etc.
• Becomes a big data problem
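The aggregation step can be sketched as a group-by over per-entity samples. The sample layout and field names (`rack`, `service`, `bytes_read`) are hypothetical.

```python
from collections import defaultdict


def aggregate(samples, group_key, metric="bytes_read"):
    """Sum one metric over all samples that share the same group value."""
    totals = defaultdict(int)
    for sample in samples:
        totals[sample[group_key]] += sample[metric]
    return dict(totals)


# Hypothetical per-DataNode samples for a single collection interval.
samples = [
    {"entity": "dn1", "rack": "r1", "service": "hdfs", "bytes_read": 100},
    {"entity": "dn2", "rack": "r1", "service": "hdfs", "bytes_read": 250},
    {"entity": "dn3", "rack": "r2", "service": "hdfs", "bytes_read": 50},
]
per_rack = aggregate(samples, "rack")
per_service = aggregate(samples, "service")
```

The same per-entity data feeds every roll-up level, so each additional level (rack, service, cluster) multiplies the number of derived series to compute and store.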
Challenges Operating at Scale (2)
• At 1000 nodes...
• Hundreds of thousands of entities
• Millions of metrics written per minute
• Increase polling? Every 30 sec? 10 sec?
• Simple RDBMS is OK for a while...
• Shard, partition, etc.
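The effect of the polling interval on write volume is simple arithmetic. The entity and per-entity metric counts below are illustrative assumptions picked to match the orders of magnitude on the slide; they are not measurements.

```python
def writes_per_minute(entities, metrics_per_entity, poll_interval_secs):
    """Data points written per minute for a given polling interval."""
    polls_per_minute = 60 / poll_interval_secs
    return int(entities * metrics_per_entity * polls_per_minute)


# Assumed: 200,000 entities with 10 metrics each.
at_60s = writes_per_minute(200_000, 10, 60)
at_30s = writes_per_minute(200_000, 10, 30)
at_10s = writes_per_minute(200_000, 10, 10)
```

Under these assumptions, moving from 60-second to 10-second polling raises the write rate sixfold, from 2 million to 12 million points per minute.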
Storage for Monitoring Data
• Hadoop (HBase) + OpenTSDB is great
• But we don't eat our own tail...
• Can use other TS databases
• Modify HBase, make 'embedded' version
• Just a single node, just a single Region
• Or use LevelDB
• Fast key-value store from Google, open source
LevelDB (or HBase), an example
• Data model, simplified
• Have time series for many entities: tsId
• e.g. DNs, Regions, hosts, disks, etc.
• Have many metric streams: metricId
• e.g. DN bytes read, JVM gc count, etc.
• LevelDB, fast key-value store
• Key: byte array of “<tsId><metricId><timestamp>”
• Value: data
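The key layout can be sketched with Python's `struct` module. The field widths (8-byte tsId, 4-byte metricId, 8-byte timestamp) are assumptions, since the slide does not give them. Big-endian packing is used so that byte-wise key order matches numeric order, which keeps the points of one series contiguous and time-ordered under LevelDB's default bytewise comparator.

```python
import struct

# Hypothetical widths: 8-byte tsId, 4-byte metricId, 8-byte timestamp.
# ">" selects big-endian with no padding, so byte order equals numeric order.
KEY_FORMAT = ">QIQ"


def encode_key(ts_id, metric_id, timestamp):
    """Build the byte-array key <tsId><metricId><timestamp>."""
    return struct.pack(KEY_FORMAT, ts_id, metric_id, timestamp)


def decode_key(key):
    """Split a key back into (ts_id, metric_id, timestamp)."""
    return struct.unpack(KEY_FORMAT, key)
```

A range scan from `encode_key(ts, metric, start)` to `encode_key(ts, metric, end)` would then cover exactly one metric stream of one entity over a time window.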
LevelDB example (2)
• Can write many data points per row
• Timestamp in key is timestamp base
• Write each data point time delta before value
• E.g. value: “<delta1><val1><delta2><val2>...” or “<delta1><delta2>...<val1><val2>...”
• Will compress well
• Very similar to what OpenTSDB does
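The interleaved "<delta1><val1><delta2><val2>..." layout can be sketched as follows. The record format (2-byte delta in seconds, 8-byte float value) is an assumption for illustration and is not the actual encoding used by the tool or by OpenTSDB.

```python
import struct

# Hypothetical record: 2-byte unsigned delta (seconds), 8-byte float value.
POINT = struct.Struct(">Hd")


def encode_row(base_ts, points):
    """Pack (timestamp, value) pairs as <delta><val> records relative to base_ts."""
    return b"".join(POINT.pack(ts - base_ts, val) for ts, val in points)


def decode_row(base_ts, blob):
    """Recover the (timestamp, value) pairs from a packed row value."""
    return [(base_ts + delta, val) for delta, val in POINT.iter_unpack(blob)]


base = 1_000_000
points = [(1_000_000, 1.5), (1_000_060, 2.5), (1_000_120, 3.0)]
row = encode_row(base, points)
```

Each point costs 10 bytes here instead of a full key plus value, and the small, regular deltas are what make the row compress well. With a 2-byte delta, a row can span at most 65,535 seconds past its base timestamp.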
QUESTIONS?