End to End Processing of 3.7 Million Telemetry Events per Second using Lambda Architecture


Transcript of End to End Processing of 3.7 Million Telemetry Events per Second using Lambda Architecture

Page 1: End to End Processing of 3.7 Million Telemetry Events per Second using Lambda Architecture

End to End Processing of 3.7 Million Telemetry Events per Second Using Lambda Architecture

Saurabh Mishra Raghavendra Nandagopal

Page 2: End to End Processing of 3.7 Million Telemetry Events per Second using Lambda Architecture

Who are We?

Saurabh Mishra, Solution Architect, Hortonworks Professional Services, @draftsperson, [email protected]

Raghavendra Nandagopal, Cloud Data Services Architect, Symantec, [email protected]

Page 3: End to End Processing of 3.7 Million Telemetry Events per Second using Lambda Architecture

Cloud Platform Engineering
Symantec: Global Leader in Cyber Security

- Symantec is the world leader in providing security software for both enterprises and end users

- Thousands of enterprise customers and more than 400 million devices (PCs, tablets, and phones) rely on Symantec to help secure their assets from attacks, including their data centers, email, and other sensitive data

Cloud Platform Engineering (CPE) builds consolidated cloud infrastructure and platform services for next-generation, data-powered Symantec applications:

- A big data platform for batch and stream analytics integrated with both private and public cloud.

- Open source components as building blocks: Hadoop and OpenStack

- Bridge feature gaps and contribute back

Page 4: End to End Processing of 3.7 Million Telemetry Events per Second using Lambda Architecture

Agenda

• Security Data Lake @ Global Scale
• Infrastructure At Scale
• Telemetry Data Processing Architecture
• Tunable Targets
• Performance Benchmarks
• Service Monitoring

Page 5: End to End Processing of 3.7 Million Telemetry Events per Second using Lambda Architecture

Security Data Lake @ Global Scale

Page 6: End to End Processing of 3.7 Million Telemetry Events per Second using Lambda Architecture

Security Data Lake @ Global Scale

[Architecture diagram] Products and device agents send telemetry data, via data transfer, into threat-protection inbound messaging (data import, Kafka). From there the platform provides stream processing (Storm), real-time results (HBase, ElasticSearch), storage (HDFS), analytic applications and workload management (YARN), and query (Hive, Spark SQL), running on physical machines, public cloud, and private cloud.

Page 7: End to End Processing of 3.7 Million Telemetry Events per Second using Lambda Architecture

Lambda Architecture

Speed Layer: compensates for the high latency of updates to the serving layer by running fast, incremental algorithms on real-time data; the batch layer eventually overrides it.

Batch Layer: stores the master dataset (the master copy behind the serving layer) and computes arbitrary views.

Serving Layer: provides random access to the batch views and is updated by the batch layer.

Complexity isolation: once data makes it to the serving layer via the batch path, the speed layer's results can be discarded.

Page 8: End to End Processing of 3.7 Million Telemetry Events per Second using Lambda Architecture

Infrastructure At Scale

Page 9: End to End Processing of 3.7 Million Telemetry Events per Second using Lambda Architecture

YARN applications in production: 669,388 submitted, 646,632 completed, 4,640 killed, 401 failed

Hive in production: 25,887 tables, 306 databases, 98,493 partitions

Storm in production: 210 nodes, 50+ topologies

Kafka in production: 80 nodes

HBase in production: 135 nodes

ElasticSearch in production: 62 nodes

Infrastructure At Scale

[Infrastructure diagram] A hybrid data lake spanning OpenStack (dev, 350 nodes), bare metal (production, 600 nodes), and public cloud (production, 200 nodes), provisioned with Ironic, Ansible, and Cloudbreak, managed through Ambari, with centralized logging and metering.

Page 10: End to End Processing of 3.7 Million Telemetry Events per Second using Lambda Architecture

Telemetry Data Processing Architecture

Page 11: End to End Processing of 3.7 Million Telemetry Events per Second using Lambda Architecture

Telemetry Data Processing Architecture

[Architecture diagram] Telemetry data collectors send raw events to the telemetry gateway in the data centers, which publishes Avro-serialized telemetry to Kafka. An Opaque Trident Kafka spout deserializes the objects, and a transformation topology applies transformation functions over Trident streams (a micro-batch implementation with exactly-once semantics), persisting Avro-serialized transformed objects back to Kafka. Separate ingestion topologies, each fed by an Opaque Trident Kafka spout, then write the transformed events out: an ElasticSearch ingestion topology (Trident ElasticSearch writer bolt), an HBase ingestion topology (Trident HBase bolt), a Hive ingestion topology (Trident Hive streaming), and an identity topology, all running on YARN.

Page 12: End to End Processing of 3.7 Million Telemetry Events per Second using Lambda Architecture

Tunable Targets

Page 13: End to End Processing of 3.7 Million Telemetry Events per Second using Lambda Architecture

Operating System

Tuning Targets
● Operating System
● Disk
● Network

Tunables
● Disable Transparent Huge Pages (echo never > defrag and > enabled)
● Disable swap
● Configure VM cache flushing
● Configure the I/O scheduler as deadline
● Disks: JBOD, ext4; mount options: inode_readahead_blks=128,data=writeback,noatime,nodiratime
● Network: dual bonded 10 Gbps; rx-checksumming: on, tx-checksumming: on, scatter-gather: on, tcp-segmentation-offload: on
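A minimal shell sketch of the OS tunables above; device names, mount points, and the VM cache-flush thresholds are illustrative assumptions, not values from the talk:

# Disable Transparent Huge Pages
echo never > /sys/kernel/mm/transparent_hugepage/enabled
echo never > /sys/kernel/mm/transparent_hugepage/defrag

# Disable swap
swapoff -a

# Configure VM cache flushing (illustrative thresholds)
sysctl -w vm.dirty_background_ratio=5
sysctl -w vm.dirty_ratio=10

# Use the deadline I/O scheduler on each data disk
echo deadline > /sys/block/sdb/queue/scheduler

# Mount each JBOD ext4 data disk with the options from the slide
mount -t ext4 -o inode_readahead_blks=128,data=writeback,noatime,nodiratime /dev/sdb1 /data01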

Page 14: End to End Processing of 3.7 Million Telemetry Events per Second using Lambda Architecture

Kafka

Tuning Targets
● Broker
● Producer
● Consumer

Tunables (Broker)
● replica.fetch.max.bytes
● socket.send.buffer.bytes
● socket.receive.buffer.bytes
● replica.socket.receive.buffer.bytes
● num.network.threads
● num.io.threads
● zookeeper.*.timeout.ms

Hardware: Metal, 2.6 GHz E5-2660 v3, 12 x 4 TB JBOD, 128 GB DDR4 ECC; Cloud: AWS d2.8xlarge; Kafka v0.8.2.1
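A hedged server.properties sketch covering the broker-side keys above; the values are illustrative assumptions, not the production settings from the talk:

# Broker tunables from this slide; size to your message and replication traffic
replica.fetch.max.bytes=209715200
socket.send.buffer.bytes=1048576
socket.receive.buffer.bytes=1048576
replica.socket.receive.buffer.bytes=1048576
num.network.threads=8
num.io.threads=16
zookeeper.session.timeout.ms=30000
zookeeper.connection.timeout.ms=30000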

Page 15: End to End Processing of 3.7 Million Telemetry Events per Second using Lambda Architecture

Kafka

Tuning Targets
● Broker
● Producer
● Consumer

Tunables (Producer)
● buffer.memory
● batch.size
● linger.ms
● compression.type
● socket.send.buffer.bytes

Hardware: Metal, 2.6 GHz E5-2660 v3, 12 x 4 TB JBOD, 128 GB DDR4 ECC; Cloud: AWS d2.8xlarge; Kafka v0.8.2.1
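A hedged producer configuration sketch for the keys above; the values are illustrative, and the socket send buffer is exposed as send.buffer.bytes in the 0.8.2 Java producer:

# Producer tunables from this slide
buffer.memory=67108864
batch.size=1048576
linger.ms=5
compression.type=snappy
send.buffer.bytes=1048576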

Page 16: End to End Processing of 3.7 Million Telemetry Events per Second using Lambda Architecture

Kafka

Tuning Targets
● Broker
● Producer
● Consumer

Tunables (Consumer)
● num.consumer.fetchers
● socket.receive.buffer.bytes
● fetch.message.max.bytes
● fetch.min.bytes

Hardware: Metal, 2.6 GHz E5-2660 v3, 12 x 4 TB JBOD, 128 GB DDR4 ECC; Cloud: AWS d2.8xlarge; Kafka v0.8.2.1
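A hedged consumer configuration sketch for the keys above (illustrative values):

# Consumer tunables from this slide
num.consumer.fetchers=4
socket.receive.buffer.bytes=1048576
fetch.message.max.bytes=209715200   # must cover the largest message the brokers accept
fetch.min.bytes=1048576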

Page 17: End to End Processing of 3.7 Million Telemetry Events per Second using Lambda Architecture

Storm

Tuning Targets
● Nimbus
● Supervisors
● Workers and Executors
● Topology

Tunables (Nimbus)
● Nimbus High Availability: 4 Nimbus servers to avoid downtime and performance degradation
● Reduce the workload on ZooKeeper and decrease topology submission time: storm.codedistributor.class = HDFSCodeDistributor
● topology.min.replication.count = 3, i.e. floor(number_of_nimbus_hosts/2 + 1)
● max.replication.wait.time.sec = -1
● code.sync.freq.secs = 2 mins
● storm.messaging.netty.buffer_size = 10 MB
● nimbus.thrift.threads = 256

Hardware: Metal, 2.6 GHz E5-2660 v3, 2 x 500 GB SSD, 256 GB DDR4 ECC; Cloud: AWS r3.8xlarge; Storm v0.10.0.2.4
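A hedged storm.yaml sketch of the Nimbus-side tunables above; the fully qualified code-distributor class name depends on the Storm/HDP build:

storm.codedistributor.class: "HDFSCodeDistributor"   # HDFS-backed code distributor for Nimbus HA
topology.min.replication.count: 3                    # floor(4 Nimbus hosts / 2 + 1)
max.replication.wait.time.sec: -1
code.sync.freq.secs: 120                             # 2 minutes
storm.messaging.netty.buffer_size: 10485760          # 10 MB
nimbus.thrift.threads: 256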

Page 18: End to End Processing of 3.7 Million Telemetry Events per Second using Lambda Architecture

Storm

Tuning Targets
● Nimbus
● Supervisors
● Workers and Executors
● Topology

Tunables (Supervisors)
● Use supervisord to control the Supervisor daemons
● supervisor.slots.ports = min(number of HT cores, total memory of server / worker heap size)
● supervisor.childopts = -Xms4096m -Xmx4096m -verbose:gc -Xloggc:/var/log/storm/supervisor_%ID%_gc.log

Hardware: Metal, 2.6 GHz E5-2660 v3, 2 x 500 GB SSD, 256 GB DDR4 ECC; Cloud: AWS r3.8xlarge; Storm v0.10.0.2.4
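A hedged storm.yaml sketch for the Supervisor tunables above. For example, a node with 32 hyper-threaded cores and 256 GB of RAM running 8 GB worker heaps supports min(32, 256/8) = 32 slots; the port list below is an illustrative, truncated example:

supervisor.slots.ports: [6700, 6701, 6702, 6703]
supervisor.childopts: "-Xms4096m -Xmx4096m -verbose:gc -Xloggc:/var/log/storm/supervisor_%ID%_gc.log"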

Page 19: End to End Processing of 3.7 Million Telemetry Events per Second using Lambda Architecture

Storm

Tuning Targets
● Nimbus
● Supervisors
● Workers and Executors
● Topology

Tunables (Workers and Executors)
Rule of thumb for this use case of Storm (telemetry processing):
● CPU-bound tasks: 1 executor per worker
● IO-bound tasks: 8 executors per worker
● Fix the JVM memory for each worker based on the fetch size of the Kafka Trident spout and the split size of the bolt:
-Xms8g -Xmx8g -XX:MaxDirectMemorySize=2048m -XX:NewSize=2g -XX:MaxNewSize=2g -XX:+UseParNewGC -XX:MaxTenuringThreshold=2 -XX:SurvivorRatio=8 -XX:+UnlockDiagnosticVMOptions -XX:ParGCCardsPerStrideChunk=32768 -XX:+UseConcMarkSweepGC -XX:+CMSParallelRemarkEnabled -XX:+ParallelRefProcEnabled -XX:+CMSClassUnloadingEnabled -XX:CMSInitiatingOccupancyFraction=80 -XX:+UseCMSInitiatingOccupancyOnly -XX:-CMSConcurrentMTEnabled -XX:+AlwaysPreTouch

Hardware: Metal, 2.6 GHz E5-2660 v3, 2 x 500 GB SSD, 256 GB DDR4 ECC; Cloud: AWS r3.8xlarge; Storm v0.10.0.2.4
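These worker JVM flags are typically applied through worker.childopts in storm.yaml (a sketch; individual topologies can override them via topology.worker.childopts):

worker.childopts: "-Xms8g -Xmx8g -XX:MaxDirectMemorySize=2048m -XX:NewSize=2g -XX:MaxNewSize=2g -XX:+UseParNewGC -XX:MaxTenuringThreshold=2 -XX:SurvivorRatio=8 -XX:+UnlockDiagnosticVMOptions -XX:ParGCCardsPerStrideChunk=32768 -XX:+UseConcMarkSweepGC -XX:+CMSParallelRemarkEnabled -XX:+ParallelRefProcEnabled -XX:+CMSClassUnloadingEnabled -XX:CMSInitiatingOccupancyFraction=80 -XX:+UseCMSInitiatingOccupancyOnly -XX:-CMSConcurrentMTEnabled -XX:+AlwaysPreTouch"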

Page 20: End to End Processing of 3.7 Million Telemetry Events per Second using Lambda Architecture

Storm

Tuning Targets
● Nimbus
● Supervisors
● Workers and Executors
● Topology

Tunables (Topology)
● topology.optimize = true
● topology.message.timeout.secs = 110
● topology.max.spout.pending = 3
● Remove topology.metrics.consumer.register (AMBARI-13237)
● Incoming and outgoing queues:
  ○ topology.transfer.buffer.size = 64 (batch size)
  ○ topology.receiver.buffer.size = 16 (queue size)
  ○ topology.executor.receive.buffer.size = 32768
  ○ topology.executor.send.buffer.size = 32768

Hardware: Metal, 2.6 GHz E5-2660 v3, 2 x 500 GB SSD, 256 GB DDR4 ECC; Cloud: AWS r3.8xlarge; Storm v0.10.0.2.4
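A hedged storm.yaml sketch of these topology-level defaults (they can also be set per topology at submission time):

topology.optimize: true
topology.message.timeout.secs: 110
topology.max.spout.pending: 3
topology.transfer.buffer.size: 64
topology.receiver.buffer.size: 16
topology.executor.receive.buffer.size: 32768
topology.executor.send.buffer.size: 32768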

Page 21: End to End Processing of 3.7 Million Telemetry Events per Second using Lambda Architecture

Storm

Tuning Targets
● Nimbus
● Supervisors
● Workers and Executors
● Topology

Tunables (Topology, continued)
● topology.trident.parallelism.hint = (number of worker nodes in the cluster * number of cores per worker node) - (number of acker tasks)
● kafka.consumer.fetch.size.byte = 209715200 (200 MB - yes, we process large batches!)
● kafka.consumer.buffer.size.byte = 209715200
● kafka.consumer.min.fetch.byte = 100428800

Hardware: Metal, 2.6 GHz E5-2660 v3, 2 x 500 GB SSD, 256 GB DDR4 ECC; Cloud: AWS r3.8xlarge; Storm v0.10.0.2.4
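As a hypothetical worked example of the parallelism-hint formula (the node and core counts are illustrative, not the production cluster's): 20 worker nodes with 32 cores each and 40 acker tasks gives a hint of 20 * 32 - 40 = 600.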

Page 22: End to End Processing of 3.7 Million Telemetry Events per Second using Lambda Architecture

ZooKeeper

Tuning Targets
● Data and log directory
● Garbage collection

Tunables
● Keep the data and log directories separate and on different mounts
● Run a separate ZooKeeper quorum of 5 servers each for Kafka, Storm, HBase, and the HA quorum
● ZooKeeper GC configuration: -Xms4192m -Xmx4192m -XX:ParallelGCThreads=8 -XX:+UseConcMarkSweepGC -XX:+PrintGCApplicationStoppedTime -XX:+PrintGCApplicationConcurrentTime -XX:+PrintGC -XX:+PrintGCTimeStamps -XX:+PrintGCDetails -verbose:gc -Xloggc:/var/log/zookeeper/zookeeper_gc.log

Hardware: Metal, 2.6 GHz E5-2660 v3, 2 x 400 GB SSD, 128 GB DDR4 ECC; Cloud: AWS r3.2xlarge; ZooKeeper v3.4.6
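A hedged sketch of where these settings land: separate mounts for the snapshot and transaction-log directories in zoo.cfg, and the GC flags applied through conf/java.env (paths are illustrative):

# zoo.cfg: data and transaction logs on different mounts
dataDir=/data/zookeeper
dataLogDir=/zk-txnlog/zookeeper

# conf/java.env: GC flags from the slide
export SERVER_JVMFLAGS="-Xms4192m -Xmx4192m -XX:ParallelGCThreads=8 -XX:+UseConcMarkSweepGC -verbose:gc -XX:+PrintGCDetails -XX:+PrintGCTimeStamps -Xloggc:/var/log/zookeeper/zookeeper_gc.log"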

Page 23: End to End Processing of 3.7 Million Telemetry Events per Second using Lambda Architecture

Elasticsearch

Tuning Targets
● Index parameters
● Garbage collection

Tunables
● bootstrap.mlockall: true
● indices.fielddata.cache.size: 25%
● threadpool.bulk.queue_size: 5000
● index.refresh_interval: 30s
● indices.memory.index_buffer_size: 10%
● index.store.type: mmapfs
● GC settings: -verbose:gc -Xloggc:/var/log/elasticsearch/elasticsearch_gc.log -Xss256k -Djava.awt.headless=true -XX:+UseCompressedOops -XX:+UseG1GC -XX:MaxGCPauseMillis=200 -XX:+DisableExplicitGC -XX:+PrintGCDateStamps -XX:+PrintGCDetails -XX:+PrintGCTimeStamps -XX:ErrorFile=/var/log/elasticsearch_err.log -XX:ParallelGCThreads=8
● Bulk API
● Client nodes

Hardware: Metal, 2.6 GHz E5-2660 v3, 14 x 400 GB SSD, 256 GB DDR4 ECC; Cloud: AWS i2.4xlarge; Elasticsearch v1.7.5
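A hedged elasticsearch.yml sketch of the node-level settings above (index.refresh_interval and index.store.type can also be set per index):

bootstrap.mlockall: true
indices.fielddata.cache.size: 25%
threadpool.bulk.queue_size: 5000
index.refresh_interval: 30s
indices.memory.index_buffer_size: 10%
index.store.type: mmapfs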

Page 24: End to End Processing of 3.7 Million Telemetry Events per Second using Lambda Architecture

HBase

Tuning Targets
● Region server GC
● HBase configurations

Tunables
export HBASE_REGIONSERVER_OPTS="$HBASE_REGIONSERVER_OPTS -XX:+UseConcMarkSweepGC -Xmn2500m -XX:SurvivorRatio=4 -XX:CMSInitiatingOccupancyFraction=50 -XX:+UseCMSInitiatingOccupancyOnly -Xmx{{regionserver_heapsize}} -Xms{{regionserver_heapsize}} -verbose:gc -XX:+PrintGCDetails -XX:+PrintGCDateStamps -XX:+PrintPromotionFailure -XX:+PrintTenuringDistribution -XX:+PrintGCApplicationStoppedTime -Xloggc:${HBASE_LOG_DIR}/hbase-gc-regionserver.log.`date +'%Y%m%d%H%M'`"

Hardware: Metal, 2.6 GHz E5-2660 v3, 14 x 400 GB SSD, 256 GB DDR4 ECC; Cloud: AWS i2.8xlarge; HBase v1.1.0

Page 25: End to End Processing of 3.7 Million Telemetry Events per Second using Lambda Architecture

Hive

Tuning Targets
● Table structure
● Partition and bucketing scheme
● ORC tuning

Tunables (Table Structure)
● Use strings instead of binaries.
● Use integer fields.

Hardware: Metal, 2.6 GHz E5-2660 v3, 14 x 6 TB HDD, 256 GB DDR4 ECC; Cloud: AWS d2.8xlarge; Hive v1.2.1

Page 26: End to End Processing of 3.7 Million Telemetry Events per Second using Lambda Architecture

Hive

Tuning Targets
● Table structure
● Partition and bucketing scheme
● ORC tuning

Tunables (Partition and Bucketing)
● Partition by date timestamp.
● Additional partitioning resulted in an explosion of the number of partitions, small files, and inefficient ORC compression.
● Bucketing: if two tables are bucketed on the same column, they should use the same number of buckets to support joins.
● Sorting: each table should optimize its sorting; the bucket column typically should be the first sorted column (see the DDL sketch below).

Hardware: Metal, 2.6 GHz E5-2660 v3, 14 x 6 TB HDD, 256 GB DDR4 ECC; Cloud: AWS d2.8xlarge; Hive v1.2.1
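A hedged HiveQL sketch of a table laid out along these lines; the table name, columns, partition key, and bucket count are illustrative assumptions, not the production schema:

CREATE TABLE telemetry_events (
  event_time TIMESTAMP,
  device_id STRING,
  file_sha2 STRING,
  event_type INT
)
PARTITIONED BY (event_date STRING)                               -- partition by date timestamp only
CLUSTERED BY (device_id) SORTED BY (device_id) INTO 64 BUCKETS   -- bucket column is the first sorted column
STORED AS ORC;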

Page 27: End to End Processing of 3.7 Million Telemetry Events per Second using Lambda Architecture

Hive

Tuning Targets
● Table structure
● Partition and bucketing scheme
● ORC tuning

Tunables (ORC Tuning)
● Table structure, bucketing, partitioning, and sorting all impact ORC performance.
● ORC stripe size: the default 128 MB balances insert and query performance.
● ORC uses ZLIB compression; smaller data size improves any query.
● Predicate push down.

Hardware: Metal, 2.6 GHz E5-2660 v3, 14 x 6 TB HDD, 256 GB DDR4 ECC; Cloud: AWS d2.8xlarge; Hive v1.2.1

[Chart: number of YARN containers per query]

ORC table properties:
● orc.compress = ZLIB (high-level compression: one of NONE, ZLIB, SNAPPY)
● orc.compress.size = 262144 (number of bytes in each compression chunk)
● orc.stripe.size = 130023424 (number of bytes in each stripe)
● orc.row.index.stride = 64000 (number of rows between index entries; must be >= 1000)
● orc.create.index = true (whether to create row indexes)
● orc.bloom.filter.columns = "file_sha2" (comma-separated list of column names for which a bloom filter should be created)
● orc.bloom.filter.fpp = 0.05 (false positive probability for the bloom filter; must be > 0.0 and < 1.0)
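A hedged sketch of attaching these ORC properties to the hypothetical table from the earlier DDL sketch; properties set this way apply to data written after the change:

ALTER TABLE telemetry_events SET TBLPROPERTIES (
  'orc.compress'='ZLIB',
  'orc.compress.size'='262144',
  'orc.stripe.size'='130023424',
  'orc.row.index.stride'='64000',
  'orc.create.index'='true',
  'orc.bloom.filter.columns'='file_sha2',
  'orc.bloom.filter.fpp'='0.05'
);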

Page 28: End to End Processing of 3.7 Million Telemetry Events per Second using Lambda Architecture

Hive Streaming

Tuning Targets
● Hive Metastore stability
● Evaluate BatchSize and TxnsPerBatch

Tunables
● No Hive shell access; HiveServer2 only.
● Multiple Hive Metastore processes:
  ○ Compaction metastore: 5-10 compaction threads
  ○ Streaming metastore: connection pool of 5
● 16 GB heap size.
● Metastore MySQL database scalability.
● Maximum EPS was achieved by increasing the batch size and keeping TxnsPerBatch small.

Hardware: Metal, 2.6 GHz E5-2660 v3, 14 x 6 TB HDD, 256 GB DDR4 ECC; Cloud: AWS d2.8xlarge; Hive v1.2.1
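A hedged hive-site.xml sketch of the transaction and compaction settings that Hive streaming ingest depends on; the worker-thread count follows the slide, the rest are illustrative of a typical ACID setup rather than the talk's exact configuration:

<property><name>hive.txn.manager</name><value>org.apache.hadoop.hive.ql.lockmgr.DbTxnManager</value></property>
<property><name>hive.support.concurrency</name><value>true</value></property>
<property><name>hive.compactor.initiator.on</name><value>true</value></property>
<property><name>hive.compactor.worker.threads</name><value>5</value></property>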

Page 29: End to End Processing of 3.7 Million Telemetry Events per Second using Lambda Architecture

Performance Benchmarks

Page 30: End to End Processing of 3.7 Million Telemetry Events per Second using Lambda Architecture

Benchmarking Suite

Kafka Producer Consumer Throughput Test

Storm Core and Trident Topologies

Standard Platform Test Suite

Hive TPC

Page 31: End to End Processing of 3.7 Million Telemetry Events per Second using Lambda Architecture

Kafka Producer and Consumer Tests

The benchmark set contains producer and consumer tests executed at various message sizes.

Producer and consumer together:
● 100 bytes
● 1,000 bytes (average telemetry event size)
● 10,000 bytes
● 100,000 bytes
● 500,000 bytes
● 1,000,000 bytes

Type of tests:
● Single thread, no replication
● Single thread, async 3x replication
● Single thread, sync 3x replication
● Throughput versus stored data

Ingesting 10 Telemetry Sources in Parallel
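A hedged sketch of the stock Kafka perf tools that drive this kind of test; hostnames, topic names, and counts are placeholders, and flag names vary across Kafka versions:

bin/kafka-producer-perf-test.sh --broker-list broker1:9092 --topics telemetry-bench --messages 10000000 --message-size 1000 --threads 8

bin/kafka-consumer-perf-test.sh --zookeeper zk1:2181 --topic telemetry-bench --messages 10000000 --threads 4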

Page 32: End to End Processing of 3.7 Million Telemetry Events per Second using Lambda Architecture

Storm Topology

The benchmark set uses custom topologies for telemetry data source transformation and ingestion, simulating the end-to-end use cases for real-time streaming of telemetry.

● Storm Trident HDFS Telemetry Transformation and Ingestion

● Storm Trident Hive Telemetry Ingestion

● Storm Trident Hbase Telemetry Ingestion

● Storm Trident Elasticsearch Telemetry Ingestion

Ingesting 10 Telemetry Sources in Parallel

Page 33: End to End Processing of 3.7 Million Telemetry Events per Second using Lambda Architecture

Standard Platform Tests
● TeraSort benchmark suite (2 TB, 5 TB, 10 TB)
● RandomWriter (write and sort): 10 GB of random data per node
● DFS-IO write and read (TestDFSIO)
● NNBench (write, read, rename, and delete)
● MRBench
● Data load (upload and download)
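A hedged sketch of how this suite is typically invoked; jar names, paths, and sizes are illustrative, and exact flags vary by Hadoop version:

# TeraSort: generate then sort 2 TB (20 billion 100-byte rows)
hadoop jar hadoop-mapreduce-examples.jar teragen 20000000000 /bench/teragen-2tb
hadoop jar hadoop-mapreduce-examples.jar terasort /bench/teragen-2tb /bench/terasort-2tb

# TestDFSIO write and read
hadoop jar hadoop-mapreduce-client-jobclient-tests.jar TestDFSIO -write -nrFiles 100 -size 1GB
hadoop jar hadoop-mapreduce-client-jobclient-tests.jar TestDFSIO -read -nrFiles 100 -size 1GB

# NNBench and MRBench
hadoop jar hadoop-mapreduce-client-jobclient-tests.jar nnbench -operation create_write -numberOfFiles 1000
hadoop jar hadoop-mapreduce-client-jobclient-tests.jar mrbench -numRuns 50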

Page 34: End to End Processing of 3.7 Million Telemetry Events per Second using Lambda Architecture

TPC-DS 20 TB
TPC-DS: Decision Support Performance Benchmarks

● Classic EDW Dimensional model

● Large fact tables

● Complex queries

Scale: 20TB

TPC-DS Benchmarking

Page 35: End to End Processing of 3.7 Million Telemetry Events per Second using Lambda Architecture

Service Monitoring

Page 36: End to End Processing of 3.7 Million Telemetry Events per Second using Lambda Architecture

Service Monitoring Architecture

Page 37: End to End Processing of 3.7 Million Telemetry Events per Second using Lambda Architecture

Kafka Monitoring

Page 38: End to End Processing of 3.7 Million Telemetry Events per Second using Lambda Architecture

ElasticSearch

Page 39: End to End Processing of 3.7 Million Telemetry Events per Second using Lambda Architecture

Storm Kafka Lag

Page 40: End to End Processing of 3.7 Million Telemetry Events per Second using Lambda Architecture

Storm Kafka Logging Collection

Page 41: End to End Processing of 3.7 Million Telemetry Events per Second using Lambda Architecture

Thank You
Q&A