Real-Time Log Analysis with Apache Mesos, Kafka and Cassandra

Transcript of Real-Time Log Analysis with Apache Mesos, Kafka and Cassandra

Page 1: Real-Time Log Analysis with Apache Mesos, Kafka and Cassandra

Real-Time log analysis with Mesos, Docker, Kafka, Spark, Cassandra and Solr at scale

Page 2: Real-Time Log Analysis with Apache Mesos, Kafka and Cassandra

whoami

CEO of Elodina (http://www.elodina.net/), a big data as a service platform built on top of open source software. The Elodina platform enables customers to analyze data streams and programmatically react to the results in real time. We solve today’s data analytics needs by providing the tools and support necessary to utilize open source technologies. As users, contributors and committers, Elodina also provides support for frameworks that run on Mesos, including Apache Kafka, Exhibitor (ZooKeeper), Apache Storm, Apache Cassandra and a whole lot more!

Apache Kafka Committer & PMC Member

LinkedIn: http://linkedin.com/in/charmalloc

Twitter: @allthingshadoop

© 2015. All Rights Reserved.

Page 3: Real-Time Log Analysis with Apache Mesos, Kafka and Cassandra

1 Intro To Mesos, Kafka, Etc

2 Architecture Overview

3 Breaking it down into pieces

4 Questions?

Page 4: Real-Time Log Analysis with Apache Mesos, Kafka and Cassandra

Apache Mesos

Page 5: Real-Time Log Analysis with Apache Mesos, Kafka and Cassandra

Mesos Papers

Mesos: A Platform for Fine-Grained Resource Sharing in the Data Center http://static.usenix.org/event/nsdi11/tech/full_papers/Hindman_new.pdf

Google Borg - https://research.google.com/pubs/pub43438.html

Google Omega: flexible, scalable schedulers for large compute clusters http://eurosys2013.tudos.org/wp-content/uploads/2013/paper/Schwarzkopf.pdf

Page 6: Real-Time Log Analysis with Apache Mesos, Kafka and Cassandra

Static Partitioning

Page 7: Real-Time Log Analysis with Apache Mesos, Kafka and Cassandra

Static Partitioning

Page 8: Real-Time Log Analysis with Apache Mesos, Kafka and Cassandra

Static Partitioning

Page 9: Real-Time Log Analysis with Apache Mesos, Kafka and Cassandra

Static Partitioning

Page 10: Real-Time Log Analysis with Apache Mesos, Kafka and Cassandra

Fine-Grained Resource Elasticity

"If people knew how low it really is, we’d all get fired." https://gigaom.com/2013/11/30/the-sorry-state-of-server-utilization-and-the-impending-post-hypervisor-era/

Page 11: Real-Time Log Analysis with Apache Mesos, Kafka and Cassandra

An operating system for your data center

Page 12: Real-Time Log Analysis with Apache Mesos, Kafka and Cassandra

EVERYTHING ON MESOS

Page 13: Real-Time Log Analysis with Apache Mesos, Kafka and Cassandra

How it works

Page 14: Real-Time Log Analysis with Apache Mesos, Kafka and Cassandra

Marathon

https://github.com/mesosphere/marathon

Cluster-wide init and control system for services in cgroups or Docker, based on Apache Mesos.

Page 15: Real-Time Log Analysis with Apache Mesos, Kafka and Cassandra

Docker on Marathon

{
  "id": "basic-3",
  "cmd": "python3 -m http.server 8080",
  "cpus": 0.5,
  "mem": 32.0,
  "container": {
    "type": "DOCKER",
    "docker": {
      "image": "python:3",
      "network": "BRIDGE",
      "portMappings": [
        { "containerPort": 8080, "hostPort": 0 }
      ]
    }
  }
}
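As a hedged sketch of how an app definition like the one above could be submitted: Marathon exposes a REST endpoint for creating apps (POST /v2/apps). The snippet below only builds the request and does not send it. The address http://marathon.example:8080 is a placeholder assumption, not something from the slides, and build_request is a hypothetical helper name.

```python
import json
import urllib.request

# App definition from the slide. A hostPort of 0 leaves the choice of
# host port to Marathon rather than pinning one.
APP = {
    "id": "basic-3",
    "cmd": "python3 -m http.server 8080",
    "cpus": 0.5,
    "mem": 32.0,
    "container": {
        "type": "DOCKER",
        "docker": {
            "image": "python:3",
            "network": "BRIDGE",
            "portMappings": [{"containerPort": 8080, "hostPort": 0}],
        },
    },
}


def build_request(marathon_url, app):
    """Build (but do not send) a POST to Marathon's /v2/apps endpoint."""
    return urllib.request.Request(
        url=marathon_url.rstrip("/") + "/v2/apps",
        data=json.dumps(app).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# marathon.example is a placeholder host (assumption).
request = build_request("http://marathon.example:8080", APP)
# To actually submit the app: urllib.request.urlopen(request)
```

Keeping the definition as data and the submission as a separate step makes it easy to review or version the app definition before it reaches the cluster.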

Page 16: Real-Time Log Analysis with Apache Mesos, Kafka and Cassandra

Apache Kafka

Page 17: Real-Time Log Analysis with Apache Mesos, Kafka and Cassandra

Kafka papers

Apache Kafka was first open sourced by LinkedIn in 2011.

Papers

● Building a Replicated Logging System with Apache Kafka http://www.vldb.org/pvldb/vol8/p1654-wang.pdf

● Kafka: A Distributed Messaging System for Log Processing http://research.microsoft.com/en-us/um/people/srikanth/netdb11/netdb11papers/netdb11-final12.pdf

● Building LinkedIn’s Real-time Activity Data Pipeline http://sites.computer.org/debull/A12june/pipeline.pdf

● The Log: What Every Software Engineer Should Know About Real-time Data's Unifying Abstraction http://engineering.linkedin.com/distributed-systems/log-what-every-software-engineer-should-know-about-real-time-datas-unifying

http://kafka.apache.org/

Page 18: Real-Time Log Analysis with Apache Mesos, Kafka and Cassandra

How Big Data Starts

Page 19: Real-Time Log Analysis with Apache Mesos, Kafka and Cassandra

More Big Data! More!

Page 20: Real-Time Log Analysis with Apache Mesos, Kafka and Cassandra

uhhhh

Page 21: Real-Time Log Analysis with Apache Mesos, Kafka and Cassandra

eeesh

Page 22: Real-Time Log Analysis with Apache Mesos, Kafka and Cassandra

Kafka de-couples data pipelines

Page 23: Real-Time Log Analysis with Apache Mesos, Kafka and Cassandra

Distributed Replicated Log

• Read & Write
• In real time
• As much as you want
• As fast as your network
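To make the log abstraction concrete, here is a toy, single-process sketch. It is not Kafka's implementation and has no replication or networking. Records are appended to an ordered log, and each consumer tracks its own offset, so readers progress independently of writers and of one another. The class and method names are illustrative only.

```python
class Log:
    """Toy append-only log: a record's offset is its position in the list."""

    def __init__(self):
        self._records = []

    def append(self, record):
        """Append a record and return the offset it was written at."""
        self._records.append(record)
        return len(self._records) - 1

    def read(self, offset, max_records=10):
        """Return up to max_records records starting at offset."""
        return self._records[offset:offset + max_records]


class Consumer:
    """Each consumer keeps its own offset, so consumers never block each other."""

    def __init__(self, log):
        self._log = log
        self.offset = 0

    def poll(self, max_records=10):
        """Read the next batch and advance this consumer's offset."""
        batch = self._log.read(self.offset, max_records)
        self.offset += len(batch)
        return batch


log = Log()
for line in ["GET /a 200", "GET /b 404", "POST /c 500"]:
    log.append(line)

fast, slow = Consumer(log), Consumer(log)
fast_batch = fast.poll()               # reads all three records
slow_batch = slow.poll(max_records=1)  # reads only the first record
```

Because the log, not the producer, holds the data, a slow consumer can catch up later without affecting the fast one.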

Page 24: Real-Time Log Analysis with Apache Mesos, Kafka and Cassandra

Reference Architecture

Page 25: Real-Time Log Analysis with Apache Mesos, Kafka and Cassandra

Producers

syslog → Kafka via docker https://hub.docker.com/r/stealthly/syslog/

syslog → Kafka scheduler https://github.com/stealthly/syslog-service

statsd → Kafka scheduler https://github.com/stealthly/statsd-mesos-kafka

system stats collection → Kafka scheduler https://github.com/stealthly/syscol

tailf → Kafka https://github.com/stealthly/go_kafka_client/tree/master/producers/tailf

Any language https://cwiki.apache.org/confluence/display/KAFKA/Clients

Page 26: Real-Time Log Analysis with Apache Mesos, Kafka and Cassandra

Reference Architecture

Page 27: Real-Time Log Analysis with Apache Mesos, Kafka and Cassandra

Kafka on Mesos

https://github.com/mesos/kafka

Page 28: Real-Time Log Analysis with Apache Mesos, Kafka and Cassandra

Kafka on Mesos

• smart broker.id assignment.
• preservation of broker placement (through constraints and/or new features).
• ability to do configuration changes.
• rolling restarts (for things like configuration changes).
• scaling the cluster up and down with automatic, programmatic and manual options.
• smart partition assignment via constraints vis-à-vis roles, resources and attributes.

Page 29: Real-Time Log Analysis with Apache Mesos, Kafka and Cassandra

CLI & REST API

• scheduler - starts the scheduler.
• broker
  – add - adds one or more brokers to the cluster.
  – update - changes resources, constraints or broker properties for one or more brokers.
  – remove - takes a broker out of the cluster.
  – start - starts a broker up.
  – stop - either shuts a broker down gracefully or force kills it (./kafka-mesos.sh help stop).
• topic
  – list - lists topics in the cluster.
  – add - adds new topics to the cluster.
  – update - changes topics in the cluster.
  – rebalance - rebalances a cluster by selecting either the brokers or the topics to rebalance. Manual assignment is still possible using the Apache Kafka project tools. Rebalance can also change the replication factor on a topic.
• help - ./kafka-mesos.sh help || ./kafka-mesos.sh help {command}

Page 30: Real-Time Log Analysis with Apache Mesos, Kafka and Cassandra

Reference Architecture

Page 31: Real-Time Log Analysis with Apache Mesos, Kafka and Cassandra

Schema: Avro or Protobuf

• https://github.com/stealthly/go_kafka_client/blob/master/syslog/syslog_proto/logline.proto
• https://github.com/stealthly/go_kafka_client/blob/master/logline.avsc

logline
• line
• logtypeid
• source
• tags (k/v pairs)
• timings (k/v pairs)
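As a rough illustration of the record shape listed above, the fields could be modeled as follows. The field types (strings, an integer logtypeid, string-to-string tags, string-to-integer timings) and the sample values are assumptions. The linked logline.proto and logline.avsc files remain the authoritative definitions.

```python
import json
from dataclasses import dataclass, field, asdict
from typing import Dict


@dataclass
class LogLine:
    # Field names follow the slide; the types are assumptions.
    line: str
    logtypeid: int
    source: str
    tags: Dict[str, str] = field(default_factory=dict)
    timings: Dict[str, int] = field(default_factory=dict)


# Made-up sample record for illustration.
record = LogLine(
    line="GET /index.html 200",
    logtypeid=1,
    source="web-01",
    tags={"env": "prod"},
    timings={"received": 1440000000},
)

# JSON stands in for Avro/Protobuf encoding here, purely for illustration.
encoded = json.dumps(asdict(record))
decoded = LogLine(**json.loads(encoded))
```

A shared schema like this is what lets producers and consumers evolve separately while still agreeing on what a log line contains.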

Page 32: Real-Time Log Analysis with Apache Mesos, Kafka and Cassandra

Consume from Kafka → Write to Cassandra

Implement the CQL write here: https://github.com/stealthly/go_kafka_client/blob/master/consumers/consumers.go#L186-L194 using gocql (https://github.com/gocql/gocql).

The Go Kafka Client fans out work processing, and a rebalance doesn’t upset consumers that are already reading.
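The fan-out idea can be sketched without a cluster: messages pulled from a stream are handed to a pool of workers, and each worker turns a message into a parameterized CQL INSERT. This is a Python stand-in for illustration, not the Go Kafka Client or gocql API. The logs.loglines table, its columns, the sample messages, and the write stub (which collects statements instead of executing them) are all hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor
from threading import Lock

# Hypothetical keyspace/table and columns; "?" marks bind parameters.
INSERT_CQL = "INSERT INTO logs.loglines (source, logtypeid, line) VALUES (?, ?, ?)"

written = []           # stands in for Cassandra in this sketch
written_lock = Lock()


def write(statement, params):
    """Stub for a driver call that would execute the statement with params."""
    with written_lock:
        written.append((statement, params))


def handle(message):
    """Worker task: map one consumed message to one CQL write."""
    write(INSERT_CQL, (message["source"], message["logtypeid"], message["line"]))


# Messages as they might arrive from a Kafka topic (made-up sample data).
messages = [
    {"source": "web-01", "logtypeid": 1, "line": "GET / 200"},
    {"source": "web-02", "logtypeid": 1, "line": "GET /x 404"},
    {"source": "db-01", "logtypeid": 2, "line": "slow query"},
]

# Fan out: several workers process messages concurrently.
with ThreadPoolExecutor(max_workers=2) as pool:
    list(pool.map(handle, messages))
```

Binding values as parameters rather than formatting them into the statement keeps log content, which is untrusted input, out of the query text.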

Page 33: Real-Time Log Analysis with Apache Mesos, Kafka and Cassandra

Reference Architecture

Page 34: Real-Time Log Analysis with Apache Mesos, Kafka and Cassandra

Sample Spark Job → Cassandra

https://github.com/stealthly/gauntlet

Uses the Cassandra Spark Connector https://github.com/datastax/spark-cassandra-connector

Page 35: Real-Time Log Analysis with Apache Mesos, Kafka and Cassandra

Use DataStax Enterprise to enable Search

http://docs.datastax.com/en/datastax_enterprise/4.8/datastax_enterprise/srch/srchOverview.html

Page 36: Real-Time Log Analysis with Apache Mesos, Kafka and Cassandra

Questions?

http://www.elodina.net

Page 37: Real-Time Log Analysis with Apache Mesos, Kafka and Cassandra

Thank you