SMACK Stack 1.1

Transcript of SMACK Stack 1.1

Page 1: SMACK Stack 1.1

SMACK Stack 1.1

Page 2: SMACK Stack 1.1

Elodina is a big data as a service platform built on top of open source software.

The Elodina platform solves today’s data analytics needs by providing the tools and support necessary to utilize open source technologies.

http://www.elodina.net/

Page 3: SMACK Stack 1.1

What's SMACK Stack?

SMACK Stack 1.0 has traditionally been Spark, Mesos, Akka, Cassandra and Kafka. See https://dzone.com/articles/smack-stack-guide and lots, lots more at https://www.google.com/webhp?q=smack%20stack

Now we are going to introduce SMACK Stack 1.1 and talk more about dynamic compute, microservices, orchestration and micro segmentation, all part of what you can now do with Streaming, Mesos, Analytics, Cassandra and Kafka.

Page 4: SMACK Stack 1.1

The free lunch is over!

http://www.gotw.ca/publications/concurrency-ddj.htm

Page 5: SMACK Stack 1.1

Many industries still don’t get it

XML is everywhere but we have alternatives!

We can support an XML interface without taking on the burden of the extra data. You can save A LOT of overhead just by adding a pre-processing step that takes the XML, turns it into Avro, and processes and stores that.

It works https://github.com/elodina/xml-avro

You can even process the response in Avro but return the result in XML, more on that later though!
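A minimal sketch of that pre-processing idea, using only the Python standard library. The `order` document, the `ORDER_FIELDS` schema and both function names are hypothetical, and a plain typed dict stands in for the Avro record; a real pipeline would serialize it with an Avro library and schema (for example via the xml-avro project linked above).

```python
import xml.etree.ElementTree as ET

# Hypothetical schema: field name -> Python type.
# It stands in for a real Avro schema in this sketch.
ORDER_FIELDS = {"id": int, "sku": str, "quantity": int}


def xml_to_record(xml_text):
    """Parse the XML once and keep only the typed fields we care about."""
    root = ET.fromstring(xml_text)
    # findtext returns None for a missing element, so a missing int field
    # fails loudly here instead of leaking into storage.
    return {name: cast(root.findtext(name)) for name, cast in ORDER_FIELDS.items()}


def record_to_xml(tag, record):
    """Render a processed record back to XML for clients that expect it."""
    root = ET.Element(tag)
    for name, value in record.items():
        ET.SubElement(root, name).text = str(value)
    return ET.tostring(root, encoding="unicode")


incoming = "<order><id>7</id><sku>A-1</sku><quantity>3</quantity></order>"
record = xml_to_record(incoming)   # compact, typed record to process and store
record["quantity"] += 1            # processing happens on the record, not on XML
print(record_to_xml("order", record))
```

XML only appears at the two edges; everything in between works on the compact record.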

Page 6: SMACK Stack 1.1

You need to be running Mesos. Lots of options here!

What is most important is that you abstract your “Provider” from your “Grid”.

What is “The Grid”?

It is the PaaS layer you deploy to, the one that runs your software (aka your new awesome supercomputer).

The grid is your Mesos cluster. You are likely going to have more than one, so plan accordingly. Think of it as immutable infrastructure, the computer does.

Step 1

Page 7: SMACK Stack 1.1

“Provider” of compute resources

Page 8: SMACK Stack 1.1

The Grid … 2.0 ...

https://github.com/elodina/sawfly/blob/master/cloud-deploy-grid.md

Program against your datacenter like it’s a single pool of resources

Apache Mesos abstracts CPU, memory, storage, and other compute resources away from machines (physical or virtual), enabling fault-tolerant and elastic distributed systems to easily be built and run effectively.

Mesosphere’s Data Center Operating System (DCOS) is an operating system that spans all of the machines in a datacenter or cloud and treats them as a single computer, providing a highly elastic and highly scalable way of deploying applications, services, and big data infrastructure on shared resources. DCOS is based on Apache Mesos and includes a distributed systems kernel with enterprise-grade security.
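To make the "single pool of resources" idea concrete, here is a toy sketch of a scheduler placing tasks onto resource offers. Mesos does hand resource offers to schedulers, but the `Offer` and `Task` types, the `place` function and the first-fit policy below are illustrative assumptions, not the Mesos API. The task sizes are borrowed from the Storm app definitions later in this deck.

```python
from dataclasses import dataclass


@dataclass
class Offer:
    agent: str    # hypothetical agent name
    cpus: float
    mem: int      # MB


@dataclass
class Task:
    name: str
    cpus: float
    mem: int      # MB


def place(tasks, offers):
    """First-fit placement: give each task the first offer that still has room."""
    remaining = {offer.agent: [offer.cpus, offer.mem] for offer in offers}
    placement = {}
    for task in tasks:
        for agent, free in remaining.items():
            if free[0] >= task.cpus and free[1] >= task.mem:
                free[0] -= task.cpus
                free[1] -= task.mem
                placement[task.name] = agent
                break
    return placement  # tasks that fit nowhere are simply left out


offers = [Offer("agent-1", 1.0, 2048), Offer("agent-2", 2.0, 4096)]
tasks = [Task("storm-nimbus", 1.0, 1024), Task("storm-ui", 0.2, 512)]
print(place(tasks, offers))
```

The scheduler never asks which machine a task should go to; it only asks whether an offer has enough CPU and memory left.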

Page 9: SMACK Stack 1.1
Page 10: SMACK Stack 1.1
Page 11: SMACK Stack 1.1

Data Center Optimization!

Page 12: SMACK Stack 1.1
Page 13: SMACK Stack 1.1

But there is more!
● Provisioning
● Micro Segmentation
● Orchestration
● Configuration Management
● Service Discovery
● Deployment Isolation and Identification
● Telemetry, Tracing, Ops Stuff, Etc
● Oh My!

It boils back down into stacks! https://github.com/elodina/stack-deploy and, ultimately, how you are working with the schedulers in your cluster.

Page 14: SMACK Stack 1.1

Stack Deploy to the rescue!

Page 15: SMACK Stack 1.1
Page 16: SMACK Stack 1.1

In the Grid you need Schedulers!
● Kafka – Producer/Consumer-based message queue management
● Exhibitor – Supervisor for distributed persistence (like ZooKeeper)
● Cassandra/DSE – HA, scalable, distributed NoSQL data storage
● Storm – Topology-based real-time distributed data streaming
● Monarch – Distributed Remote Procedure Calls, Kafka REST interface and schema repository
● Zipkin – Configure, launch and manage Zipkin distributed trace on Mesos
● HDFS – Configure, launch and manage HDFS on Mesos (coming soon)
● Stockpile – Consumer to “stock pile” data into persistent storage (Mesos scheduler only for C* now)
● MirrorMaker – Consumer to make a mirror copy of data to destination
● StatsD – Producer to pump StatsD on Mesos into Kafka for consumption, preserves layers
● SysLog – Producer to pump Syslog on Mesos into Kafka for consumption, preserves layers

https://github.com/elodina/

Page 17: SMACK Stack 1.1
Page 18: SMACK Stack 1.1

Virtual Telemetry “Data Center” In the Grid

ZipkinQATeamBuild92

● 1x Exhibitor-Mesos

● 1x Exhibitor

● 1x DSE-Mesos

● 1x Cassandra node

● 1x Kafka-Mesos

● 1x Kafka 0.8 broker

● 1x Zipkin-Mesos

● 1x Zipkin Collector

● 1x Zipkin Query

● 1x Zipkin Web

“cluster”

“zone”

“Stack” - defaultSimpleZipkinFull

“data center”

Page 19: SMACK Stack 1.1

Stack Deploy In Action

./stack-deploy addlayer --file stacks/cassandra_dc.stack --level datacenter

./stack-deploy addlayer --file stacks/cassandra_cluster.stack --level cluster --parent cassandra_dc

./stack-deploy addlayer --file stacks/cassandra_zone1.stack --level zone --parent cassandra_cluster

./stack-deploy addlayer --file stacks/cassandra_zone2.stack --level zone --parent cassandra_cluster

./stack-deploy add --file stacks/cassandra.stack

./stack-deploy run cassandra --zone cassandra_zone1
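One way to picture the hierarchy those commands build is a tree of named layers, each pointing at its parent. The sketch below is a hypothetical model, not stack-deploy's implementation: the `Layers` class and its validation rules are assumptions, and only the layer names and levels come from the commands above.

```python
LEVELS = ("datacenter", "cluster", "zone")


class Layers:
    """Toy registry of layers, keyed by name, each with a level and a parent."""

    def __init__(self):
        self._layers = {}

    def add_layer(self, name, level, parent=None):
        if level not in LEVELS:
            raise ValueError(f"unknown level: {level}")
        if level == "datacenter":
            if parent is not None:
                raise ValueError("a datacenter layer has no parent")
        else:
            if parent not in self._layers:
                raise KeyError(f"unknown parent: {parent}")
            expected = LEVELS[LEVELS.index(level) - 1]
            if self._layers[parent][0] != expected:
                raise ValueError(f"a {level} layer needs a {expected} parent")
        self._layers[name] = (level, parent)

    def chain(self, name):
        """Return the path from the datacenter layer down to the named layer."""
        path = []
        while name is not None:
            _, parent = self._layers[name]
            path.append(name)
            name = parent
        return list(reversed(path))


layers = Layers()
layers.add_layer("cassandra_dc", "datacenter")
layers.add_layer("cassandra_cluster", "cluster", parent="cassandra_dc")
layers.add_layer("cassandra_zone1", "zone", parent="cassandra_cluster")
layers.add_layer("cassandra_zone2", "zone", parent="cassandra_cluster")
print(layers.chain("cassandra_zone1"))
```

Running a stack in a zone then means running it in the context of that whole chain: zone, cluster and data center.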

Page 20: SMACK Stack 1.1
Page 21: SMACK Stack 1.1
Page 22: SMACK Stack 1.1
Page 23: SMACK Stack 1.1
Page 24: SMACK Stack 1.1

Full Stack Deployments

Page 25: SMACK Stack 1.1
Page 26: SMACK Stack 1.1

Cassandra

Page 27: SMACK Stack 1.1

Cassandra Multi DC

Page 28: SMACK Stack 1.1
Page 29: SMACK Stack 1.1
Page 30: SMACK Stack 1.1

Cassandra https://github.com/elodina/datastax-enterprise-mesos

Page 31: SMACK Stack 1.1
Page 32: SMACK Stack 1.1

Start your nodes!

Page 33: SMACK Stack 1.1
Page 34: SMACK Stack 1.1

Apache Kafka

• Apache Kafka
o http://kafka.apache.org

• Apache Kafka Source Code
o https://github.com/apache/kafka

• Documentation
o http://kafka.apache.org/documentation.html

• Wiki
o https://cwiki.apache.org/confluence/display/KAFKA/Index

Page 35: SMACK Stack 1.1

It often starts with just one data pipeline

Page 36: SMACK Stack 1.1

Reuse of data pipelines for new producers

Page 37: SMACK Stack 1.1

Reuse of existing providers for new consumers

Page 38: SMACK Stack 1.1

Eventually the solution becomes the problem

Page 39: SMACK Stack 1.1

Kafka decouples data-pipelines

Page 40: SMACK Stack 1.1
Page 41: SMACK Stack 1.1

Topics & Partitions

Page 42: SMACK Stack 1.1

A high-throughput distributed messaging system rethought as a distributed commit log.
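The commit-log view of topics and partitions can be sketched in a few lines. This is an in-memory toy, not a Kafka client: the `Topic` class is hypothetical, and `zlib.crc32` is only a deterministic stand-in for the hash a real partitioner would use.

```python
import zlib


class Topic:
    """A topic as a set of append-only logs (partitions) addressed by offset."""

    def __init__(self, name, partitions):
        self.name = name
        self.partitions = [[] for _ in range(partitions)]

    def append(self, key, value):
        """Append to the partition chosen by the key; return (partition, offset)."""
        index = zlib.crc32(key.encode("utf-8")) % len(self.partitions)
        log = self.partitions[index]
        log.append((key, value))
        return index, len(log) - 1

    def read(self, partition, offset):
        """Read everything in a partition from the given offset onward."""
        return self.partitions[partition][offset:]


topic = Topic("events", partitions=3)
partition, first = topic.append("user-1", "login")
topic.append("user-1", "click")
# Same key, same partition, so a consumer sees these two in order.
print(topic.read(partition, first))
```

Nothing is removed on read; each consumer just remembers the offset it has reached, which is what lets many consumers share one log.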

Page 43: SMACK Stack 1.1

Intra Cluster Replication

Page 44: SMACK Stack 1.1

Mesos Kafka http://github.com/mesos/kafka

Page 45: SMACK Stack 1.1
Page 46: SMACK Stack 1.1
Page 47: SMACK Stack 1.1
Page 48: SMACK Stack 1.1
Page 49: SMACK Stack 1.1

Streaming & Analytics

● The landscape of streaming is about to get more fragmented and harder to navigate. This is not all bad news and it is not much different than where we were with NoSQL 6 years ago or so.

● Different systems are getting really (really (really)) good at different things.
○ DAG based systems
○ Event based systems
○ Query & Execution Engines
○ Streaming Engines
○ Etc!

Page 50: SMACK Stack 1.1

GearPump

Page 51: SMACK Stack 1.1
Page 52: SMACK Stack 1.1

Airflow

Page 53: SMACK Stack 1.1

Spring Cloud Data Flow

Page 54: SMACK Stack 1.1

Storm (and Storm Topology based systems)

Page 55: SMACK Stack 1.1

Storm Nimbus

{
  "id": "storm-nimbus",
  "cmd": "cp storm.yaml storm-mesos-0.9.6/conf && cd storm-mesos-0.9.6 && ./bin/storm-mesos nimbus -c mesos.master.url=zk://zookeeper.service:2181/mesos -c storm.zookeeper.servers=\"[\\\"zookeeper.service\\\"]\" -c nimbus.thrift.port=$PORT0 -c topology.mesos.worker.cpu=0.5 -c topology.mesos.worker.mem.mb=615 -c worker.childopts=-Xmx512m -c topology.mesos.executor.cpu=0.1 -c topology.mesos.executor.mem.mb=160 -c supervisor.childopts=-Xmx128m -c mesos.executor.uri=http://repo.elodina.s3.amazonaws.com/storm-mesos-0.9.6.tgz -c storm.log.dir=$(pwd)/logs",
  "cpus": 1.0,
  "mem": 1024,
  "ports": [31056],
  "requirePorts": true,
  "instances": 1,
  "uris": [
    "http://repo.elodina.s3.amazonaws.com/storm-mesos-0.9.6.tgz",
    "http://repo.elodina.s3.amazonaws.com/storm.yaml"
  ]
}

Page 56: SMACK Stack 1.1

Storm UI

{
  "id": "storm-ui",
  "cmd": "cp storm.yaml storm-mesos-0.9.6/conf && cd storm-mesos-0.9.6 && ./bin/storm ui -c ui.port=$PORT0 -c nimbus.thrift.port=31056 -c nimbus.host=storm-nimbus.service -c storm.log.dir=$(pwd)/logs",
  "cpus": 0.2,
  "mem": 512,
  "ports": [31067],
  "requirePorts": true,
  "instances": 1,
  "uris": [
    "http://repo.elodina.s3.amazonaws.com/storm-mesos-0.9.6.tgz",
    "http://repo.elodina.s3.amazonaws.com/storm.yaml"
  ],
  "healthChecks": [
    {
      "protocol": "HTTP",
      "portIndex": 0,
      "path": "/",
      "gracePeriodSeconds": 120,
      "intervalSeconds": 20,
      "maxConsecutiveFailures": 3
    }
  ]
}
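Long hand-written "cmd" strings are easy to get wrong, and the UI definition has to repeat the Thrift port that Nimbus was pinned to. A small generator can keep both in one place. This is a sketch only: the helper names `storm_cmd` and `storm_ui_app` are hypothetical, and every value is copied from the Storm UI definition above.

```python
import json

STORM_DIR = "storm-mesos-0.9.6"
REPO = "http://repo.elodina.s3.amazonaws.com"
NIMBUS_PORT = 31056  # the port the Nimbus app requires; the UI must point at it


def storm_cmd(binary, role, options):
    """Build the shell command: stage storm.yaml, then start a Storm role."""
    flags = " ".join(f"-c {key}={value}" for key, value in options.items())
    return f"cp storm.yaml {STORM_DIR}/conf && cd {STORM_DIR} && ./bin/{binary} {role} {flags}"


def storm_ui_app(ui_port):
    """Return the Storm UI app definition as a plain dict."""
    return {
        "id": "storm-ui",
        "cmd": storm_cmd("storm", "ui", {
            "ui.port": "$PORT0",               # expanded by the shell at launch
            "nimbus.thrift.port": NIMBUS_PORT,
            "nimbus.host": "storm-nimbus.service",
            "storm.log.dir": "$(pwd)/logs",
        }),
        "cpus": 0.2,
        "mem": 512,
        "ports": [ui_port],
        "requirePorts": True,
        "instances": 1,
        "uris": [f"{REPO}/{STORM_DIR}.tgz", f"{REPO}/storm.yaml"],
        "healthChecks": [{
            "protocol": "HTTP",
            "portIndex": 0,
            "path": "/",
            "gracePeriodSeconds": 120,
            "intervalSeconds": 20,
            "maxConsecutiveFailures": 3,
        }],
    }


print(json.dumps(storm_ui_app(31067), indent=2))
```

Because `json.dumps` does the escaping, the quoting problems of a hand-edited one-line command go away, and changing `NIMBUS_PORT` updates every definition that depends on it.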

Page 57: SMACK Stack 1.1

Storm Kafka - new spouts & bolts for Kafka 0.8, 0.9, ...

Page 58: SMACK Stack 1.1

Apache Kafka Streams

Page 59: SMACK Stack 1.1
Page 60: SMACK Stack 1.1

Go Kafka Client - Fan Out Processing

https://github.com/elodina/go-kafka-client-mesos

● Dynamic Kafka Log workers
● Blue/Green Deploy Support
● Fan Out Processing
● Auditable
● Batches
● Scalable/Auto-Scalable
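Fan-out processing over batches can be sketched with a worker pool: one reader pulls a batch from a partition, hands each message to a worker, and only moves on once the whole batch is done. This is a conceptual illustration in Python, not how go-kafka-client is implemented; `fan_out` and `handle` are hypothetical names.

```python
from concurrent.futures import ThreadPoolExecutor


def fan_out(batch, handler, workers=4):
    """Process one batch concurrently; results come back in message order."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # pool.map preserves input order, so result i belongs to message i.
        return list(pool.map(handler, batch))


def handle(message):
    """Stand-in for real per-message work."""
    return message.upper()


batch = ["login", "click", "logout"]
print(fan_out(batch, handle))
```

Since the call returns only after every message in the batch has been handled, the caller knows it is safe to record the batch as processed before fetching the next one.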

Page 61: SMACK Stack 1.1

Questions?

http://www.elodina.net