Alluxio Mesos Meetup - SMACK to SMAACK

From SMACK to SMAACK: Alluxio meets DC/OS

Jörg Schad, Mesosphere
Adit Madan, Alluxio

#smack @Alluxio @dcos @joerg_schad @madanadit

© 2016 Mesosphere, Inc. All Rights Reserved.

© 2017 Mesosphere, Inc. All Rights Reserved.

20% OFF: MCDCOS20

September 13th - 15th
● Dedicated Tracks
● MesosCon University
● Town Halls
● Hackathon

Accelerating Spark workloads in a Mesos environment with Alluxio, 09/15, 11AM

Fast Data

Processing models: Batch, Micro-Batch, Event Processing

Latency scale: Days, Hours, Minutes, Seconds, Microseconds

● Slower, batch-oriented processing reports what has happened using descriptive analytics
● Faster, event-driven processing solves problems using predictive and prescriptive analytics

Example use cases:
● Billing, Chargeback
● Product recommendations
● Real-time Pricing and Routing
● Real-time Advertising
● Predictive User Interface

The SMACK Stack

EVENTS: Ubiquitous data streams from connected devices (sensors, devices, clients)

● INGEST: Apache Kafka. Ingest millions of events per second
● STORE: Apache Cassandra. Distributed & highly scalable database
● ANALYZE: Apache Spark. Real-time and batch data processing
● ACT: Akka. Visualize data and build data-driven applications

All stages run on Mesos / DC/OS

Datacenter

NAIVE APPROACH

Typical Datacenter: siloed, over-provisioned servers, low utilization

Industry Average: 12-15% utilization

One silo per workload:
● MySQL
● microservice
● Cassandra
● Spark/Hadoop
● Kafka

MULTIPLEXING OF DATA, SERVICES, USERS, ENVIRONMENTS

Typical Datacenter: siloed, over-provisioned servers, low utilization

Mesos / DC/OS: automated schedulers, workload multiplexing onto the same machines

Workloads sharing the same machines:
● MySQL
● microservice
● Cassandra
● Spark/Hadoop
● Kafka
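To make the utilization argument concrete, here is a rough back-of-the-envelope sketch in Python. Every number in it (core counts per workload, server size, headroom factor) is a made-up assumption for illustration, not a figure from the talk, and the sizing rules are deliberately simplistic: silos are sized per workload for peak plus headroom, while the shared pool is sized once for the combined peak.

```python
import math

# All numbers below are hypothetical and only illustrate the argument.
CORES_PER_SERVER = 16
PEAK_CORES = {"MySQL": 8, "microservice": 4, "Cassandra": 16, "Spark/Hadoop": 24, "Kafka": 8}
AVG_CORES = {"MySQL": 2, "microservice": 1, "Cassandra": 4, "Spark/Hadoop": 3, "Kafka": 2}


def servers_for(cores):
    """Whole servers needed to provide the given number of cores."""
    return math.ceil(cores / CORES_PER_SERVER)


def siloed_utilization(headroom=2.0):
    """Each workload gets dedicated servers sized for its own peak plus headroom."""
    servers = sum(servers_for(peak * headroom) for peak in PEAK_CORES.values())
    return sum(AVG_CORES.values()) / (servers * CORES_PER_SERVER)


def shared_utilization():
    """All workloads are multiplexed onto one pool sized for the combined peak."""
    servers = servers_for(sum(PEAK_CORES.values()))
    return sum(AVG_CORES.values()) / (servers * CORES_PER_SERVER)


if __name__ == "__main__":
    print(f"siloed: {siloed_utilization():.1%}")
    print(f"shared: {shared_utilization():.1%}")
```

Under these assumptions the shared pool needs fewer servers for the same average load, which is the effect the slide attributes to multiplexing. Real schedulers also have to account for memory, disk, and placement constraints, which this sketch ignores.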

Datacenter Operating System (DC/OS)

Distributed Systems Kernel (Mesos)

DC/OS ENABLES MODERN DISTRIBUTED APPS

Big Data + Analytics Engines

Microservices (in containers)

Streaming

Batch

Machine Learning

Analytics

Functions & Logic

Search

Time Series

SQL / NoSQL

Databases

Modern App Components

Any Infrastructure (Physical, Virtual, Cloud)

BIG DATA ECOSYSTEM YESTERDAY

© 2017 Alluxio

BIG DATA ECOSYSTEM TODAY

BIG DATA ECOSYSTEM ISSUES

The SMAACK Stack

EVENTS: Ubiquitous data streams from connected devices (sensors, devices, clients)

● INGEST: Apache Kafka. Ingest millions of events per second
● STORE: Apache Cassandra. Distributed & highly scalable database
● ANALYZE: Apache Spark. Real-time and batch data processing
● ACT: Akka. Visualize data and build data-driven applications
● Alluxio

All stages run on Mesos / DC/OS

BIG DATA ECOSYSTEM WITH ALLUXIO

Interfaces offered to applications:
● FUSE Compatible File System Interface
● Hadoop Compatible File System Interface
● Native Key-Value Interface
● Native File System Interface

Interfaces to underlying storage:
● HDFS Interface
● Amazon S3 Interface
● Swift Interface
● GlusterFS Interface

Enabling Applications to Access Data from any Storage System at Memory Speed

WHY ALLUXIO

● Co-located compute and data with memory-speed access to data
● Virtualized across different storage systems under a unified namespace
● Scale-out architecture
● File system API, software only
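The "unified namespace" idea can be pictured as a mount table that maps paths in one logical tree onto different storage systems. The following is a toy Python model of that idea only; it is not Alluxio's API, and the class name, mount points, and storage URIs are all hypothetical.

```python
class UnifiedNamespace:
    """Toy mount table: one logical path tree backed by several storage systems."""

    def __init__(self):
        self._mounts = {}  # logical mount point -> under-storage URI prefix

    def mount(self, mount_point, under_uri):
        """Attach a storage location at a logical path such as '/s3'."""
        key = mount_point.rstrip("/") or "/"
        self._mounts[key] = under_uri.rstrip("/")

    def resolve(self, path):
        """Translate a logical path to the storage URI that backs it."""
        best = None
        for mount_point in self._mounts:
            prefix = mount_point if mount_point.endswith("/") else mount_point + "/"
            if path == mount_point or path.startswith(prefix):
                # Prefer the longest (most specific) matching mount point.
                if best is None or len(mount_point) > len(best):
                    best = mount_point
        if best is None:
            raise KeyError(f"no mount covers {path!r}")
        rest = path[len(best):].lstrip("/")
        base = self._mounts[best]
        return f"{base}/{rest}" if rest else base


if __name__ == "__main__":
    ns = UnifiedNamespace()
    # Hypothetical storage locations, used only for illustration.
    ns.mount("/hdfs", "hdfs://namenode:8020/warehouse")
    ns.mount("/s3", "s3://example-bucket/logs")
    print(ns.resolve("/hdfs/tables/users"))
    print(ns.resolve("/s3/2017/09/events.json"))
```

An application written against the logical paths does not need to know which storage system holds a given file, which is the property the slide calls virtualization across storage systems.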

ALLUXIO BENEFITS

● Unification: New workflows across any data in any storage system
● Performance: Orders of magnitude improvement in run time
● Flexibility: Choice in compute and storage; grow each independently, buy only what is needed
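One way to see where a run-time improvement can come from is a read-through cache: the first read of a file goes to the slower backing store, and repeated reads are served from memory. The sketch below is a minimal Python illustration of that pattern under the assumption that a plain dictionary stands in for the remote store; it does not model Alluxio's actual caching, eviction, or tiering.

```python
class ReadThroughCache:
    """Serve repeated reads from memory; fetch from the backing store on a miss."""

    def __init__(self, under_store):
        # under_store is a dict of path -> bytes standing in for a remote system.
        self._under_store = under_store
        self._memory = {}
        self.under_store_reads = 0

    def read(self, path):
        if path not in self._memory:
            # Cache miss: pay the cost of the slow store once, then keep a copy.
            self.under_store_reads += 1
            self._memory[path] = self._under_store[path]
        return self._memory[path]


if __name__ == "__main__":
    cache = ReadThroughCache({"/s3/events.json": b'{"clicks": 3}'})
    for _ in range(3):
        cache.read("/s3/events.json")
    print("reads that reached the backing store:", cache.under_store_reads)
```

How much this helps in practice depends on how often a workload re-reads the same data and on how slow the backing store is, so the slide's "orders of magnitude" should be read as workload dependent.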

WHY DATA SERVICES ON DC/OS?

1. On-demand provisioning
● Single command install of services

2. Simplified operations
● Runtime software upgrade
● Runtime application settings update
● Monitoring & metrics
● Managed persistent storage volumes

3. Elastic data infrastructure
● Data services and containerized apps share resources
● Deploy instances with different versions on the same infrastructure
● Resize instances
● Add more instances

ALLUXIO ON MESOSPHERE DC/OS
Fast, On-demand Unified Data at Memory Speed for Analytics

● Alluxio: Unify Data at Memory Speed
● Mesosphere DC/OS: Runs distributed apps anywhere as simply as running apps on your laptop
● Any Infrastructure: Build apps once in DC/OS, and run anywhere

WHY ALLUXIO ON MESOSPHERE DC/OS?

● Without Mesosphere DC/OS, provisioning of infrastructure is tedious

○ Mesosphere DC/OS automates app & cluster provisioning, management & elastic scaling

● Alluxio brings

○ A unified view of data across disparate storage systems

○ High performance & predictable SLA for analytics workloads

● Benefits include:

○ Process data in your existing cluster faster with Spark and other analytics frameworks

○ Process data from hybrid cloud storage systems (HDFS, S3, on-prem object stores, etc.)

BIG DATA STACK WITH ALLUXIO ON MESOSPHERE DC/OS
Fast, On-demand Unified Data at Memory Speed for Analytics

● Mesos
● Container Orchestration, Management & Monitoring Tools, Apps Universe
● Security, Advanced Operations, Multitenancy, Adv. Network & Storage
● Unifying Data at Memory Speed

DEMO

WHAT HAPPENED?

● Alluxio scheduler (developed using the DC/OS SDK) launched as a Marathon application

○ Marathon manages and restarts the scheduler in case of failures

○ Scheduler consists of YAML + scripting

● Alluxio scheduler launched master and worker processes

○ Scheduler keeps the configured number of instances running, even when instances fail

● Configuration changes take effect on the fly

○ Scaled up the worker instances
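The behavior described above (keep the configured number of masters and workers running, and react when the configuration changes) is essentially a reconciliation loop. The following Python sketch shows that idea in its simplest form. It is not the DC/OS SDK or the Alluxio scheduler: the function name, task-type names, and action tuples are hypothetical, and the sketch only launches missing instances.

```python
def reconcile(desired, running):
    """Return the launches needed so each task type reaches its desired count.

    desired and running both map a task type (for example 'worker') to a count.
    """
    actions = []
    for task_type, want in desired.items():
        have = running.get(task_type, 0)
        if have < want:
            actions.append(("launch", task_type, want - have))
    return actions


if __name__ == "__main__":
    desired = {"master": 1, "worker": 3}

    # A worker has failed: only two of the three configured workers are running.
    print(reconcile(desired, {"master": 1, "worker": 2}))

    # The configuration is changed on the fly to scale the workers up.
    desired["worker"] = 5
    print(reconcile(desired, {"master": 1, "worker": 3}))
```

Failure recovery and scale-up look identical from this point of view: in both cases the running count is below the configured count, so the same loop handles both.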

GET STARTED TODAY

Read:
● Mesosphere Blog: http://ow.ly/ou0530ax9aM
● Alluxio Blog: http://ow.ly/ILOZ30ax8YE

Try it out:
● Install Alluxio from DC/OS Universe

Questions?
