CHUG: Datacenter Computing with Apache Mesos

Click here to load reader

  • date post

  • Category


  • view

  • download


Embed Size (px)


Mesos overview, presented at the Chicago Hadoop Users Group meetup at Sears Holdings Corp, 2013-11-06 video: meetup:

Transcript of CHUG: Datacenter Computing with Apache Mesos

  • 1.Datacenter Computing with Apache Mesos CHUG @ Sears Holdings Corp, 2013-11-06 Paco Nathan, chief scientist @pacoid

2. The Modern Kernel: Top Linux Contributors 3. Beyond Hadoop Google has been doing datacenter computing for years, to address the complexities of large-scale data workows:leveraging the modern kernel: isolation in lieu of VMsmost (>80%) jobs are batch jobs, but the majority of resources (5580%) are allocated to service jobsmixed workloads, multi-tenancyrelatively high utilization ratesJVM? not so muchreality: scheduling batch is simple; scheduling services is hard/expensive 4. Beyond Hadoop Hadoop an open source solution for fault-tolerant parallel processing of batch jobs at scale, based on commodity hardware however, other priorities have emerged for the analytics lifecycle: apps require integration beyond Hadoop higher utilizationmultiple topologies, mixed workloads, multi-tenancy signicant disruptions in h/w cost/performance curves lower latency highly-available, long running services more than Just JVM e.g., Python growthkeep in mind the priority for multi-disciplinary efforts, to break down silos extending beyond a de facto priesthood of data engineering 5. Return of the Borg Return of the Borg: How Twitter Rebuilt Googles Secret Weapon Cade Metz The Datacenter as a Computer: An Introduction to the Design of Warehouse-Scale Machines Luiz Andr Barroso, Urs Hlzle 2011 GAFS Omega John Wilkes, et al. 6. Return of the Borg Omega: exible, scalable schedulers for large compute clusters Malte Schwarzkopf, Andy Konwinski, Michael Abd-El-Malek, John Wilkes Schwarzkopf.pdf 7. Google describes the business case Taming Latency Variability Jeff Dean 8. Mesos denitions a common substrate for cluster computing heterogenous assets in your datacenter or cloud made available as a homogenous set of resources top-level Apache project scalability to 10,000s of nodes obviates the need for virtual machines isolation (pluggable) for CPU, RAM, I/O, FS, etc. fault-tolerant leader election based on ZooKeeper APIs in C++, Java, Python web UI for inspecting cluster state available for Linux, OpenSolaris, Mac OSX 9. Mesos architecture given the use of Mesos as a Datacenter OS kernelChronos provides complex scheduling capabilities, much like a distributed Unix cronMarathon provides highly-available long-running services, much like a distributed Unix init.dnext time you need to build a distributed app, consider using these as building blocksa major lesson learned from Spark:leveraging these kinds of building blocks, one can rebuild Hadoop 100x faster, in much less code 10. Mesos architecture servicesbatchWorkloadsApps ScaldingMPIImpalaHadoopSharkSparkMySQLKafkaJBossDjangoChronosStormRailsFrameworksPyth onMarathonC++JVMKerneldistributed le systemdistributed resources: CPU, RAM, I/O, FS, rack locality, etc.DFSCluster 11. Mesos datacenter OS stackHADOOP TELEMETRYSTORMCHRONOSRAILSCAPACITY PLANNING SMARTER SCHEDULINGMESOSSECURITYJBOSS GUIApps OS Kernel 12. Prior Practice: Dedicated ServersDATACENTER low utilization rates longer time to ramp up new services 13. Prior Practice: VirtualizationDATACENTERPROVISIONED VMS even more machines to manageVM licensing costssubstantial performance decrease due to virtualization 14. Prior Practice: Static PartitioningDATACENTERSTATIC PARTITIONING even more machines to manage VM licensing costssubstantial performance decrease due to virtualization static partitioning limits elasticity 15. Mesos: One Large Pool Of ResourcesDATACENTERMESOSWe wanted people to be able to program for the datacenter just like they program for their laptop." Ben Hindman 16. Arguments for Datacenter Computing rather than running several specialized clusters, each at relatively low utilization rates, instead run many mixed workloads obvious benets are realized in terms of: scalability, elasticity, fault tolerance, performance, utilization reduced equipment capex, Ops overhead, etc. reduced licensing, eliminating need for VMs or potential vendor lockinsubtle benets arguably, more important for Enterprise IT: reduced time for engineers to rampup new services at scaleenables Dev/Test apps to run safely on a Production clusterreduced latency between batch and services, enabling new highROI use cases 17. What are the costs of Virtualization? benchmark typeOpenVZ improvementmixed workloads210%-300%LAMP (related)38%-200%I/O throughput200%-500%response timeorder magnitudemore pronounced at higher loads 18. What are the costs of Single Tenancy? MEMCACHED CPU LOADRAILS CPU LOADHADOOP CPU LOAD100%100%100%75%75%75%50%50%50%25%25%25%0%0%0%ttCOMBINED CPU LOAD (RAILS, MEMCACHED, HADOOP) 100%75%50%25%0%Hadoop Memcached Rails 19. Example: Resource Offer in a Two-Level 20. Example: Docker on MesosMesos master servers Master Docker RegistryMesos slave serversS MS dockerSindex.docker.ioS Marathon can launch and monitor service containers from one or more Docker registries, using the Docker executor for MesosMS dockerSmarathonS MSSdockerLocal Docker RegistrySSS( optional ) 21. Example: Docker on MesosMesos Master Server init | + mesos-master | + marathon |The executor, monitored by the Mesos slave, delegates to the local Docker daemon for image discovery and management. The executor communicates with Marathon via the Mesos master and ensures that Docker enforces the specied resource limitations.Mesos Slave Server init | + docker | | | + lxc | | | + (user task, under container init system) | | | + mesos-slave | | | + /var/lib/mesos/executors/docker | | | | | + docker run | | | 22. Example: Docker on MesosMesos Master Server init | + mesos-master | + marathon |Mesos, LXC, and Docker are tied together for launchMesos Slave Server 3 21 6 When a user requests a containerDocker Registry75 init | + docker 8 | | | + lxc | | | + (user task, under container init system) | | | 4 + mesos-slave | | | + /var/lib/mesos/executors/docker | | | | | + docker run | | | 23. Production Deployments (public) 24. Opposite Ends of the Spectrum, One Common SubstrateSolaris Zones Built-in / bare metal Linux CGroupsHypervisors 25. Opposite Ends of the Spectrum, One Common SubstrateRequest / ResponseBatch 26. Case Study: Twitter (bare metal / on premise) Mesos is the cornerstone of our elastic compute infrastructure its how we build all our new services and is critical for Twitters continued success at scale. It's one of the primary keys to our data center efciency." Chris Fry, SVP Engineering services run in production: analytics, typeahead, adsTwitter engineers rely on Mesos to build all new servicesinstead of thinking about static machines, engineers think about resources like CPU, memory and diskallows services to scale and leverage a shared pool of servers across datacenters efcientlyreduces the time between prototyping and launching 27. Case Study: Airbnb (fungible cloud infrastructure) We think we might be pushing data science in the eld of travel more so than anyone has ever done before a smaller number of engineers can have higher impact through automation on Mesos." Mike Curtis,VP Engineering resource management and efciencyhelps advance engineering strategy of building small teams that can move fastkey to letting engineers make the most of AWS-based infrastructure beyond just Hadoopallowed company to migrate off Elastic MapReduceenables use of Hadoop along with Chronos, Spark, Storm, etc. 28. Media Coverage Mesosphere Adds Docker Support To Its Mesos-Based Operating System For The Data Center Frederic Lardinois TechCrunch (2013-09-26) Play Framework Grid Deployment with Mesos James Ward, Flo Leibert, et al. Typesafe blog (2013-09-19) Mesosphere Launches Marathon Framework Adrian Bridgwater Dr. Dobbs (2013-09-18) New open source tech Marathon wants to make your data center run like Googles Derrick Harris GigaOM (2013-09-04) Running batch and long-running, highly available service jobs on the same cluster Ben Lorica OReilly (2013-09-01) 29. Resources Apache Mesos Project Mesosphere Tutorials Documentation 2011 USENIX Research Paper Hindman_new.pdf Collected Notes/Archives 30. Join us! Mesos Unconference @Twitter HQ, SF Nov 19 Data Day Texas, Austin Jan 11 (incl. Mesos talk+BOF) OReilly Strata, Santa Clara Feb 11-13 (incl. Mesos talk+BOF+workshop) 31. Enterprise Data Workows with Cascading OReilly, 2013 monthly newsletter for updates, events, conference summaries, etc.: