Getting Started Running Apache Spark on Apache Mesos

  • date post

    19-Aug-2015
  • Category

    Technology

Transcript of Getting Started Running Apache Spark on Apache Mesos

  1. 1. Getting Started Running Apache Spark on Apache Mesos, 2014-01-24. Paco Nathan, liber118.com/pxn, @pacoid
  2. 2. Spark on Mesos, 2014-01-24: what is Apache Mesos?; launch a Mesos cluster in the cloud; configure and run Spark on Mesos; run jobs in Spark; further resources
  3. 3. Datacenter Computing: Google has been doing datacenter computing for years, to address the complexities of large-scale data workflows: leveraging the modern kernel: isolation in lieu of VMs; most (>80%) jobs are batch jobs, but the majority of resources (55-80%) are allocated to service jobs; mixed workloads, multi-tenancy; among the top 10 Linux kernel OSS contributors: cgroups; relatively high utilization rates; JVM? not so much. Take-aways: scheduling batch is not so difficult; scheduling services is hard and expensive
  4. 4. Google describes the business case: "Taming Latency Variability", Jeff Dean, plus.google.com/u/0/+ResearchatGoogle/posts/C1dPhQhcDRv
  5. 5. Return of the Borg: "Return of the Borg: How Twitter Rebuilt Google's Secret Weapon", Cade Metz, wired.com/wiredenterprise/2013/03/googleborg-twitter-mesos; "The Datacenter as a Computer: An Introduction to the Design of Warehouse-Scale Machines", Luiz André Barroso, Urs Hölzle, research.google.com/pubs/pub35290.html; "2011 GAFS Omega", John Wilkes, et al., youtu.be/0ZFMlO98Jkc
  6. 6. Google describes the technology: "Omega: flexible, scalable schedulers for large compute clusters", Malte Schwarzkopf, Andy Konwinski, Michael Abd-El-Malek, John Wilkes, eurosys2013.tudos.org/wp-content/uploads/2013/paper/Schwarzkopf.pdf
  7. 7. Mesos, open source datacenter computing: a common substrate for cluster computing; mesos.apache.org; heterogeneous assets in your datacenter or cloud made available as a homogeneous set of resources; top-level Apache project; scalability to 10,000s of nodes; obviates the need for virtual machines; isolation (pluggable) for CPU, RAM, I/O, FS, etc.; fault-tolerant leader election based on ZooKeeper; APIs in C++, Java, Python, Go; web UI for inspecting cluster state; available for Linux, OpenSolaris, Mac OSX
  8. 8. Mesos architecture (layer diagram). Workloads: services, batch. Apps and Frameworks: Scalding, MPI, Impala, Hadoop, Shark, Spark, MySQL, Kafka, JBoss, Django, Chronos, Storm, Rails, Marathon. Kernel. DFS: distributed file system. Cluster: distributed resources: CPU, RAM, I/O, FS, rack locality, etc.
  9. 9. Mesos architecture: apps: HA services, web apps, batch jobs, scripts, etc.; frameworks: Spark, Storm, MPI, Jenkins, etc.; task schedulers: Chronos, etc.; meta-frameworks: Aurora, Marathon; APIs: C++, JVM, Py, Go; Mesos, distrib kernel; HDFS, distrib file system; Linux: libcgroup, libprocess, libev, etc.
  10. 10. Mesos dynamics (diagram): scheduled apps; HA services; distrib frameworks; Marathon: distrib init.d; Mesos: distrib kernel; Chronos: distrib cron
  11. 11. Mesos dynamics (diagram): distributed framework: Scheduler, Executors; Mesos slaves; resource offers; Mesos masters; available resources; distributed kernel
  12. 12. Production Deployments (public)
  13. 13. Case Study: Twitter (bare metal / on premise). "Mesos is the cornerstone of our elastic compute infrastructure: it's how we build all our new services and is critical for Twitter's continued success at scale. It's one of the primary keys to our data center efficiency." Chris Fry, SVP Engineering. blog.twitter.com/2013/mesos-graduates-from-apache-incubation, wired.com/gadgetlab/2013/11/qa-with-chris-fry/. Key services run in production: analytics, typeahead, ads. Mesos allows services to scale and leverage a shared pool of servers across datacenters efficiently, and reduces the time between prototyping and launching. Twitter engineers rely on Mesos to build all new services. Instead of thinking about static machines, engineers think about resources like CPU, memory and disk.
  14. 14. Spark on Mesos, 2014-01-24: what is Apache Mesos?; launch a Mesos cluster in the cloud; configure and run Spark on Mesos; run jobs in Spark; further resources
  15. 15. http://elastic.mesosphere.io: launch a Mesos cluster in the Amazon AWS cloud in three simple steps, given: AWS credentials, SSH public key, email address
  16. 16. Spark on Mesos, 2014-01-24: what is Apache Mesos?; launch a Mesos cluster in the cloud; configure and run Spark on Mesos; run jobs in Spark; further resources
  17. 17. http://mesosphere.io/learn/run-spark-on-mesos/: configure and run Spark on a Mesos cluster on AWS, in a seven-step tutorial
  18. 18. step 1: ssh to master
  19. 19. ssh -l ubuntu
  20. 20. step 2: install git, jdk-7
  21. 21. sudo aptitude -y install git
          sudo aptitude -y install openjdk-7-jdk
  22. 22. step 3: download spark
  23. 23. wget http://spark-project.org/download/spark-0.8.0-incubating-bin-cdh4.tgz
          tar xzf spark-0.8.0-incubating-bin-cdh4.tgz
          cd spark-0.8.0-incubating-bin-cdh4/
  24. 24. step 4: sbt clean assembly
  25. 25. SPARK_HADOOP_VERSION=2.0.0-mr1-cdh4.4.0 sbt/sbt clean assembly
  26. 26. step 5: make distro, cp to HDFS
  27. 27. ./make-distribution.sh --hadoop 2.0.0-mr1-cdh4.4.0
          mv dist spark-0.8.0-2.0.0-mr1-cdh4.4.0
          tar czf spark-0.8.0-2.0.0-mr1-cdh4.4.0.tgz spark-0.8.0-2.0.0-mr1-cdh4.4.0
          hadoop fs -mkdir /tmp
          hadoop fs -put spark-0.8.0-2.0.0-mr1-cdh4.4.0.tgz /tmp
  28. 28. step 6: config env
  29. 29. cd conf/
          cp spark-env.sh.template spark-env.sh
          vim spark-env.sh
          # lines to add in spark-env.sh:
          export MESOS_NATIVE_LIBRARY=/usr/local/lib/libmesos.so
          export SPARK_EXECUTOR_URI=hdfs:///tmp/spark-0.8.0-2.0.0-mr1-cdh4.4.0.tgz
          export MASTER=zk://:2181/mesos
          # check the result, then start the shell:
          cat spark-env.sh
          cd ..
          ./spark-shell
  30. 30. et voilà!
  31. 31. Spark on Mesos, 2014-01-24: what is Apache Mesos?; launch a Mesos cluster in the cloud; configure and run Spark on Mesos; run jobs in Spark; further resources
  32. 32. http://spark.incubator.apache.org/examples.html: run an example job in Spark, to filter an RDD of integers, in two steps at the REPL
  33. 33. step 1: create an RDD
  34. 34. val data = 1 to 10000
          val distData = sc.parallelize(data)
          distData.filter(_ < 10).collect()
  35. 35. step 2: run the filter
  36. 36. Spark on Mesos, 2014-01-24: what is Apache Mesos?; launch a Mesos cluster in the cloud; configure and run Spark on Mesos; run jobs in Spark; further resources
  37. 37. Join us! O'Reilly Strata, Santa Clara, Feb 11-13, strataconf.com/strata2014: Mesos tutorial, Tue 2/11 1:30pm; BOF lunch, Wed 2/12 12:10pm; Mesos session, Thu 2/13 2:20pm; office hours, Thu 2/13 3:15pm
  38. 38. More insights! Monthly newsletter for events, conf summaries, workshops, etc.: liber118.com/pxn/; collected Mesos notes: goo.gl/jPtTP
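
Steps 4 through 6 of the tutorial slides repeat the same Spark and Hadoop version strings in the build flag, the tarball name, and `SPARK_EXECUTOR_URI`. The following is a minimal sketch, assuming a POSIX shell, that derives the distribution name once and appends the three exports from step 6 without opening an editor. The names `SPARK_VERSION`, `HADOOP_VERSION`, `DIST_NAME`, and `write_spark_env` are hypothetical and introduced only for illustration. The slides leave the ZooKeeper host blank, so it is passed in as an argument rather than guessed.

```shell
#!/bin/sh
# Hypothetical helper: these variable and function names are illustrative,
# not part of Spark or Mesos.
SPARK_VERSION="0.8.0"
HADOOP_VERSION="2.0.0-mr1-cdh4.4.0"
# Matches the directory and tarball name used in step 5.
DIST_NAME="spark-${SPARK_VERSION}-${HADOOP_VERSION}"

# write_spark_env CONF_DIR ZK_HOST
# Appends the three exports from step 6 to CONF_DIR/spark-env.sh.
# ZK_HOST is a placeholder the caller must supply; the slides do not give it.
write_spark_env() {
    conf_dir="$1"
    zk_host="$2"
    if [ -z "$conf_dir" ] || [ -z "$zk_host" ]; then
        echo "usage: write_spark_env CONF_DIR ZK_HOST" >&2
        return 1
    fi
    cat >> "${conf_dir}/spark-env.sh" <<EOF
export MESOS_NATIVE_LIBRARY=/usr/local/lib/libmesos.so
export SPARK_EXECUTOR_URI=hdfs:///tmp/${DIST_NAME}.tgz
export MASTER=zk://${zk_host}:2181/mesos
EOF
}
```

One possible use, from the Spark directory after copying the template as in step 6, is `write_spark_env conf "$ZK_HOST"`, where `ZK_HOST` holds the ZooKeeper host of the cluster.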
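
As a local cross-check of the example job on slides 33 to 35, the same filter can be reproduced without a cluster. This is a plain shell sketch using `seq` and `awk`, not Spark; it only shows which values `distData.filter(_ < 10).collect()` should return for the data `1 to 10000`. The function name `expected_filter_result` is hypothetical.

```shell
#!/bin/sh
# Not Spark: a local stand-in for the REPL example.
# seq generates 1..10000 (the "data" range); awk keeps the values below 10,
# mirroring the predicate _ < 10.
expected_filter_result() {
    seq 1 10000 | awk '$1 < 10'
}

# Print the surviving values on one line.
expected_filter_result | tr '\n' ' '
echo
```

In the Spark shell, the corresponding result should be an array holding the integers 1 through 9.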