Getting Started Running Apache Spark on Apache Mesos
Posted 19-Aug-2015, Category: Technology
Transcript of Getting Started Running Apache Spark on Apache Mesos
- 1. Getting Started Running Apache Spark on Apache Mesos, 2014-01-24, Paco Nathan, liber118.com/pxn, @pacoid
- 2. Spark on Mesos, 2014-01-24: what is Apache Mesos?; launch a Mesos cluster in the cloud; configure and run Spark on Mesos; run jobs in Spark; further resources
- 3. Datacenter Computing: Google has been doing datacenter computing for years, to address the complexities of large-scale data workflows: leveraging the modern kernel: isolation in lieu of VMs; most (>80%) jobs are batch jobs, but the majority of resources (55-80%) are allocated to service jobs; mixed workloads, multi-tenancy; among the top 10 Linux kernel OSS contributors: cgroups; relatively high utilization rates; JVM? not so much. Take-aways: scheduling batch is not so difficult; scheduling services is hard and expensive
- 4. Google describes the business case: "Taming Latency Variability", Jeff Dean, plus.google.com/u/0/+ResearchatGoogle/posts/C1dPhQhcDRv
- 5. Return of the Borg: "Return of the Borg: How Twitter Rebuilt Google's Secret Weapon", Cade Metz, wired.com/wiredenterprise/2013/03/googleborg-twitter-mesos; "The Datacenter as a Computer: An Introduction to the Design of Warehouse-Scale Machines", Luiz André Barroso, Urs Hölzle, research.google.com/pubs/pub35290.html; 2011 GAFS Omega, John Wilkes, et al., youtu.be/0ZFMlO98Jkc
- 6. Google describes the technology: "Omega: flexible, scalable schedulers for large compute clusters", Malte Schwarzkopf, Andy Konwinski, Michael Abd-El-Malek, John Wilkes, eurosys2013.tudos.org/wp-content/uploads/2013/paper/Schwarzkopf.pdf
- 7. Mesos, open source datacenter computing: a common substrate for cluster computing, mesos.apache.org; heterogeneous assets in your datacenter or cloud made available as a homogeneous set of resources; top-level Apache project; scalability to 10,000s of nodes; obviates the need for virtual machines; isolation (pluggable) for CPU, RAM, I/O, FS, etc.; fault-tolerant leader election based on ZooKeeper; APIs in C++, Java, Python, Go; web UI for inspecting cluster state; available for Linux, OpenSolaris, Mac OS X
- 8. Mesos architecture (diagram; labels as extracted): Workloads: services, batch; Apps; Frameworks; Kernel; DFS; Cluster; systems shown: Scalding, MPI, Impala, Hadoop, Shark, Spark, MySQL, Kafka, JBoss, Django, Chronos, Storm, Rails, Marathon; distributed file system; distributed resources: CPU, RAM, I/O, FS, rack locality, etc.
- 9. Mesos architecture: apps: HA services, web apps, batch jobs, scripts, etc.; frameworks: Spark, Storm, MPI, Jenkins, etc.; task schedulers: Chronos, etc.; meta-frameworks: Aurora, Marathon; APIs: C++, JVM, Py, Go; Mesos, distrib kernel; HDFS, distrib file system; Linux: libcgroup, libprocess, libev, etc.
- 10. Mesos dynamics: scheduled apps; HA services; distrib frameworks; Marathon, distrib init.d; Mesos, distrib kernel; Chronos, distrib cron
- 11. Mesos dynamics (diagram; labels as extracted): distributed framework: Scheduler, Executors; Mesos masters; Mesos slaves; resource offers; available resources; distributed kernel
- 12. Production Deployments (public)
- 13. Case Study: Twitter (bare metal / on premise): "Mesos is the cornerstone of our elastic compute infrastructure; it's how we build all our new services and is critical for Twitter's continued success at scale. It's one of the primary keys to our data center efficiency." Chris Fry, SVP Engineering; blog.twitter.com/2013/mesos-graduates-from-apache-incubation; wired.com/gadgetlab/2013/11/qa-with-chris-fry/; key services run in production: analytics, typeahead, ads; allows services to scale and leverage a shared pool of servers across datacenters efficiently; reduces the time between prototyping and launching; Twitter engineers rely on Mesos to build all new services; instead of thinking about static machines, engineers think about resources like CPU, memory and disk
- 14. Spark on Mesos, 2014-01-24: what is Apache Mesos?; launch a Mesos cluster in the cloud; configure and run Spark on Mesos; run jobs in Spark; further resources
- 15. http://elastic.mesosphere.io: launch a Mesos cluster in the Amazon AWS cloud in three simple steps, given: AWS credentials, SSH public key, email address
- 16. Spark on Mesos, 2014-01-24: what is Apache Mesos?; launch a Mesos cluster in the cloud; configure and run Spark on Mesos; run jobs in Spark; further resources
- 17. http://mesosphere.io/learn/run-spark-on-mesos/: configure and run Spark on a Mesos cluster on AWS, in a seven-step tutorial
- 18. step 1: ssh to master
- 19. ssh -l ubuntu
- 20. step 2: install git, jdk-7
- 21. sudo aptitude -y install git
      sudo aptitude -y install openjdk-7-jdk
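Before moving on to step 3, it can help to confirm that the tools the later steps call are actually on the PATH. The following is a minimal sketch, not part of the tutorial: `have_tool` is a hypothetical helper name, and the list of tools is an assumption drawn from the commands used in steps 2 through 5.

```shell
#!/bin/sh
# Hypothetical helper (not from the tutorial): succeeds if a command is on PATH.
have_tool() {
    command -v "$1" >/dev/null 2>&1
}

# Assumed tool list: git and java from step 2, wget and tar from steps 3 and 5.
for tool in git java wget tar; do
    if have_tool "$tool"; then
        echo "found:   $tool"
    else
        echo "missing: $tool"
    fi
done
```

The loop only reports what it finds; it does not install anything, so it is safe to re-run at any point.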
- 22. step 3: download spark
- 23. wget http://spark-project.org/download/spark-0.8.0-incubating-bin-cdh4.tgz
      tar xzf spark-0.8.0-incubating-bin-cdh4.tgz
      cd spark-0.8.0-incubating-bin-cdh4/
- 24. step 4: sbt clean assembly
- 25. SPARK_HADOOP_VERSION=2.0.0-mr1-cdh4.4.0 sbt/sbt clean assembly
- 26. step 5: make distro, cp to HDFS
- 27. ./make-distribution.sh --hadoop 2.0.0-mr1-cdh4.4.0
      mv dist spark-0.8.0-2.0.0-mr1-cdh4.4.0
      tar czf spark-0.8.0-2.0.0-mr1-cdh4.4.0.tgz spark-0.8.0-2.0.0-mr1-cdh4.4.0
      hadoop fs -mkdir /tmp
      hadoop fs -put spark-0.8.0-2.0.0-mr1-cdh4.4.0.tgz /tmp
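Step 5 repeats the same long version string several times, which is easy to mistype. One way to avoid that, sketched here under assumptions, is to derive the names from two variables. The variable names and the `print_step5` function are hypothetical; the version values and commands are copied from the slides. The sketch only prints the commands (a dry run), since running them needs a built Spark tree and a reachable HDFS.

```shell
#!/bin/sh
# Version strings taken from the tutorial's commands.
SPARK_VERSION="0.8.0"
HADOOP_VERSION="2.0.0-mr1-cdh4.4.0"

# Names derived from the two versions (variable names are hypothetical).
DIST_NAME="spark-${SPARK_VERSION}-${HADOOP_VERSION}"
DIST_TARBALL="${DIST_NAME}.tgz"
HDFS_DIR="/tmp"

# Dry run: print the step 5 commands instead of executing them.
print_step5() {
    echo "./make-distribution.sh --hadoop ${HADOOP_VERSION}"
    echo "mv dist ${DIST_NAME}"
    echo "tar czf ${DIST_TARBALL} ${DIST_NAME}"
    echo "hadoop fs -mkdir ${HDFS_DIR}"
    echo "hadoop fs -put ${DIST_TARBALL} ${HDFS_DIR}"
}

print_step5
```

Comparing the printed lines against the slide is a quick way to check that the derived names match what the tutorial expects before running anything for real.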
- 28. step 6: config env
- 29. cd conf/
      cp spark-env.sh.template spark-env.sh
      vim spark-env.sh
      # lines to add in spark-env.sh:
      export MESOS_NATIVE_LIBRARY=/usr/local/lib/libmesos.so
      export SPARK_EXECUTOR_URI=hdfs:///tmp/spark-0.8.0-2.0.0-mr1-cdh4.4.0.tgz
      export MASTER=zk://:2181/mesos
      # check the result, then start the shell from the Spark directory:
      cat spark-env.sh
      cd ..
      ./spark-shell
- 30. et voilà!
- 31. Spark on Mesos, 2014-01-24: what is Apache Mesos?; launch a Mesos cluster in the cloud; configure and run Spark on Mesos; run jobs in Spark; further resources
- 32. http://spark.incubator.apache.org/examples.html: run an example job in Spark, to filter an RDD of integers, in two steps at the REPL
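The three settings in step 6 can also be written without opening an editor. The following is a rough sketch rather than the tutorial's method: `ENV_FILE` and `ZK_HOST` are hypothetical names, the output goes to a scratch file instead of `conf/spark-env.sh`, and `ZK_HOST` defaults to empty because the slide leaves the ZooKeeper host out of the `MASTER` URL.

```shell
#!/bin/sh
# Hypothetical placeholders (not in the tutorial): scratch output path and ZooKeeper host.
# ZK_HOST defaults to empty because the slide omits the host.
ENV_FILE="${ENV_FILE:-${TMPDIR:-/tmp}/spark-env.sh.sketch}"
ZK_HOST="${ZK_HOST:-}"

# Write the three settings from step 6; the values are copied from the slide.
cat > "$ENV_FILE" <<EOF
export MESOS_NATIVE_LIBRARY=/usr/local/lib/libmesos.so
export SPARK_EXECUTOR_URI=hdfs:///tmp/spark-0.8.0-2.0.0-mr1-cdh4.4.0.tgz
export MASTER=zk://${ZK_HOST}:2181/mesos
EOF

# Count the export lines as a basic sanity check.
EXPORT_COUNT=$(grep -c '^export ' "$ENV_FILE")
echo "wrote $EXPORT_COUNT settings to $ENV_FILE"
```

Setting `ZK_HOST` in the environment before running the sketch fills in the host for your own cluster; once the file looks right, its contents could be copied into `conf/spark-env.sh`.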
- 33. step 1: create an RDD
- 34. val data = 1 to 10000
      val distData = sc.parallelize(data)
      distData.filter(_ < 10).collect()
- 35. step 2: run the filter
- 36. Spark on Mesos, 2014-01-24: what is Apache Mesos?; launch a Mesos cluster in the cloud; configure and run Spark on Mesos; run jobs in Spark; further resources
- 37. Join us! O'Reilly Strata, Santa Clara, Feb 11-13, strataconf.com/strata2014; Mesos tutorial, Tue 2/11 1:30pm; BOF lunch, Wed 2/12 12:10pm; Mesos session, Thu 2/13 2:20pm; office hours, Thu 2/13 3:15pm
- 38. More insights! Monthly newsletter for events, conf summaries, workshops, etc.: liber118.com/pxn/; collected Mesos notes: goo.gl/jPtTP