What’s New in Spark 0.6 and Shark 0.2


November 5, 2012

UC Berkeley | www.spark-project.org

Agenda

- Intro & Spark 0.6 tour (Matei Zaharia)
- Standalone deploy mode (Denny Britz)
- Shark 0.2 (Reynold Xin)
- Q & A

What Are Spark & Shark?

- Spark: a fast cluster computing engine based on general operators & in-memory computing
- Shark: a Hive-compatible data warehouse system built on Spark

Both are open source projects from the UC Berkeley AMP Lab
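As a minimal sketch of Spark's programming model (the master URL and input file are placeholders; uses the 0.6-era spark package): a word count that keeps its input dataset in memory:

import spark.SparkContext
import SparkContext._  // implicit conversions that enable reduceByKey on pair RDDs

val sc = new SparkContext("local", "WordCount")
val lines = sc.textFile("input.txt").cache()  // keep the dataset in memory
val counts = lines.flatMap(_.split(" "))
                  .map(word => (word, 1))
                  .reduceByKey(_ + _)
counts.take(5).foreach(println)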

What Is the AMP Lab?

- 60-person lab focusing on big data
- Funded by NSF, DARPA, and 18 companies
- Goal: build an open-source, next-generation analytics stack

[Diagram: the Berkeley analytics stack: Spark, Shark, streaming, graph processing, and machine learning libraries on top of Mesos, alongside Hadoop and MPI]

Some Exciting News

- Recently, three full-time developers joined AMP to work on these projects
- We also encourage outside contributions!
» This release: Shark server (Yahoo!), improved accumulators (Quantifind); a sketch of accumulator usage follows below
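For a feel of the accumulator API, a minimal sketch (the file name and error condition are made up for illustration; uses the 0.6-era spark package):

import spark.SparkContext
import SparkContext._  // brings the implicit AccumulatorParam for Int into scope

val sc = new SparkContext("local", "AccumulatorSketch")
val badLines = sc.accumulator(0)  // a counter that tasks can only add to

sc.textFile("events.log").foreach { line =>
  if (line.contains("ERROR")) badLines += 1  // updated inside tasks on the workers
}
println("Bad lines: " + badLines.value)  // value is read back on the driver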

Spark 0.6 Release

- Biggest release so far in terms of features
- Biggest in terms of developers (18 total, 12 new)
- Focus areas: ease-of-use and performance

Ease-of-Use

Spark already had good traction despite two fairly researchy aspects:
» Scala language
» Requirement to run on Mesos

A big goal was to improve these:
» Java API (and upcoming API in Python)
» Simpler deployment (standalone mode, YARN)

Java API

Scala:

lines.filter(_.contains("error")).count()

Java:

JavaRDD<String> lines = sc.textFile(...);

lines.filter(new Function<String, Boolean>() {
  public Boolean call(String s) { return s.contains("error"); }
}).count();

Java API Features

- Supports all existing Spark features
» RDDs, accumulators, broadcast variables
- Retains type safety through specific classes for RDDs of special types
» E.g. JavaPairRDD<K, V> for key-value pairs

Using Key-Value Pairs

import scala.Tuple2;

JavaRDD<String> words = ...;

JavaPairRDD<String, Integer> ones = words.map(
  new PairFunction<String, String, Integer>() {
    public Tuple2<String, Integer> call(String s) {
      return new Tuple2<String, Integer>(s, 1);
    }
  });

// Can now call ones.reduceByKey(), groupByKey(), etc.

More info: spark-project.org/docs/0.6.0/

Coming Next: PySpark

lines = sc.textFile(sys.argv[1])

counts = lines.flatMap(lambda x: x.split(' ')) \
    .map(lambda x: (x, 1)) \
    .reduceByKey(lambda x, y: x + y)

Simpler Deployment

- Refactored Spark's scheduler to allow running on different cluster managers
- Denny will talk about the standalone mode… (a quick connection sketch follows below)
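As a taste of what standalone mode looks like from application code (hostname and path are placeholders; assumes the 0.6-era two-argument SparkContext constructor): the job simply points at the cluster's spark:// URL instead of a Mesos master:

import spark.SparkContext

// "master-host" stands in for the machine running the standalone master;
// 7077 is the default standalone master port
val sc = new SparkContext("spark://master-host:7077", "MyJob")
val data = sc.textFile("hdfs://namenode:9000/data.txt")  // placeholder path
println(data.count())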

Other Ease-of-Use Work

- Documentation
» Big effort to improve Spark's help and Scaladoc
- Debugging hints (pointers to user code in logs)
- Maven Central artifacts

spark-project.org/documentation.html

Performance

- New ConnectionManager and BlockManager
» Replace simple HTTP shuffle with faster, async NIO
- Faster control plane (task scheduling & launch)
- Per-RDD control of storage level

Some Graphs

[Chart: Large User App (2000 maps / 1000 reduces), running time in minutes, Spark 0.6 vs. Spark 0.5]

[Chart: Wikipedia Search Demo, running time in ms, Spark 0.6 vs. Spark 0.5]

Per-RDD Storage Level

import spark.storage.StorageLevel

val data = file.map(...)

// Keep in memory, recompute when out of space
// (default behavior with cache())
data.persist(StorageLevel.MEMORY_ONLY)

// Drop to disk instead of recomputing
data.persist(StorageLevel.MEMORY_AND_DISK)

// Serialize in-memory data
data.persist(StorageLevel.MEMORY_ONLY_SER)

Compatibility

- We've always strived to stay source-compatible!
- Only change in this release is in configuration: spark.cache.class replaced with per-RDD levels

Shark 0.2

- Hive compatibility improvements
- Thrift server mode
- Performance improvements
- Simpler deployment (comes with Spark 0.6)

Hive Compatibility

- Hive 0.9 support
- Full UDF/UDAF support
- ADD FILE support for running scripts
- User-supplied jars using ADD JAR

Thrift Server

- Contributed by Yahoo!, compatible with the Hive Thrift server (see the connection sketch below)
- Enables multiple clients to share cached tables
- BI tool integration (e.g. Tableau)
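Because the server speaks the Hive Thrift protocol, a standard Hive 0.9 JDBC client should be able to connect. A hedged sketch (host, port, and table name are placeholders; uses the pre-HiveServer2 driver class from Hive 0.x):

import java.sql.DriverManager

// Register the Hive 0.9 JDBC driver; the Shark server speaks the same protocol
Class.forName("org.apache.hadoop.hive.jdbc.HiveDriver")

val conn = DriverManager.getConnection("jdbc:hive://shark-host:10000/default", "", "")
val stmt = conn.createStatement()
val rs = stmt.executeQuery("SELECT COUNT(*) FROM my_cached_table")
while (rs.next()) println(rs.getLong(1))
conn.close()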

Performance

[Chart: Group By (1B items, 150M distinct), running time in secs, Shark 0.2 vs. Shark 0.1]

[Chart: Join (1B join 150M), running time in secs, Shark 0.2 vs. Shark 0.1]

Shark 0.3 Preview

- In-memory columnar compression (dictionary encoding, run-length encoding, etc.; see the sketch below)
- Map pruning
- JVM bytecode generation for expression evaluation
- Persist cached table metadata across sessions
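To illustrate the idea behind run-length encoding (a toy sketch in plain Scala, not Shark's actual implementation): a low-cardinality column stores each run of repeated values once, together with its length:

// Toy run-length encoder: each run of equal consecutive values is stored
// once with its count; Shark's columnar compression is more involved
def rleEncode[T](column: Seq[T]): Seq[(T, Int)] =
  column.foldLeft(List.empty[(T, Int)]) {
    case ((value, run) :: rest, v) if v == value => (value, run + 1) :: rest
    case (acc, v) => (v, 1) :: acc
  }.reverse

def rleDecode[T](runs: Seq[(T, Int)]): Seq[T] =
  runs.flatMap { case (value, run) => Seq.fill(run)(value) }

println(rleEncode(Seq("US", "US", "US", "UK", "UK", "US")))
// prints List((US,3), (UK,2), (US,1))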

Spark 0.7+

- Spark Streaming
- PySpark: Python API for Spark
- Memory monitoring dashboard