Developing applications in the "Big Data" era with Scala and Spark - Mario Cartia - Codemotion Milan...
Developing applications in the "Big Data" era with Scala and Spark
Mario Cartia
MILAN 25-26 NOVEMBER 2016
$ whoami
Mario Cartia
Chief System Engineer
Big Data: Risk
Big Data: Opportunity
Jonas Bonér
The Reactive Manifesto (2013)
The Reactive Manifesto
Responsive
o The system responds in a timely manner if at all possible
Resilient
o The system stays responsive in the face of failure
The Reactive Manifesto
Event-Driven
o Reactive Systems rely on asynchronous message-passing to establish a boundary between components that ensures loose coupling, isolation and location transparency
Elastic
o The system stays responsive under varying workload, reacting to changes in the input rate by increasing or decreasing the allocated resources
Bytecode Interoperability: Scala compiles to JVM bytecode and interoperates with Java
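Because Scala compiles to ordinary JVM bytecode, Java classes can be used from Scala without wrappers. A minimal sketch, assuming Scala 2.12 (on 2.13 `scala.jdk.CollectionConverters` is the preferred import); the object and method names are hypothetical.

```scala
import java.time.LocalDate
import java.util.{ArrayList => JArrayList}
import scala.collection.JavaConverters._

object Interop {
  // Build a plain java.util.ArrayList directly from Scala
  def javaList(): JArrayList[String] = {
    val list = new JArrayList[String]()
    list.add("scala")
    list.add("java")
    list
  }

  // View the Java list as a Scala collection to use Scala's higher-order functions
  def upperCased(): List[String] = javaList().asScala.map(_.toUpperCase).toList

  // Any JDK API is available as-is, e.g. java.time
  def conferenceDay(): String = LocalDate.of(2016, 11, 25).getDayOfWeek.toString
}
```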
History
The design of Scala started in 2001 at EPFL, Switzerland, by Martin Odersky
First internal use in 2003 to teach the “Functional and Logic Programming Course”
Public announcement of Scala 1.0 in 2004
Scala 2.0 was released in March 2006
In May 2011 Odersky and Bonér launched Typesafe Inc. to provide commercial support and education for Scala (renamed Lightbend in February 2016)
As of this talk, the latest version of Scala is 2.12.0, released on 3 November 2016
Features
Object Oriented
o You can construct elegant class hierarchies for maximum code reuse and extensibility
Functional
o You can implement object behavior using higher-order functions
Features
Object Oriented
o Unlike in Java, all values in Scala are objects (including primitive types and functions)
o Multiple inheritance through traits (mixin-based composition)
o Statically typed
o …
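To make "mixin-based composition" concrete, here is a small sketch of stackable traits; the trait and object names are hypothetical. Each override calls `super`, and Scala's linearization decides which implementation `super` refers to, so the order in which traits are mixed in changes the result.

```scala
// Base trait with a concrete method
trait Greeting {
  def greet(name: String): String = s"Hello, $name"
}

// Stackable modifications: each one delegates to super
trait Polite extends Greeting {
  override def greet(name: String): String = super.greet(name) + ", nice to meet you"
}

trait Loud extends Greeting {
  override def greet(name: String): String = super.greet(name).toUpperCase
}

object Mixins {
  // The rightmost trait's override runs first and calls the one to its left
  val politeThenLoud = new Greeting with Polite with Loud
  val loudThenPolite = new Greeting with Loud with Polite

  // Even Int values are objects: `1 + 2` is the method call `(1).+(2)`
  val three: Int = (1).+(2)
}
```

With these definitions, `politeThenLoud.greet("Ann")` uppercases the whole polite sentence, while `loudThenPolite.greet("Ann")` appends the polite suffix after uppercasing.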
Features
Functional
o Every function is a value
o Lambda expressions
o Immutable objects
o Higher-order functions
o Case classes with support for pattern matching to model algebraic types
o …
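The functional features listed above fit together in a few lines. A minimal sketch with a hypothetical `Shape` algebraic type: a sealed trait with case classes, a pattern match over its cases, a method passed to a higher-order function, and a function value composed with itself.

```scala
// An algebraic data type: a Shape is either a Circle or a Rectangle
sealed trait Shape
final case class Circle(radius: Double) extends Shape
final case class Rectangle(width: Double, height: Double) extends Shape

object Shapes {
  // Pattern matching deconstructs each case class
  def area(s: Shape): Double = s match {
    case Circle(r)       => math.Pi * r * r
    case Rectangle(w, h) => w * h
  }

  // `area` is passed as a value to the higher-order function `map`
  def totalArea(shapes: List[Shape]): Double = shapes.map(area).sum

  // Functions are values: they can be stored, passed and returned
  val times2: Int => Int = x => x * 2
  def twice(f: Int => Int): Int => Int = f andThen f
}
```

Case class instances are immutable: `Rectangle(2, 3).copy(width = 4)` returns a new value and leaves the original untouched.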
Features
Other
o Type inference
o Infix notation
o Parallel and concurrent programming
o Actor model (Akka)
o …
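A short sketch of type inference, infix notation and concurrency. For concurrency it uses the standard library's `scala.concurrent.Future` rather than Akka actors, which are covered separately; the object name is hypothetical.

```scala
import scala.concurrent.{Await, Future}
import scala.concurrent.ExecutionContext.Implicits.global
import scala.concurrent.duration._

object OtherFeatures {
  // Type inference: xs is inferred as List[Int] without an annotation
  val xs = List(1, 2, 3)

  // Infix notation: a single-argument method can be called without dot and parentheses
  val viaDot = xs.map(_ * 2)
  val viaInfix = xs map (_ * 2)

  // `to` is an ordinary method on Int (through RichInt), called infix
  val range = 1 to 5

  // Concurrency: two sums computed asynchronously on the global thread pool
  def parallelSum(): Int = {
    val halves = Seq(Future((1 to 50).sum), Future((51 to 100).sum))
    Await.result(Future.sequence(halves), 5.seconds).sum
  }
}
```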
Akka
Akka is a free and open-source toolkit and runtime that simplifies the construction of concurrent and distributed applications on the JVM
It supports multiple programming models for concurrency, but emphasizes actor-based concurrency, inspired by Erlang
Akka
Language bindings exist for both Java and Scala
Akka is written in Scala and, as of Scala 2.10, Akka's actors replaced the original scala.actors library and ship with the Scala distribution
Concurrency is message-based and asynchronous
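A minimal sketch of message-based, asynchronous concurrency with Akka's classic actor API. It assumes the akka-actor library (2.4 or later) is on the classpath; the actor, messages and object names are hypothetical. The actor's mutable state is safe because the actor processes one message at a time.

```scala
import akka.actor.{Actor, ActorSystem, Props}
import akka.pattern.ask
import akka.util.Timeout
import scala.concurrent.Await
import scala.concurrent.duration._

// Hypothetical counter actor: state is only touched inside receive
class Counter extends Actor {
  private var count = 0
  def receive: Receive = {
    case "inc" => count += 1
    case "get" => sender() ! count
  }
}

object CounterDemo {
  def run(): Int = {
    val system = ActorSystem("demo")
    try {
      val counter = system.actorOf(Props(new Counter), "counter")
      (1 to 3).foreach(_ => counter ! "inc") // fire-and-forget: returns immediately
      implicit val timeout: Timeout = Timeout(2.seconds)
      // ask (?) returns a Future with the reply; we block only for the demo
      Await.result((counter ? "get").mapTo[Int], 2.seconds)
    } finally system.terminate()
  }
}
```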
O’REILLY 2016 European Software Development Salary Survey
Top Adopters
Useful Tools
scala, scalac, scaladoc, scalap
o similar to their Java counterparts (java, javac, javadoc, javap)
Useful Tools
scala
o With no arguments specified, a Scala shell (REPL) starts and reads commands interactively
$ scala
Welcome to Scala version 2.12.0
Type in expressions to have them evaluated.
Type :help for more information.

scala> val i = 2
i: Int = 2

scala>
Useful Tools
sbt (originally "Simple Build Tool") is an open source build tool for Scala projects, similar to Maven or Ant, with the following characteristics:
o build descriptions written in Scala using a DSL
o dependency management using Ivy (supports Maven-format repositories)
o support for mixed Java/Scala projects
o …
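As an illustration of the Scala-based build DSL, here is a minimal build.sbt sketch for a Spark project. The project name and exact versions are assumptions chosen for late 2016: Spark 2.0.x was published for Scala 2.11, not 2.12, so scalaVersion is set accordingly.

```scala
// build.sbt: illustrative sketch; name and versions are assumptions
name := "spark-examples"

version := "0.1.0"

// Spark 2.0.x artifacts are built for Scala 2.11
scalaVersion := "2.11.8"

// %% appends the Scala binary version (_2.11) to each artifact name;
// "provided" because the cluster supplies Spark at runtime
libraryDependencies ++= Seq(
  "org.apache.spark" %% "spark-core"      % "2.0.2" % "provided",
  "org.apache.spark" %% "spark-sql"       % "2.0.2" % "provided",
  "org.apache.spark" %% "spark-mllib"     % "2.0.2" % "provided",
  "org.apache.spark" %% "spark-streaming" % "2.0.2" % "provided",
  "org.apache.spark" %% "spark-graphx"    % "2.0.2" % "provided"
)
```

To run Spark code locally with `sbt run`, the `% "provided"` qualifier would typically be dropped so the Spark jars end up on the runtime classpath.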
Hello, World!
Hello, World! (REPL)
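The code of these two slides did not survive in the transcript. A typical compiled version looks like the sketch below (the object name is an assumption); in the REPL the single line `println("Hello, World!")` is enough, since no enclosing object or `main` method is needed.

```scala
// Compile with `scalac HelloWorld.scala`, run with `scala HelloWorld`
object HelloWorld {
  def main(args: Array[String]): Unit = {
    println("Hello, World!")
  }
}
```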
Learning Resources
History
Spark was originally developed starting in 2009 at the University of California, Berkeley's AMPLab
In 2013 its creators founded Databricks, a company that provides services and support for Spark
First stable release (1.0) in May 2014
Features
Provides an interface for programming entire clusters with implicit data parallelism and fault tolerance
Provides programmers with an API centered on a data structure called the resilient distributed dataset (RDD)
According to the Spark project, runs programs up to 100x faster than Hadoop MapReduce in memory, or 10x faster on disk
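A minimal sketch of the RDD API, assuming spark-core 2.x on the classpath and a local master (no cluster needed); the object name is hypothetical. `parallelize` distributes a local collection as an RDD, `map` is a lazy transformation, and `reduce` is the action that actually runs the job.

```scala
import org.apache.spark.{SparkConf, SparkContext}

object RddBasics {
  def sumOfSquares(n: Int): Long = {
    // local[2]: run Spark inside this JVM with two worker threads
    val conf = new SparkConf().setAppName("rdd-basics").setMaster("local[2]")
    val sc = new SparkContext(conf)
    try {
      val numbers = sc.parallelize(1 to n)           // RDD[Int] split into partitions
      val squares = numbers.map(x => x.toLong * x)   // transformation: nothing runs yet
      squares.reduce(_ + _)                          // action: triggers the computation
    } finally sc.stop()
  }
}
```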
Modules
Spark SQL
o Lets you query structured data inside Spark programs, using either SQL or an easy-to-use DataFrame API
o Spark SQL reuses the Hive frontend and metastore, giving you full compatibility with existing Hive data, queries, and UDFs
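A sketch of the two query styles side by side, assuming spark-sql 2.x (which introduced `SparkSession`) and a local master. The `Talk` data and all names are hypothetical; the same question is answered once in SQL and once with the DataFrame API.

```scala
import org.apache.spark.sql.SparkSession

object SqlDemo {
  case class Talk(speaker: String, track: String, minutes: Int)

  def run(): Long = {
    val spark = SparkSession.builder().appName("sql-demo").master("local[2]").getOrCreate()
    import spark.implicits._ // enables toDF() and the $"column" syntax
    try {
      val talks = Seq(
        Talk("a", "big-data", 40),
        Talk("b", "web", 30),
        Talk("c", "big-data", 50)).toDF()

      // SQL: register the DataFrame as a temporary view and query it
      talks.createOrReplaceTempView("talks")
      val viaSql = spark.sql("SELECT COUNT(*) FROM talks WHERE track = 'big-data'").first().getLong(0)

      // DataFrame API: the same query expressed with method calls
      val viaApi = talks.filter($"track" === "big-data").count()

      require(viaSql == viaApi)
      viaApi
    } finally spark.stop()
  }
}
```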
Modules
Spark MLlib
o Contains many algorithms and utilities, including:
• Classification
• Regression
• Clustering
• Recommendation
• Distributed linear algebra
• Statistics
• …
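A clustering sketch using the RDD-based `org.apache.spark.mllib` K-means API, assuming spark-mllib 2.x (where the DataFrame-based `spark.ml` package is the primary API and `mllib` is in maintenance mode). The points are made up to form two obvious groups; names are hypothetical.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.mllib.clustering.KMeans
import org.apache.spark.mllib.linalg.Vectors

object KMeansDemo {
  // Returns the cluster index assigned to each input point, in input order
  def run(): Seq[Int] = {
    val sc = new SparkContext(new SparkConf().setAppName("kmeans-demo").setMaster("local[2]"))
    try {
      val points = Seq(
        Vectors.dense(0.0, 0.0), Vectors.dense(0.2, 0.1), // group near the origin
        Vectors.dense(9.0, 9.0), Vectors.dense(9.2, 8.9)) // group near (9, 9)
      val data = sc.parallelize(points).cache() // K-means iterates, so cache the input
      val model = KMeans.train(data, 2, 20)    // k = 2 clusters, at most 20 iterations
      points.map(p => model.predict(p))
    } finally sc.stop()
  }
}
```

Cluster indices are arbitrary, so only the grouping (which points share a label) is meaningful.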
Modules
Spark Streaming
o Brings Apache Spark's language-integrated API to stream processing, letting you write streaming jobs the same way you write batch jobs
o Recovers both lost work and operator state (e.g. sliding windows) out of the box, without any extra code on your part
Modules
Spark Streaming
o Lets you reuse the same code for batch processing, join streams against historical data, or run ad-hoc queries on stream state
o Can read data from HDFS, Flume, Kafka, Twitter and ZeroMQ or custom data sources
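A sketch of reusing the same code for batch and streaming, assuming spark-streaming 2.x and a local master. One function on plain RDDs is applied to a static RDD and, through `transform`, to every micro-batch of a DStream. `queueStream` is a test source chosen here for illustration; a real job would read from Kafka, HDFS and so on. Names are hypothetical.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.rdd.RDD
import org.apache.spark.streaming.{Seconds, StreamingContext}
import scala.collection.mutable

object StreamingDemo {
  // Shared business logic: plain RDD of lines in, word counts out
  def wordCounts(lines: RDD[String]): RDD[(String, Int)] =
    lines.flatMap(_.split(" ")).map(word => (word, 1)).reduceByKey(_ + _)

  // Returns (batch result, streaming result) for the same input
  def run(): (Map[String, Int], Map[String, Int]) = {
    val sc = new SparkContext(new SparkConf().setAppName("streaming-demo").setMaster("local[2]"))
    val ssc = new StreamingContext(sc, Seconds(1)) // 1-second micro-batches
    val input = Seq("to be or not to be")

    // Batch job: call the function directly on a static RDD
    val batch = wordCounts(sc.parallelize(input)).collect().toMap

    // Streaming job: apply the same function to every micro-batch
    val queue = mutable.Queue[RDD[String]](sc.parallelize(input)) // one RDD per batch
    val seen = mutable.Map.empty[String, Int]
    ssc.queueStream(queue)
      .transform((rdd: RDD[String]) => wordCounts(rdd))
      .foreachRDD { (rdd: RDD[(String, Int)]) =>
        val counts = rdd.collect() // this closure runs on the driver
        seen.synchronized {
          counts.foreach { case (w, c) => seen(w) = seen.getOrElse(w, 0) + c }
        }
      }

    ssc.start()
    ssc.awaitTerminationOrTimeout(5000) // let a few batches run
    ssc.stop(stopSparkContext = true)
    (batch, seen.synchronized(seen.toMap))
  }
}
```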
Modules
Spark GraphX
o Collection of APIs for graphs and graph-parallel computation
o Provides a variety of graph algorithms, such as:
• PageRank
• Connected components
• Label propagation
• SVD++
• Strongly connected components
• Triangle count
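A sketch of two of the listed algorithms, assuming spark-graphx 2.x and a local master. The five-vertex graph is made up: a chain 1 → 2 → 3 and a separate edge 4 → 5. Connected components labels each vertex with the smallest vertex id in its component; PageRank runs until ranks change by less than the given tolerance.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.graphx.{Edge, Graph, VertexId}
import org.apache.spark.rdd.RDD

object GraphDemo {
  def run(): (Map[VertexId, VertexId], Map[VertexId, Double]) = {
    val sc = new SparkContext(new SparkConf().setAppName("graphx-demo").setMaster("local[2]"))
    try {
      val vertices: RDD[(VertexId, String)] = sc.parallelize(Seq(
        (1L, "alice"), (2L, "bob"), (3L, "carol"), (4L, "dave"), (5L, "eve")))
      val edges: RDD[Edge[Int]] = sc.parallelize(Seq(
        Edge(1L, 2L, 1), Edge(2L, 3L, 1), Edge(4L, 5L, 1)))
      val graph = Graph(vertices, edges)

      // Each vertex gets the lowest vertex id reachable in its component
      val components = graph.connectedComponents().vertices.collect().toMap
      // Dynamic PageRank with convergence tolerance 0.0001
      val ranks = graph.pageRank(0.0001).vertices.collect().toMap
      (components, ranks)
    } finally sc.stop()
  }
}
```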
How does it work?
Spark features an advanced Directed Acyclic Graph (DAG) execution engine that supports cyclic data flow (e.g. iterative algorithms)
Each Spark job creates a DAG of task stages to be performed on the cluster
How does it work?
val textFile = sc.textFile("hdfs://...")
val counts = textFile.flatMap(line => line.split(" "))
                     .map(word => (word, 1))
                     .reduceByKey(_ + _)
counts.saveAsTextFile("hdfs://...")
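Driver Program: the program that creates the SparkContext (sc) and defines the job
SparkContext: sc, the entry point to the cluster
RDD: textFile and counts are resilient distributed datasets
Transformations: flatMap, map and reduceByKey lazily build new RDDs
Action: saveAsTextFile triggers the execution of the job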
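To see the DAG Spark builds for a word count like the one above, an RDD's lineage can be printed with `toDebugString`. A sketch assuming spark-core 2.x and a local master; `sc.parallelize` stands in for `sc.textFile` so it runs without HDFS, and `collect()` replaces `saveAsTextFile` as the action. Names are hypothetical.

```scala
import org.apache.spark.{SparkConf, SparkContext}

object WordCountDag {
  def run(): (Map[String, Int], String) = {
    val sc = new SparkContext(new SparkConf().setAppName("wordcount-dag").setMaster("local[2]"))
    try {
      val textFile = sc.parallelize(Seq("to be or not to be", "to do")) // stand-in for sc.textFile
      val counts = textFile.flatMap(line => line.split(" "))
        .map(word => (word, 1))
        .reduceByKey(_ + _) // needs a shuffle, so the DAG is split into two stages here
      val lineage = counts.toDebugString // describes the DAG; nothing has run yet
      (counts.collect().toMap, lineage)  // collect() is the action that submits the job
    } finally sc.stop()
  }
}
```

Printing `lineage` shows a `ShuffledRDD` on top of the narrow `flatMap`/`map` steps; the same stage split is visible in the Spark UI.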
How does it work?
Spark UI
Running modes
Standalone
o Spark ships with its own simple cluster manager (standalone deploy mode); a single-JVM local mode is also available for testing
YARN
o Submits jobs to a Hadoop YARN cluster
Mesos
o Submits jobs to an Apache Mesos cluster (a distributed systems kernel)
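The running mode is selected through the master URL. A sketch assuming Spark 2.x URL formats (`local[*]`, `spark://host:port`, `yarn`, `mesos://host:port`, with 7077 and 5050 as the usual default ports); the `Mode` type and host names are hypothetical. In practice the master is usually passed with `spark-submit --master` rather than hard-coded.

```scala
import org.apache.spark.SparkConf

object RunningModes {
  sealed trait Mode
  case object Local extends Mode
  final case class Standalone(host: String, port: Int = 7077) extends Mode
  case object Yarn extends Mode // cluster address comes from the Hadoop configuration
  final case class Mesos(host: String, port: Int = 5050) extends Mode

  def masterUrl(mode: Mode): String = mode match {
    case Local            => "local[*]" // one JVM, one thread per core
    case Standalone(h, p) => s"spark://$h:$p"
    case Yarn             => "yarn"
    case Mesos(h, p)      => s"mesos://$h:$p"
  }

  def conf(mode: Mode): SparkConf =
    new SparkConf().setAppName("wordcount").setMaster(masterUrl(mode))
}
```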
Learning Resources
Codemotion Training courses
Hands-on learning paths, also available online
> WEB APP SECURITY
> WEB DEVELOPMENT
> IOT
> UX & UI
> BIG DATA
> MOBILE DEVELOPMENT
> LEGAL SOFTWARE DISCIPLINE
> FRONTEND DEVELOPMENT
Bootcamp “Developing Big Data Applications with Scala and Spark”
Where: Milan
When: 2 December 2016
Info: Codemotion desk
Next event!
Question Time!
Thanks!
Follow me!
https://twitter.com/mariocartia
https://it.linkedin.com/in/mariocartia
All pictures belong to their respective authors