New trends in big data: in-memory analytics, streaming computing and distributed machine learning
Trends in Big Data.
Natalino Busa
Data Platform Architect at ING
Play with your phones
Re-think Big Data
Hadoop has turned 10
Memory is eating Big Data
Amazon is delivering instances with 2 TB RAM
Facebook, Microsoft: 90% of workloads are below 100 GB
Machine Learning algorithms fit on a single node
250 MB hard disk drive from 1979
I like Big Data and I cannot lie.
Disk -> RAM
Hadoop -> Spark
Map-Reduce -> Data Flow Graphs
HDFS -> Storage, MPPs, NoSQL
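Note: the shift from Map-Reduce jobs to data flow graphs can be sketched in a few lines. The example below is plain Python standing in for an engine such as Spark, not any engine's actual API; the function name `word_count` and the sample lines are made up for illustration. It chains lazy stages in memory, so no intermediate result is written out between steps.

```python
from collections import Counter
from itertools import chain


def word_count(lines):
    """Chain lazy stages; nothing runs until the final aggregation pulls data."""
    # Flat-map stage: split every line into lowercase words (lazy).
    words = chain.from_iterable(line.lower().split() for line in lines)
    # Filter stage: keep words longer than two characters (lazy).
    long_words = (w for w in words if len(w) > 2)
    # Reduce stage: the only step that actually consumes the pipeline.
    return Counter(long_words)


# Hypothetical input, small enough to fit on a single node.
counts = word_count(["Big Data is big", "data in memory"])
print(counts)
```

In a classic Map-Reduce chain each stage would be a separate job with its output persisted before the next one starts; here the stages form one graph that is evaluated in a single pass.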
Streaming and Real-Time Analytics
Batch -> Event-Driven
ETL -> Streaming
Hive -> Flink, Akka, Spark
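Note: as a rough illustration of the move from batch to event-driven processing, the sketch below counts events per tumbling time window and emits each result as soon as its window closes, instead of waiting for a full batch. It is plain Python, not Flink, Akka, or Spark code. The function `tumbling_counts`, the 60-second window, and the sample events are assumptions for illustration, and events are assumed to arrive ordered by timestamp.

```python
from collections import defaultdict


def tumbling_counts(events, window_seconds=60):
    """Yield (window_start, counts) whenever an event opens a new window.

    Assumes events are (timestamp, key) pairs ordered by timestamp.
    """
    current_window = None
    counts = defaultdict(int)
    for timestamp, key in events:
        window = timestamp - timestamp % window_seconds
        if current_window is not None and window != current_window:
            # The previous window is closed: emit its result right away.
            yield current_window, dict(counts)
            counts = defaultdict(int)
        current_window = window
        counts[key] += 1
    if current_window is not None:
        # Flush the last, still-open window when the stream ends.
        yield current_window, dict(counts)


# Hypothetical event stream: (seconds since start, event type).
events = [(0, "login"), (30, "click"), (59, "click"), (61, "login"), (130, "click")]
results = list(tumbling_counts(events))
print(results)
```

Because the function is a generator, a consumer can react to each window as it is produced, which is the essential difference from a nightly ETL job over the same data.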
Stream Centric Architectures
[Diagram labels: Spark (RDDs) with Streaming, SQL, MLlib, GraphX; Analytics, Statistics, Data Science, Model Training; Data Sources: HDFS, NoSQL, SQL; Map-Reduce; HDFS, Kafka]
Spark: Unified Distributed Computing
SQL + Machine Learning + Graph Analytics
Hive
Clusters -> Resources
Orchestrated -> Isolated
Static -> Disposable
YARN, Mesos, CoreOS, Kubernetes
Application-oriented Infrastructure
Elastic: Docker, Mesos, YARN, Kubernetes
Data Processing: Flink, Spark, Akka
Indexing: Elasticsearch, Deep Learning
APIs and microservices: Akka, Python, Java
Data storage: SQL, NoSQL, HDFS, Streaming
[Diagram labels: Mesos, YARN; Spark with Streaming, SQL, MLlib, GraphX; DBs; ES (Elasticsearch); C* (Cassandra)]
Application Oriented Architectures
That’s all folks!
Natalino Busa
Data Platform Architect at ING
@natbusa