Opensource Frameworks and BigData Processing
Linux and Ubuntu 14.10 Release Conf 1
Big-Data Processing Utilizing an Open-Source Technology Stack
By
Amir Sedighi
http://www.linkedin.com/in/amirsedighi
@amirsedighi
References
● http://www.slideshare.net/BernardMarr/140228-big-data-slide-share?qid=017848e2-9e2a-4dc3-963c-52b6a90fba2a&v=default&b=&from_search=1
● http://www.forbes.com/fdc/welcome_mjx.shtml
● ZYMR Spark Your Real-Time Big Data Analytics
● http://dataconomy.com
● https://datakulfi.wordpress.com/2013/03/27/big-data-open-source-technology-landscape/
● http://www.slideshare.net/andrefaria/big-data-abc?qid=1ac97e4a-4acc-460a-b3f8-9122f7210440&v=qf1&b=&from_search=12
● https://wiki.apache.org/hadoop/PoweredBy
Data Explosion
● Big Data refers to the fact that everything we do increasingly leaves a digital trace that we (or others) can gather, use, and analyze.
– Data providers: companies and people
Volume, Velocity, Variety
● “There was 5 exabytes of information created between the dawn of civilization through 2003, but that much information is now created every 2 days, and the pace is increasing.” (Eric Schmidt)
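To put the quoted figure in perspective, the short calculation below converts "5 exabytes every 2 days" into a sustained data rate. It takes the quote at face value and assumes decimal (SI) exabytes; the helper name `bytes_per_second` is only for illustration.

```python
# Rough arithmetic sketch: what sustained rate does "5 exabytes every 2 days" imply?
# Assumption: 1 exabyte = 10**18 bytes (decimal/SI definition).
EXABYTE = 10**18
TERABYTE = 10**12
SECONDS_PER_DAY = 24 * 60 * 60


def bytes_per_second(total_bytes, days):
    """Average rate in bytes per second for a volume produced over a number of days."""
    return total_bytes / (days * SECONDS_PER_DAY)


rate = bytes_per_second(5 * EXABYTE, 2)
print(f"{rate / TERABYTE:.1f} TB/s")
```

Under these assumptions the rate works out to roughly 29 terabytes per second, which is far beyond what a single machine can ingest.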
Big-Data Processing
How to provide a Big-Data processing platform using commodity machines?
Vertical or Horizontal?
Scale Up vs Scale Out
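One way to picture scaling out is spreading keys across several commodity machines instead of moving everything to one larger machine. The sketch below is a minimal illustration, not a method taken from the slides: the node names are hypothetical, and the simple hash-modulo placement is an assumption chosen for brevity.

```python
import hashlib


def node_for_key(key, nodes):
    """Choose a node by hashing the key.

    hashlib is used instead of the built-in hash() so the placement
    is the same across separate Python processes.
    """
    digest = hashlib.sha256(key.encode("utf-8")).hexdigest()
    return nodes[int(digest, 16) % len(nodes)]


def partition(keys, nodes):
    """Return a mapping of node -> list of keys assigned to that node."""
    placement = {node: [] for node in nodes}
    for key in keys:
        placement[node_for_key(key, nodes)].append(key)
    return placement


if __name__ == "__main__":
    # Hypothetical node names and keys, for illustration only.
    nodes = ["node-a", "node-b", "node-c"]
    keys = [f"user-{i}" for i in range(10)]
    for node, assigned in partition(keys, nodes).items():
        print(node, assigned)
```

Adding a machine here means extending the `nodes` list. Note that plain modulo placement moves most keys when the node count changes, which is why distributed stores often use consistent hashing instead.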
Big-Data Processing Open-Source Technology Stack
Map-Reduce
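The Map-Reduce model can be sketched with the classic word-count example. This is a single-process illustration in plain Python, not Hadoop's API: the function names `map_phase`, `shuffle_phase`, and `reduce_phase` are hypothetical, and a real framework would run the map and reduce steps in parallel across machines.

```python
from collections import defaultdict


def map_phase(document):
    """Map: emit a (word, 1) pair for every word in one input record."""
    return [(word.lower(), 1) for word in document.split()]


def shuffle_phase(pairs):
    """Shuffle: group all emitted values by their key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce: combine all values for one key into a single result."""
    return key, sum(values)


def word_count(documents):
    """Run map, shuffle, and reduce in sequence over a list of documents."""
    pairs = [pair for doc in documents for pair in map_phase(doc)]
    grouped = shuffle_phase(pairs)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    docs = ["big data big ideas", "open source big data"]
    print(word_count(docs))
```

Because each call to `map_phase` looks at one record and each call to `reduce_phase` looks at one key, both steps can be distributed across commodity machines without sharing state.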
Hadoop Framework
Apache Hadoop Main Projects
Data Stores
● Data Stores
– Key-Value
– Graph
– Columnar
– Document Store
– In-Memory
Data Transfer
● Apache Flume
● Apache Sqoop
Search
● Elasticsearch
● Apache Solr
Messaging and Queuing
● Apache Kafka
● ZeroMQ
Log Management
● ELK (Elasticsearch, Logstash, Kibana)
● Logstash
● Fluentd
Stream Processing
● Apache Storm
● Apache Samza
● Apache Spark
Machine Learning
● Apache Mahout
● MLlib
● GraphX
Questions?