Big data technology by Data Sciences Thailand at THE FIRST NIDA BUSINESS ANALYTICS AND...
Big Data Technologies
The First NIDA Business Analytics and Data Sciences Contest/Conference
1-2 September 2016, Nawaminthrathirat Building, National Institute of Development Administration (NIDA)
https://businessanalyticsnida.wordpress.com
https://www.facebook.com/BusinessAnalyticsNIDA/
What is Big Data?
What does an architecture for big data look like?
How can big data be managed?
How is big data processed?
How does unstructured data differ from data in a Relational Database Management System?
What are the latest big data technologies?
The Data Science Thailand team
Room Nawaminthrathirat 3001, 1 September 2016, 15:15-16:30
The First NIDA Business Analytics & Data Sciences Conference : 1-2 September 2016
People think this [big data] is a tech revolution. But it is really a business revolution enabled by technology.
– Steven Messer
Let's define “Big Data”
3Vs : Volume, Velocity, Variety
5Vs : the 3Vs plus Veracity and Value
Let's talk about Scalability
Scale-UP (vertical)
Scale-OUT (horizontal)
What are the challenges?
Challenges : Handling failure
Even with the best hardware, frequent failure is the norm.
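As a rough illustration of why failure becomes the norm at scale, suppose (purely as an assumption) that each node fails independently on a given day with probability p. The chance that at least one of N nodes fails that day is 1 - (1 - p)^N. The function name and the 0.1% figure below are hypothetical, not measurements.

```python
def prob_any_failure(p_node: float, num_nodes: int) -> float:
    """P(at least one of num_nodes fails), assuming independent failures
    with per-node probability p_node (a simplified, illustrative model)."""
    return 1.0 - (1.0 - p_node) ** num_nodes


if __name__ == "__main__":
    # Hypothetical per-node daily failure probability of 0.1%.
    for n in (1, 10, 100, 1_000, 10_000):
        print(f"{n:>6} nodes: {prob_any_failure(0.001, n):.4f}")
```

Under this assumption a single node almost never fails on a given day, yet with 1,000 nodes the probability that something fails is about 63%, so recovering from failure has to be routine rather than exceptional.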
Challenges : Parallelization
Divide & Conquer strategies
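Divide & conquer means splitting the input into independent pieces, processing each piece separately, and combining the partial results. A minimal sketch, assuming the work is a sum of squares and using a thread pool to stand in for the machines of a cluster; the function names are hypothetical. In CPython, threads will not speed up pure-Python arithmetic because of the global interpreter lock, so the point here is the structure, not the speed-up.

```python
from concurrent.futures import ThreadPoolExecutor


def chunk(data, num_chunks):
    """Divide: split data into num_chunks contiguous, roughly equal slices."""
    size, rem = divmod(len(data), num_chunks)
    chunks, start = [], 0
    for i in range(num_chunks):
        end = start + size + (1 if i < rem else 0)
        chunks.append(data[start:end])
        start = end
    return chunks


def parallel_sum_of_squares(data, workers=4):
    """Conquer: process each chunk independently, then combine partial results."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = list(pool.map(lambda part: sum(x * x for x in part),
                                 chunk(data, workers)))
    return sum(partials)  # combine step


if __name__ == "__main__":
    print(parallel_sum_of_squares(list(range(1_000))))
```

Because no chunk depends on another, the same split-process-combine structure works whether the workers are threads, processes, or separate machines.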
Challenges : Barrier to entry
“The IBM Blue Gene/P supercomputer installation at the Argonne Leadership Computing Facility located in the Argonne National Laboratory.” (Angela Yang) https://en.wikipedia.org/wiki/File:IBM_Blue_Gene_P_supercomputer.jpg
“The Borg, a beowulf cluster used by the McGill University pulsar group to search for binary pulsars (among other things).” https://en.wikipedia.org/wiki/File:Beowulf-cluster-the-borg.jpg
Let's talk about the revolution
Hadoop : A brief history
Doug Cutting & Mike Cafarella
Hadoop : Abstraction Layers
Storage Layer : HDFS (modeled on Google's GFS)
Compute Layer : YARN + MapReduce
Hadoop
HDFS (Hadoop Distributed File System)
Hadoop : HDFS
[Diagram: FILE_001 is split into blocks A, B and C; each block is stored as three replicas (A1-A3, B1-B3, C1-C3) distributed across NODE1, NODE2 and NODE3]
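The diagram can be mimicked in a few lines: split a file into fixed-size blocks, then store each block on several distinct nodes so that losing a node loses no data. This is a simplified sketch with hypothetical names. The replication factor of 3 matches HDFS's default, but the 4-byte block size is a toy value (real HDFS blocks default to 64 MB in Hadoop 1.x and 128 MB in 2.x), and the round-robin placement is an assumption, since HDFS itself uses a rack-aware placement policy.

```python
BLOCK_SIZE = 4    # toy block size in bytes (real HDFS blocks are far larger)
REPLICATION = 3   # HDFS's default replication factor


def split_into_blocks(data: bytes, block_size: int = BLOCK_SIZE) -> list:
    """Split file contents into fixed-size blocks; the last block may be shorter."""
    return [data[i:i + block_size] for i in range(0, len(data), block_size)]


def place_replicas(num_blocks: int, nodes: list, replication: int = REPLICATION) -> dict:
    """Map each block index to `replication` distinct nodes.

    Simple round-robin placement for illustration only; HDFS itself uses a
    rack-aware policy.
    """
    if replication > len(nodes):
        raise ValueError("need at least as many nodes as replicas")
    return {
        block: [nodes[(block + k) % len(nodes)] for k in range(replication)]
        for block in range(num_blocks)
    }


def readable_blocks(placement: dict, failed_nodes: set) -> set:
    """Blocks that still have at least one replica on a healthy node."""
    return {
        block for block, holders in placement.items()
        if any(node not in failed_nodes for node in holders)
    }


if __name__ == "__main__":
    blocks = split_into_blocks(b"AAAABBBBCCCC")            # blocks A, B, C
    nodes = ["NODE1", "NODE2", "NODE3"]
    placement = place_replicas(len(blocks), nodes)
    print(placement)
    print(readable_blocks(placement, {"NODE1", "NODE2"}))  # all three blocks survive
```

With three nodes and three replicas, as in the diagram, every node holds a copy of every block, so the file stays readable even if two nodes fail.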
Hadoop : HDFS https://hadoop.apache.org/docs/r1.2.1/hdfs_design.html
Hadoop
YARN (Yet Another Resource Negotiator)
Hadoop : YARN
[Diagram: nodes A, B, C and D; DATA1 is stored on A and C, DATA2 on B and D]
[Diagram: nodes A and B holding DATA1 and DATA2] Network is expensive, so computation should run on the nodes that already hold the data instead of moving the data across the network.
[Diagram: the same four-node cluster, with an App Master coordinating the application's work]
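The point of "Network is expensive" is that a scheduler should try to run each task on a node that already stores the task's input block. The sketch below shows only that preference, using the slide's layout (DATA1 on nodes A and C, DATA2 on B and D). It is a hypothetical greedy assignment, not YARN's actual scheduler: in real YARN the per-application App Master asks the ResourceManager for containers and can name preferred nodes, and the configured scheduler decides what to grant.

```python
def assign_tasks(block_locations, free_slots):
    """Assign one task per data block, preferring a node that stores the block.

    block_locations: block name -> list of nodes holding a replica
    free_slots: node -> number of free task slots (containers)
    Returns block name -> (node, "node-local" or "remote").
    """
    slots = dict(free_slots)  # copy so the caller's dict is not mutated
    assignment = {}
    for block, holders in block_locations.items():
        local = [n for n in holders if slots.get(n, 0) > 0]
        if local:
            node, kind = local[0], "node-local"
        else:
            # No replica holder is free: run elsewhere and pull the block over the network.
            candidates = [n for n, free in slots.items() if free > 0]
            if not candidates:
                raise RuntimeError(f"no free slot for {block}")
            node, kind = candidates[0], "remote"
        slots[node] -= 1
        assignment[block] = (node, kind)
    return assignment


if __name__ == "__main__":
    # Mirrors the slide: DATA1 and DATA2 each stored on two of the nodes A-D.
    locations = {"DATA1": ["A", "C"], "DATA2": ["B", "D"]}
    print(assign_tasks(locations, {"A": 1, "B": 1, "C": 1, "D": 1}))
    print(assign_tasks(locations, {"A": 0, "B": 0, "C": 0, "D": 2}))
```

In the first call both tasks run where their data lives; in the second, only node D has capacity, so DATA1 must be read remotely while DATA2 can still be processed locally.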
Hadoop : MapReduce Paradigm
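Consists of two primary functions: Map & Reduce
Both functions transform data: Map turns each input record into key-value pairs, and Reduce combines all values that share the same key.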
Hadoop : MapReduce Paradigm
https://twitter.com/steveluscher/status/741089564329054208
Parallelism achieved!
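A common way to illustrate the paradigm is a word count. The single-process sketch below simulates the three stages: map emits (word, 1) pairs for each line, a shuffle groups the pairs by key (in Hadoop the framework does this between the two phases), and reduce sums each group. The function names are hypothetical and this is not Hadoop's Java API; it only shows why the model parallelizes, since every map call and every reduce call is independent of the others.

```python
from collections import defaultdict
from itertools import chain


def map_phase(line):
    """Map: turn one input record into (key, value) pairs."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle(pairs):
    """Shuffle: group all values by key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce: combine all values for one key into a single result."""
    return key, sum(values)


def word_count(lines):
    # Each map call is independent, so lines could be processed on different nodes.
    mapped = chain.from_iterable(map_phase(line) for line in lines)
    grouped = shuffle(mapped)
    # Each key can likewise be reduced independently.
    return dict(reduce_phase(k, v) for k, v in grouped.items())


if __name__ == "__main__":
    print(word_count(["big data is big", "data is data"]))
    # {'big': 2, 'data': 3, 'is': 2}
```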
Hadoop : The Big Data enabler
● Fault-tolerant storage
● Fault-tolerant computation
● Parallel processing paradigm for the rest of us?
Hadoop : Not everyone codes
Hive : A high-level tool for the rest of us
● Developed by Facebook
● HiveQL (looks just like SQL)
● Schema-on-Read
● Data warehousing at scale
HiveQL (SQL) → MapReduce job(s)
DDL
CREATE TABLE inventory (sku int, product_name string, manufacturer string, num_instock int)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY ','
STORED AS TEXTFILE
TBLPROPERTIES ("skip.header.line.count"="1");
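The DDL only describes how to interpret the text file; loading the file does not convert or validate its contents. That is the Schema-on-Read idea: the schema is applied each time the data is read. A minimal sketch of the idea with a hypothetical function and sample data: it splits on commas as FIELDS TERMINATED BY ',' specifies, skips one header line as skip.header.line.count does, and turns values that cannot be converted to the column type into None, analogous to Hive returning NULL for such fields.

```python
# Columns and types from the inventory DDL; mapping Hive's int/string to
# Python's int/str is an assumption of this sketch.
INVENTORY_SCHEMA = [
    ("sku", int),
    ("product_name", str),
    ("manufacturer", str),
    ("num_instock", int),
]


def read_with_schema(raw_text: str, schema: list, skip_header_lines: int = 1) -> list:
    """Apply `schema` to comma-delimited text at read time (schema-on-read).

    A field that cannot be converted, or is missing, becomes None; extra
    trailing fields are ignored.
    """
    rows = []
    for line in raw_text.splitlines()[skip_header_lines:]:
        fields = line.split(",")
        row = {}
        for i, (name, convert) in enumerate(schema):
            try:
                row[name] = convert(fields[i])
            except (IndexError, ValueError):
                row[name] = None
        rows.append(row)
    return rows


if __name__ == "__main__":
    # Hypothetical file contents; the raw text itself is stored unchanged.
    raw = (
        "sku,product_name,manufacturer,num_instock\n"
        "1001,Widget,Acme,10\n"
        "1002,Gadget,Acme,n/a\n"
    )
    for row in read_with_schema(raw, INVENTORY_SCHEMA):
        print(row)
```

Because the raw file is never rewritten, the same data could later be read with a different schema without reloading it.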
DML
LOAD DATA INPATH '/home/user/products.csv' OVERWRITE INTO TABLE inventory;
SELECT SUM(num_instock) AS instock_count FROM inventory;
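Conceptually, a global aggregate such as SELECT SUM(num_instock) can be carried out as a single MapReduce job: each mapper computes a partial sum over its input split, and one reducer adds the partial sums. This is a simplified single-process sketch with hypothetical names and data, not Hive's actual execution plan. NULLs (None here) are skipped as in SQL's SUM, although unlike SQL the sketch returns 0 rather than NULL when every value is NULL.

```python
def map_partial_sum(split):
    """Map side: sum num_instock over one input split, skipping NULLs (None)."""
    return sum(qty for qty in split if qty is not None)


def reduce_total(partial_sums):
    """Reduce side: a single reducer adds up the mappers' partial sums."""
    return sum(partial_sums)


def instock_count(splits):
    """Simulate the job: run the map step per split, then the reduce step."""
    return reduce_total(map_partial_sum(split) for split in splits)


if __name__ == "__main__":
    # Hypothetical num_instock values, already divided into two input splits.
    splits = [[10, 5, None], [7, 3]]
    print(instock_count(splits))  # 25
```

Summing on the map side means only one small number per mapper crosses the network to the reducer, rather than every row.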
Other selected tools
● HBase
● Kafka
● Spark (SQL, GraphX, MLlib, Streaming)
● Docker
● Kubernetes
● Mesos
Impacts of Hadoop (and other “Big Data” tools)
Democratization of high-volume, high-velocity data processing
We support Hadoop
We provide an alternative
Adopting “Big Data” technologies
Big Data is not a technology. It’s about answering business questions and delivering value.
– Teresa de Onis
It's not about building a Hadoop cluster, or any other “Big Data” solution for that matter
http://mattturck.com/2016/02/01/big-data-landscape/
Big Data tools = building blocks. Build whatever you want.
https://github.com/fluxcapacitor/pipeline
It requires a team effort… and support from management
If you want to succeed with Big Data, start small.
– Doug Cutting
Thank you for your time
Q&A