Big Data in Action

Big Data in Action Ngon Pham, Lana Engineer

Description: Big Data technologies and applications in Vietnam

Transcript of Big Data in Action

Page 1: Big Data in Action

Big Data in Action, Ngon Pham, Lana Engineer

Introduction
● Introduction
● Problem
● Approach
● Demo
● Big Data in Vietnam

Introduction
● Internet-enabled devices
  ○ Tons of data are generated every second
● Hardware has become much cheaper
  ○ We can now store and process much more data

Problem
● How can we process 10 TB of data, how long will it take, and how much will it cost?
  ○ Assume:
    ■ Amazon EC2
    ■ HDD reads at 50 MB/s
    ■ Computation time is less than I/O time

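Problem
● 1 machine, 1 core, 1 HDD
  ○ Time: 10 TB / (50 MB/s) = 200,000 s ≈ 55.56 hours
  ○ Amazon cost: $0.12/hour x 55.56 hours = $6.67
● 10 machines, 40 cores, 40 HDDs (4 cores and 4 HDDs per machine)
  ○ Time: 55.56 hours / 40 ≈ 1.39 hours
  ○ Amazon cost: $0.48/hour x 10 machines x 1.39 hours = $6.67
⇒ The same cost, but 40x faster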
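
The arithmetic behind these two scenarios can be reproduced with a short Python sketch. It assumes decimal units (10 TB = 10,000,000 MB), 50 MB/s of read throughput per HDD, the hourly prices quoted on the slide ($0.12 for the 1-core machine and $0.48 for each 4-core machine), and data spread evenly so that all disks read in parallel. The `estimate` function is a hypothetical helper, not part of any tool.

```python
# Back-of-the-envelope estimate, assuming the job is I/O-bound
# (computation time < I/O time), so total time = time to read the data.

DATA_MB = 10 * 1_000_000      # 10 TB in decimal megabytes (assumption)
HDD_MB_PER_S = 50             # read throughput of one disk, from the slide


def estimate(machines: int, disks_per_machine: int, price_per_machine_hour: float):
    """Return (hours, dollars) for reading DATA_MB spread evenly over all disks."""
    total_disks = machines * disks_per_machine
    seconds = DATA_MB / (HDD_MB_PER_S * total_disks)
    hours = seconds / 3600
    # Billed per machine-hour; rounding up to whole billing hours is ignored here.
    cost = price_per_machine_hour * machines * hours
    return hours, cost


if __name__ == "__main__":
    for label, args in [("1 machine, 1 HDD", (1, 1, 0.12)),
                        ("10 machines, 40 HDD", (10, 4, 0.48))]:
        hours, cost = estimate(*args)
        print(f"{label}: {hours:.2f} h, ${cost:.2f}")
```

Both configurations cost $0.12 per disk per hour ($0.48 / 4 disks). That is why spreading the read over 40 disks divides the time by 40 while the total bill stays the same.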

Question
● How do we divide data and processing between machines?
● How do we make each process read data stored on its own machine instead of on another machine?
● How do we replicate data, and restart a process if there is a failure?
● Lots of other task-management questions...
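
Hadoop-style systems answer the first three questions with a common pattern. The input is split into blocks, each block is kept as several replicas on different machines, and each task is sent to a machine that already holds a replica; when a machine fails, its tasks are retried on another replica holder. The sketch below is a toy, single-process simulation of that idea. The block layout, the machine names, and the `assign_tasks` function are hypothetical, and the code does not reproduce Hadoop's actual scheduler.

```python
from collections import Counter

# Hypothetical layout: block id -> machines holding a replica of that block.
REPLICAS = {
    "block-0": ["m1", "m2", "m3"],
    "block-1": ["m2", "m3", "m4"],
    "block-2": ["m3", "m4", "m1"],
    "block-3": ["m4", "m1", "m2"],
}


def assign_tasks(replicas, failed=frozenset()):
    """Give each block's task to a live machine that stores the block locally.

    Among the live replica holders, the least-loaded machine is chosen, so
    work is spread out and no block has to be read over the network.
    If every holder has failed, the task is marked unassignable (None).
    """
    load = Counter()
    plan = {}
    for block, holders in sorted(replicas.items()):
        live = [m for m in holders if m not in failed]
        if not live:
            plan[block] = None  # all replicas lost: data must be restored first
            continue
        # min() keeps the first holder on ties, so results are deterministic.
        machine = min(live, key=lambda m: load[m])
        load[machine] += 1
        plan[block] = machine
    return plan


if __name__ == "__main__":
    print(assign_tasks(REPLICAS))
    # Simulate m3 crashing: its tasks move to other replica holders.
    print(assign_tasks(REPLICAS, failed={"m3"}))
```

Replication is what makes the failure case cheap in this model. A task from a crashed machine can be rerun on any surviving holder of the same block without copying data first.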

Approach
● Hadoop
● MongoDB
● Spark

Hadoop Approach
● Storage
  ○ HDFS (Hadoop Distributed File System)
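
HDFS stores a large file as a sequence of fixed-size blocks and keeps several copies of each block on different DataNodes. The sketch below only illustrates that layout. It assumes a 128 MB block size and a replication factor of 3, the usual defaults in Hadoop 2 and later (configurable through `dfs.blocksize` and `dfs.replication`). It also uses a simple round-robin placement that is much cruder than HDFS's rack-aware policy. The `split_into_blocks` and `place_replicas` helpers are hypothetical.

```python
import math

BLOCK_SIZE_MB = 128   # assumed block size (HDFS default since Hadoop 2)
REPLICATION = 3       # assumed replication factor (HDFS default)


def split_into_blocks(file_size_mb: int, block_size_mb: int = BLOCK_SIZE_MB):
    """Return the size of each block; only the last block may be smaller."""
    n_blocks = math.ceil(file_size_mb / block_size_mb)
    sizes = [block_size_mb] * n_blocks
    if n_blocks and file_size_mb % block_size_mb:
        sizes[-1] = file_size_mb % block_size_mb
    return sizes


def place_replicas(n_blocks: int, nodes, replication: int = REPLICATION):
    """Round-robin placement: each block goes to `replication` distinct nodes."""
    if replication > len(nodes):
        raise ValueError("need at least as many nodes as replicas")
    return [[nodes[(b + r) % len(nodes)] for r in range(replication)]
            for b in range(n_blocks)]


if __name__ == "__main__":
    blocks = split_into_blocks(300)          # a 300 MB file
    print(blocks)                            # [128, 128, 44]
    print(place_replicas(len(blocks), ["dn1", "dn2", "dn3", "dn4"]))
```

In this layout, losing one node still leaves at least two copies of every block. Those copies are the source for re-replicating the missing data.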

Hadoop Approach
● Computation
  ○ MapReduce
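
MapReduce expresses a job in three phases. A map function emits key/value pairs, a shuffle groups the values by key, and a reduce function combines each group. The single-process Python sketch below mimics those three phases for word counting. It shows only the programming model, not Hadoop's distributed execution, and the function names are hypothetical.

```python
from collections import defaultdict
from typing import Iterable, Iterator, Tuple


def map_phase(line: str) -> Iterator[Tuple[str, int]]:
    """Map: emit (word, 1) for every word in one input line."""
    for word in line.lower().split():
        yield word, 1


def shuffle(pairs: Iterable[Tuple[str, int]]) -> dict:
    """Shuffle: group all values emitted for the same key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key: str, values: list) -> Tuple[str, int]:
    """Reduce: combine the grouped values for one key."""
    return key, sum(values)


def word_count(lines: Iterable[str]) -> dict:
    pairs = (pair for line in lines for pair in map_phase(line))
    return dict(reduce_phase(k, v) for k, v in shuffle(pairs).items())


if __name__ == "__main__":
    docs = ["big data in action", "Big data in Vietnam"]
    print(word_count(docs))
    # {'big': 2, 'data': 2, 'in': 2, 'action': 1, 'vietnam': 1}
```

On a cluster, each map call runs near the block it reads. The shuffle moves the pairs across the network so that every value for a given key reaches the same reducer.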

MongoDB Approach
● Storage
  ○ Document
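
In MongoDB, a record is a document: a set of field/value pairs (stored as BSON) that can contain nested documents and arrays. Documents in the same collection do not have to share a schema. The plain-Python sketch below models two such documents as dictionaries. It emulates a simple equality filter that uses dot notation for nested fields, in the spirit of MongoDB's `find()`. The `get_path` and `matches` helpers and the sample documents are hypothetical, and this is not the MongoDB driver's API.

```python
# Two documents in the same hypothetical "users" collection; note the
# different fields and the nested "address" document and "tags" array.
USERS = [
    {"_id": 1, "name": "An", "address": {"city": "Hanoi"}, "tags": ["php", "mysql"]},
    {"_id": 2, "name": "Binh", "address": {"city": "Ho Chi Minh City"}, "age": 30},
]


def get_path(doc: dict, path: str):
    """Follow a dotted path such as 'address.city' into nested dicts."""
    value = doc
    for part in path.split("."):
        if not isinstance(value, dict) or part not in value:
            return None
        value = value[part]
    return value


def matches(doc: dict, query: dict) -> bool:
    """True if every field in `query` equals the document's value at that path.

    Like MongoDB, an equality test against an array field also matches
    when the array contains the value.
    """
    for path, expected in query.items():
        actual = get_path(doc, path)
        if isinstance(actual, list) and not isinstance(expected, list):
            if expected not in actual:
                return False
        elif actual != expected:
            return False
    return True


if __name__ == "__main__":
    print([u["name"] for u in USERS if matches(u, {"address.city": "Hanoi"})])  # ['An']
    print([u["name"] for u in USERS if matches(u, {"tags": "php"})])            # ['An']
```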

MongoDB Approach
● Computation
  ○ Queries (SQL-like, via MongoDB's own query language)
  ○ Aggregation
  ○ MapReduce
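
MongoDB's aggregation framework runs a collection through a pipeline of stages such as `$match`, `$group`, and `$sort`. The pipeline below uses real stage and operator names. The documents, the field names, and the pure-Python `run_pipeline` emulation are hypothetical, and the emulation supports only the small subset used here. Against a real server, the same list would be passed to a collection's `aggregate()` method.

```python
from collections import defaultdict

# Hypothetical page-view events gathered from different sources.
EVENTS = [
    {"source": "web", "page": "home", "views": 3},
    {"source": "mobile", "page": "home", "views": 5},
    {"source": "web", "page": "news", "views": 2},
    {"source": "bot", "page": "news", "views": 9},
]

# Total views per page, ignoring bots, most-viewed first.
PIPELINE = [
    {"$match": {"source": {"$ne": "bot"}}},
    {"$group": {"_id": "$page", "total": {"$sum": "$views"}}},
    {"$sort": {"total": -1}},
]


def run_pipeline(docs, pipeline):
    """Emulate only $match (equality/$ne), $group with $sum over a field, single-key $sort."""
    for stage in pipeline:
        [(op, spec)] = stage.items()
        if op == "$match":
            def ok(d):
                for field, cond in spec.items():
                    if isinstance(cond, dict) and "$ne" in cond:
                        if d.get(field) == cond["$ne"]:
                            return False
                    elif d.get(field) != cond:
                        return False
                return True
            docs = [d for d in docs if ok(d)]
        elif op == "$group":
            key_field = spec["_id"].lstrip("$")
            sums = {name: acc["$sum"].lstrip("$") for name, acc in spec.items() if name != "_id"}
            groups = defaultdict(lambda: {n: 0 for n in sums})
            for d in docs:
                g = groups[d[key_field]]
                for name, field in sums.items():
                    g[name] += d[field]
            docs = [{"_id": k, **v} for k, v in groups.items()]
        elif op == "$sort":
            [(field, direction)] = spec.items()
            docs = sorted(docs, key=lambda d: d[field], reverse=direction < 0)
        else:
            raise NotImplementedError(op)
    return docs


if __name__ == "__main__":
    print(run_pipeline(EVENTS, PIPELINE))
```

Each stage consumes the output of the previous one. This is why filtering with `$match` first keeps the later `$group` stage small.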

Spark Approach
● Storage
  ○ Resilient Distributed Datasets (RDDs)
  ○ Persistent storage backed by HDFS / HBase...
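
An RDD is an immutable, partitioned collection that remembers the chain of transformations used to build it (its lineage). A lost partition can therefore be recomputed from the source data instead of being restored from a replica. The toy Python class below illustrates only that idea: transformations are recorded lazily and applied per partition when results are requested. Its method names echo Spark's `map`, `filter`, and `collect`, but `ToyRDD` is hypothetical and is not the PySpark API.

```python
from typing import Callable, List


class ToyRDD:
    """A lazily evaluated, partitioned collection that keeps its lineage.

    Computed partitions are kept in memory, loosely like an RDD after persist().
    """

    def __init__(self, partitions: List[list], lineage=()):
        self._source = partitions          # stands in for blocks on HDFS
        self._lineage = tuple(lineage)     # recorded transformations
        self._cache = {}                   # partition index -> computed data

    def map(self, f: Callable) -> "ToyRDD":
        return ToyRDD(self._source, self._lineage + (("map", f),))

    def filter(self, pred: Callable) -> "ToyRDD":
        return ToyRDD(self._source, self._lineage + (("filter", pred),))

    def _compute(self, i: int) -> list:
        # Replay the whole lineage on one source partition.
        data = list(self._source[i])
        for kind, f in self._lineage:
            data = [f(x) for x in data] if kind == "map" else [x for x in data if f(x)]
        return data

    def partition(self, i: int) -> list:
        if i not in self._cache:
            self._cache[i] = self._compute(i)
        return self._cache[i]

    def lose_partition(self, i: int) -> None:
        """Simulate a worker failure that drops a computed partition."""
        self._cache.pop(i, None)

    def collect(self) -> list:
        return [x for i in range(len(self._source)) for x in self.partition(i)]


if __name__ == "__main__":
    rdd = ToyRDD([[1, 2, 3], [4, 5, 6]]).map(lambda x: x * 10).filter(lambda x: x > 20)
    print(rdd.collect())        # [30, 40, 50, 60]
    rdd.lose_partition(1)
    print(rdd.collect())        # recomputed from lineage: [30, 40, 50, 60]
```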

Spark Approach
● Computation
  ○ Mixed
  ○ In-memory computing
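
In-memory computing pays off most for iterative jobs that scan the same dataset many times. A chain of classic MapReduce jobs typically re-reads its input from HDFS on every pass, whereas Spark can keep the dataset in memory after the first read. The sketch below only counts simulated disk reads under those two strategies. The `CountingStore` class, the `iterate` function, the dataset, and the iteration count are hypothetical, and real speedups depend heavily on the workload.

```python
class CountingStore:
    """Hypothetical disk store that counts how often the dataset is read."""

    def __init__(self, data):
        self._data = list(data)
        self.reads = 0

    def load(self):
        self.reads += 1
        return list(self._data)


def iterate(store: CountingStore, iterations: int, keep_in_memory: bool) -> float:
    """Run a toy iterative job that repeatedly moves an estimate toward the mean."""
    cached = store.load() if keep_in_memory else None
    estimate = 0.0
    for _ in range(iterations):
        # Without caching, every pass reads the dataset from "disk" again.
        data = cached if keep_in_memory else store.load()
        mean = sum(data) / len(data)
        estimate += 0.5 * (mean - estimate)
    return estimate


if __name__ == "__main__":
    for in_memory in (False, True):
        store = CountingStore(range(1, 101))
        result = iterate(store, iterations=10, keep_in_memory=in_memory)
        print(f"in_memory={in_memory}: reads={store.reads}, estimate={result:.3f}")
```

Both runs produce the same estimate. The only difference is 10 reads versus 1, and that difference is the I/O cost that in-memory computing removes.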

Demo
● Hadoop
  ○ Run a script to create an Amazon cluster
  ○ Play with Hadoop / HDFS / Spark
  ○ Process Wikipedia data
● MongoDB
  ○ Collect data from different sources and analyze it
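
One common way to run Python code on a Hadoop cluster, as in the "Process Wikipedia data" step, is Hadoop Streaming. In Streaming, the mapper and reducer are ordinary programs that read lines on stdin and write tab-separated key/value lines on stdout, and the framework sorts the mapper output by key before the reducer sees it. The sketch below is not the demo's actual code. It assumes the input is raw wikitext, counts how often each article is linked to, and simulates the sort step locally so it runs without a cluster. The `mapper` and `reducer` functions are hypothetical.

```python
import re
from itertools import groupby

# Wikitext link syntax: [[Target]] or [[Target|label]]; anchors (#...) dropped.
LINK = re.compile(r"\[\[([^\]|#]+)")


def mapper(lines):
    """Emit 'target<TAB>1' for every internal link found in the wikitext lines."""
    for line in lines:
        for target in LINK.findall(line):
            yield f"{target.strip()}\t1"


def reducer(sorted_lines):
    """Sum counts per key; assumes input is sorted by key, as Streaming guarantees."""
    pairs = (line.rstrip("\n").split("\t", 1) for line in sorted_lines)
    for key, group in groupby(pairs, key=lambda kv: kv[0]):
        yield f"{key}\t{sum(int(v) for _, v in group)}"


if __name__ == "__main__":
    # On a cluster, the same functions would be wired to sys.stdin/sys.stdout
    # in separate mapper and reducer scripts; here `sorted` plays the shuffle.
    sample = ["[[Hanoi]] is the capital of [[Vietnam]].",
              "[[Vietnam|Viet Nam]] borders [[China]]."]
    for out in reducer(sorted(mapper(sample))):
        print(out)
```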

Big Data in Vietnam

Big Data in Vietnam
● Why is MongoDB popular?
  ○ Preferred by many PHP developers
  ○ Simple to set up and use
  ○ Similar to MySQL

Big Data in Vietnam
● Hadoop is used by a few large local online companies and international startups to:
  ○ Analyze huge amounts of data
  ○ Create new competitive advantages
⇒ But there is a big shortage of skilled engineers

Q & A