Demystifying big data
Akash Mishra
![Page 1: Demystifying big data](https://reader031.fdocuments.net/reader031/viewer/2022030313/58a746911a28ab9f5a8b4639/html5/thumbnails/1.jpg)
Demystifying Big Data
Brown Bag
![Page 2: Demystifying big data](https://reader031.fdocuments.net/reader031/viewer/2022030313/58a746911a28ab9f5a8b4639/html5/thumbnails/2.jpg)
Everything starts small
![Page 3: Demystifying big data](https://reader031.fdocuments.net/reader031/viewer/2022030313/58a746911a28ab9f5a8b4639/html5/thumbnails/3.jpg)
Traditional Approach
![Page 4: Demystifying big data](https://reader031.fdocuments.net/reader031/viewer/2022030313/58a746911a28ab9f5a8b4639/html5/thumbnails/4.jpg)
Simple Process
![Page 5: Demystifying big data](https://reader031.fdocuments.net/reader031/viewer/2022030313/58a746911a28ab9f5a8b4639/html5/thumbnails/5.jpg)
Result
![Page 6: Demystifying big data](https://reader031.fdocuments.net/reader031/viewer/2022030313/58a746911a28ab9f5a8b4639/html5/thumbnails/6.jpg)
What’s next? The unanswered question of a lifetime.
![Page 7: Demystifying big data](https://reader031.fdocuments.net/reader031/viewer/2022030313/58a746911a28ab9f5a8b4639/html5/thumbnails/7.jpg)
Unquenchable thirst for improvement
❏ How to sell more?
❏ How to optimize inventory?
❏ How to engage customers more?
❏ What do my customers like?
❏ How to reduce operating costs?
![Page 8: Demystifying big data](https://reader031.fdocuments.net/reader031/viewer/2022030313/58a746911a28ab9f5a8b4639/html5/thumbnails/8.jpg)
“Torture the data, and it will confess to anything.” (Ronald Coase)
![Page 9: Demystifying big data](https://reader031.fdocuments.net/reader031/viewer/2022030313/58a746911a28ab9f5a8b4639/html5/thumbnails/9.jpg)
How to get data? Humans…
![Page 10: Demystifying big data](https://reader031.fdocuments.net/reader031/viewer/2022030313/58a746911a28ab9f5a8b4639/html5/thumbnails/10.jpg)
Ever-Growing Data
❏ Historical data plays an important role.
❏ Data explodes during processing.
❏ More data beats better algorithms.
![Page 11: Demystifying big data](https://reader031.fdocuments.net/reader031/viewer/2022030313/58a746911a28ab9f5a8b4639/html5/thumbnails/11.jpg)
So what is Big Data?
Data that tends to grow beyond what a single machine can process.
![Page 12: Demystifying big data](https://reader031.fdocuments.net/reader031/viewer/2022030313/58a746911a28ab9f5a8b4639/html5/thumbnails/12.jpg)
Getting the Right Tool
![Page 13: Demystifying big data](https://reader031.fdocuments.net/reader031/viewer/2022030313/58a746911a28ab9f5a8b4639/html5/thumbnails/13.jpg)
Data-Parallel Processing
❏ Distribute the data (with replication).
❏ Move computation close to the data.
❏ Process each partition of the data separately.
❏ Aggregate the results.
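The four steps above can be sketched as a word count over log lines. This is a minimal, single-process illustration under stated assumptions: a thread pool stands in for a cluster of machines, the round-robin split and the helper names (`partition`, `process_partition`, `aggregate`) are hypothetical, and replication is left out to keep the partition/process/aggregate flow visible.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor


def partition(records, num_partitions):
    """Split records into num_partitions slices (round-robin)."""
    return [records[i::num_partitions] for i in range(num_partitions)]


def process_partition(lines):
    """Process one partition independently: count the words in its lines."""
    counts = Counter()
    for line in lines:
        counts.update(line.lower().split())
    return counts


def aggregate(partial_results):
    """Combine the per-partition results into one final answer."""
    total = Counter()
    for partial in partial_results:
        total.update(partial)
    return total


def word_count(records, num_partitions=4):
    parts = partition(records, num_partitions)
    # Each worker stands in for a machine that holds one partition.
    with ThreadPoolExecutor(max_workers=num_partitions) as pool:
        partials = list(pool.map(process_partition, parts))
    return aggregate(partials)


if __name__ == "__main__":
    logs = ["view phone", "view laptop", "buy phone", "view phone"]
    print(word_count(logs, num_partitions=2))
```

Because each partition is processed without looking at the others, the only step that needs all partial results is `aggregate`; that is what lets the work spread across machines.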
![Page 14: Demystifying big data](https://reader031.fdocuments.net/reader031/viewer/2022030313/58a746911a28ab9f5a8b4639/html5/thumbnails/14.jpg)
Advantages of the Data-Parallel Model
❏ Not limited by a single machine's hardware, e.g. memory, CPU.
❏ Scales out by adding more machines.
❏ Cost-effective.
❏ No single point of failure (data is replicated).
![Page 15: Demystifying big data](https://reader031.fdocuments.net/reader031/viewer/2022030313/58a746911a28ab9f5a8b4639/html5/thumbnails/15.jpg)
That’s nice, so the problem is solved? Then where do Hadoop and Spark come in?
![Page 16: Demystifying big data](https://reader031.fdocuments.net/reader031/viewer/2022030313/58a746911a28ab9f5a8b4639/html5/thumbnails/16.jpg)
Challenges of Data Parallelism
❏ Data partitioning, distribution, and accumulation.
❏ Fault tolerance.
❏ Distributed coordination and management.
❏ Abstracting away the distributed complexity.
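The first two challenges, partitioning and distribution with replication, can be sketched as follows. This is a simplified illustration, not how any particular system places data: it assumes a stable CRC32 hash for partitioning (Python's built-in `hash()` of a string changes between processes) and a naive round-robin replica placement; HDFS, for example, uses its own rack-aware policy. The function names and node names are hypothetical.

```python
import zlib


def partition_for(key, num_partitions):
    """Map a record key to a partition using a stable hash, so every
    machine computes the same partition for the same key."""
    return zlib.crc32(key.encode("utf-8")) % num_partitions


def place_replicas(num_partitions, nodes, replication_factor=3):
    """Assign each partition to `replication_factor` distinct nodes,
    shifting the starting node per partition (naive round-robin)."""
    if replication_factor > len(nodes):
        raise ValueError("replication factor exceeds number of nodes")
    return {
        p: [nodes[(p + r) % len(nodes)] for r in range(replication_factor)]
        for p in range(num_partitions)
    }


def surviving_partitions(placement, failed_nodes):
    """Partitions that still have at least one replica on a live node."""
    failed = set(failed_nodes)
    return {p for p, replicas in placement.items()
            if any(node not in failed for node in replicas)}


if __name__ == "__main__":
    nodes = ["node-a", "node-b", "node-c", "node-d"]
    placement = place_replicas(4, nodes, replication_factor=2)
    print(placement)
    print(surviving_partitions(placement, ["node-a", "node-b"]))
```

With two replicas per partition, losing one node loses no data; losing two adjacent nodes in this layout loses only the partition stored on exactly those two.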
![Page 17: Demystifying big data](https://reader031.fdocuments.net/reader031/viewer/2022030313/58a746911a28ab9f5a8b4639/html5/thumbnails/17.jpg)
Big Data Ecosystem
❏ Distributed Data Storage System:
  ❏ Data distribution.
  ❏ Data replication.
  ❏ High throughput with no single point of failure.
❏ Distributed Data Processing System:
  ❏ Distributing code close to the data.
  ❏ Abstracting distributed complexity from the programmer.
  ❏ Fault tolerance and handling computation failures.
  ❏ Aggregating results.
❏ Distributed Coordination and Resource Management:
  ❏ Resource allocation.
  ❏ Distributed configuration management.
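Several of the processing-system duties listed above (running code close to the data, handling computation failures, aggregating results) can be sketched in a few lines. This is a sequential toy under stated assumptions: `placement` maps each partition to the nodes holding a replica, `NodeFailure` is a hypothetical exception used to simulate a crashed machine, and a real scheduler such as those in Hadoop MapReduce or Spark would run tasks in parallel and track failures far more carefully.

```python
class NodeFailure(Exception):
    """Raised when a task attempt fails on a node (simulated in this sketch)."""


def run_data_local(placement, task):
    """Run task(node, partition) once per partition, only on nodes that already
    store a replica of that partition, moving to the next replica on failure."""
    results, ran_on = {}, {}
    for partition, replicas in placement.items():
        for node in replicas:  # data-local candidates, in preference order
            try:
                results[partition] = task(node, partition)
            except NodeFailure:
                continue  # fault tolerance: retry on another replica
            ran_on[partition] = node
            break
        else:  # no attempt succeeded on any replica
            raise RuntimeError(f"partition {partition}: all replicas failed")
    return results, ran_on


# Usage: two partitions, each replicated on two hypothetical nodes.
placement = {0: ["node-a", "node-b"], 1: ["node-b", "node-c"]}
data = {0: [1, 2, 3], 1: [4, 5]}


def sum_partition(node, partition):
    if node == "node-a":  # simulate a crashed machine
        raise NodeFailure(node)
    return sum(data[partition])


results, ran_on = run_data_local(placement, sum_partition)
total = sum(results.values())  # aggregation step
```

Here partition 0 first tries `node-a`, which fails, so it is re-run on its other replica `node-b`; the caller only sees the final `total`, which is the kind of abstraction the slide asks the processing system to provide.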
![Page 18: Demystifying big data](https://reader031.fdocuments.net/reader031/viewer/2022030313/58a746911a28ab9f5a8b4639/html5/thumbnails/18.jpg)
Distributed Data Storage System
![Page 19: Demystifying big data](https://reader031.fdocuments.net/reader031/viewer/2022030313/58a746911a28ab9f5a8b4639/html5/thumbnails/19.jpg)
Distributed Data Processing System
![Page 20: Demystifying big data](https://reader031.fdocuments.net/reader031/viewer/2022030313/58a746911a28ab9f5a8b4639/html5/thumbnails/20.jpg)
Distributed Coordination and Resource Management
![Page 21: Demystifying big data](https://reader031.fdocuments.net/reader031/viewer/2022030313/58a746911a28ab9f5a8b4639/html5/thumbnails/21.jpg)
Lambda Architecture
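In the Lambda Architecture, a batch layer periodically recomputes views over the full historical dataset, a speed layer incrementally covers only the events that arrived since the last batch run, and queries combine the two views. The sketch below shows that merge for a simple "views per product" count; the event shape (`{"product": ...}`) and all class and function names are assumptions for illustration.

```python
from collections import Counter


def batch_view(master_dataset):
    """Batch layer: recompute the view from scratch over all historical events."""
    return Counter(event["product"] for event in master_dataset)


class SpeedLayer:
    """Speed layer: incrementally count events that arrived after the last
    batch run, so recent data is visible before the next batch recompute."""

    def __init__(self):
        self.realtime_view = Counter()

    def on_event(self, event):
        self.realtime_view[event["product"]] += 1

    def reset(self):
        # Once a new batch view includes these events, drop the real-time copy.
        self.realtime_view.clear()


def query(product, batch, speed):
    """Serving side: answer = batch view + real-time view."""
    return batch[product] + speed.realtime_view[product]


if __name__ == "__main__":
    history = [{"product": "phone"}, {"product": "phone"}, {"product": "laptop"}]
    batch = batch_view(history)
    speed = SpeedLayer()
    speed.on_event({"product": "phone"})  # arrives after the batch run
    print(query("phone", batch, speed))
```

The batch view can be slow and simple because the speed layer fills the gap; the speed layer can afford to be approximate because its state is discarded each time the batch view catches up.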
![Page 22: Demystifying big data](https://reader031.fdocuments.net/reader031/viewer/2022030313/58a746911a28ab9f5a8b4639/html5/thumbnails/22.jpg)
How to sell more? Recommendation.
![Page 23: Demystifying big data](https://reader031.fdocuments.net/reader031/viewer/2022030313/58a746911a28ab9f5a8b4639/html5/thumbnails/23.jpg)
Speed Layer
1. Web log
2. Product views
3. Similar products
4. Update user product recommendations
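The four speed-layer steps can be sketched as a streaming recommender that updates on every log line. This is one possible reading of the slide, not the author's design: it assumes a hypothetical log format of `user_id,product_id` per line and treats "similar products" as products co-viewed by the same users (a simple item-to-item co-occurrence count).

```python
from collections import Counter, defaultdict


class ViewRecommender:
    """Streaming sketch of the speed-layer steps (hypothetical design)."""

    def __init__(self, top_n=3):
        self.top_n = top_n
        self.views_by_user = defaultdict(set)  # user -> products viewed
        self.co_views = defaultdict(Counter)   # product -> co-viewed product counts
        self.recommendations = {}              # user -> latest recommendations

    def on_log_line(self, line):
        # 1. Web log: assumed format "user_id,product_id".
        user, product = line.strip().split(",")
        # 2. Product views: record the view and update co-view counts.
        seen = self.views_by_user[user]
        if product not in seen:
            for other in seen:
                self.co_views[product][other] += 1
                self.co_views[other][product] += 1
            seen.add(product)
        # 3 and 4. Similar products, then refresh this user's recommendations.
        self.recommendations[user] = self._recommend(user)

    def _recommend(self, user):
        seen = self.views_by_user[user]
        scores = Counter()
        for product in seen:
            for other, count in self.co_views[product].items():
                if other not in seen:  # only suggest products not yet viewed
                    scores[other] += count
        return [p for p, _ in scores.most_common(self.top_n)]


if __name__ == "__main__":
    rec = ViewRecommender()
    for line in ["u1,phone", "u1,case", "u2,phone"]:
        rec.on_log_line(line)
    print(rec.recommendations["u2"])
```

Only the user who generated the event is re-scored, which keeps each update cheap enough for a speed layer; a batch layer could periodically rebuild the full co-view table.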
![Page 24: Demystifying big data](https://reader031.fdocuments.net/reader031/viewer/2022030313/58a746911a28ab9f5a8b4639/html5/thumbnails/24.jpg)
How to optimize inventory? Prediction.
![Page 25: Demystifying big data](https://reader031.fdocuments.net/reader031/viewer/2022030313/58a746911a28ab9f5a8b4639/html5/thumbnails/25.jpg)
Batch Layer
1. User data
2. Location clusters per item
3. Location cluster data per item
3. Current warehouse inventory
4. Inventory transfer
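The batch-layer flow from user data to inventory transfers could look roughly like the sketch below. It rests on several assumptions not in the slides: orders arrive as hypothetical `(item, cluster)` pairs with users already grouped into location clusters, each cluster is served by one warehouse, and the "prediction" is a naive forecast in which each warehouse should hold stock in proportion to its cluster's share of past orders.

```python
from collections import Counter


def demand_by_cluster(orders, item):
    """Steps 1-3: count historical orders of `item` per location cluster.
    orders: iterable of (item, cluster) pairs (hypothetical schema)."""
    return Counter(cluster for ordered_item, cluster in orders if ordered_item == item)


def plan_transfers(demand, inventory):
    """Steps 3-4: rebalance current stock so each cluster's warehouse holds
    stock in proportion to its share of demand (naive proportional forecast).

    demand: cluster -> order count; inventory: cluster -> units on hand.
    Returns a list of (from_cluster, to_cluster, units) moves.
    """
    total_stock = sum(inventory.values())
    total_demand = sum(demand.values())
    if total_demand == 0:
        return []
    # Floor division keeps total targets <= total stock, so surplus covers deficit.
    target = {c: total_stock * demand.get(c, 0) // total_demand for c in inventory}
    surplus = {c: inventory[c] - target[c] for c in inventory if inventory[c] > target[c]}
    deficit = {c: target[c] - inventory[c] for c in inventory if inventory[c] < target[c]}
    moves = []
    for src, extra in surplus.items():
        for dst in list(deficit):
            if extra == 0:
                break
            units = min(extra, deficit[dst])
            moves.append((src, dst, units))
            extra -= units
            deficit[dst] -= units
            if deficit[dst] == 0:
                del deficit[dst]
    return moves


if __name__ == "__main__":
    orders = [("phone", "north")] * 3 + [("phone", "south")] + [("laptop", "south")] * 5
    demand = demand_by_cluster(orders, "phone")
    print(plan_transfers(demand, {"north": 10, "south": 30}))
```

A real system would replace the proportional forecast with a proper demand model and account for transfer costs; the point here is only the shape of the pipeline, which runs over the full history and therefore suits the batch layer.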
![Page 26: Demystifying big data](https://reader031.fdocuments.net/reader031/viewer/2022030313/58a746911a28ab9f5a8b4639/html5/thumbnails/26.jpg)
THANK YOU
Akash Mishra