HDFS-HC2: Analysis of Data Placement Strategy based on Computing Power of Nodes on Heterogeneous...
![Page 1: HDFS-HC2: Analysis of Data Placement Strategy based on Computing Power of Nodes on Heterogeneous Hadoop Clusters](https://reader035.fdocuments.net/reader035/viewer/2022062514/55905bb61a28ab542e8b45f2/html5/thumbnails/1.jpg)
Analysis of Data Placement Strategy based on Computing Power of Nodes on Heterogeneous Hadoop Clusters
Sanket Reddy Chintapalli
Advisor: Dr. Xiao Qin
![Page 2: HDFS-HC2: Analysis of Data Placement Strategy based on Computing Power of Nodes on Heterogeneous Hadoop Clusters](https://reader035.fdocuments.net/reader035/viewer/2022062514/55905bb61a28ab542e8b45f2/html5/thumbnails/2.jpg)
Presentation Overview
● Synopsis
● MapReduce Programming Model Overview
● HDFS Overview
● Motivation
● Design
● Software Description
● Hardware Description
● Results
● Conclusion
![Page 3: HDFS-HC2: Analysis of Data Placement Strategy based on Computing Power of Nodes on Heterogeneous Hadoop Clusters](https://reader035.fdocuments.net/reader035/viewer/2022062514/55905bb61a28ab542e8b45f2/html5/thumbnails/3.jpg)
Synopsis
● Data placement strategy
● Heterogeneous Clusters
● Computing Power
● Calculating Computing Ratio
● WordCount and Grep
![Page 4: HDFS-HC2: Analysis of Data Placement Strategy based on Computing Power of Nodes on Heterogeneous Hadoop Clusters](https://reader035.fdocuments.net/reader035/viewer/2022062514/55905bb61a28ab542e8b45f2/html5/thumbnails/4.jpg)
MapReduce Model
● Hadoop 1.0 and Hadoop 2.0
● Master - Slave Model
● JobTracker and TaskTracker (Hadoop 1.0)
● YARN (Hadoop 2.0)
● Resource Manager (YARN)
● Application Manager (YARN)
● Node Manager (YARN)
● MapReduce Flow
![Page 5: HDFS-HC2: Analysis of Data Placement Strategy based on Computing Power of Nodes on Heterogeneous Hadoop Clusters](https://reader035.fdocuments.net/reader035/viewer/2022062514/55905bb61a28ab542e8b45f2/html5/thumbnails/5.jpg)
MapReduce Model
![Page 6: HDFS-HC2: Analysis of Data Placement Strategy based on Computing Power of Nodes on Heterogeneous Hadoop Clusters](https://reader035.fdocuments.net/reader035/viewer/2022062514/55905bb61a28ab542e8b45f2/html5/thumbnails/6.jpg)
MapReduce Model - 1.0
![Page 7: HDFS-HC2: Analysis of Data Placement Strategy based on Computing Power of Nodes on Heterogeneous Hadoop Clusters](https://reader035.fdocuments.net/reader035/viewer/2022062514/55905bb61a28ab542e8b45f2/html5/thumbnails/7.jpg)
MapReduce Model - YARN - 2.0
![Page 8: HDFS-HC2: Analysis of Data Placement Strategy based on Computing Power of Nodes on Heterogeneous Hadoop Clusters](https://reader035.fdocuments.net/reader035/viewer/2022062514/55905bb61a28ab542e8b45f2/html5/thumbnails/8.jpg)
MapReduce Model - Flow
![Page 9: HDFS-HC2: Analysis of Data Placement Strategy based on Computing Power of Nodes on Heterogeneous Hadoop Clusters](https://reader035.fdocuments.net/reader035/viewer/2022062514/55905bb61a28ab542e8b45f2/html5/thumbnails/9.jpg)
HDFS
● Namenode
● Datanode
● Replication
● Federated Namenodes
![Page 10: HDFS-HC2: Analysis of Data Placement Strategy based on Computing Power of Nodes on Heterogeneous Hadoop Clusters](https://reader035.fdocuments.net/reader035/viewer/2022062514/55905bb61a28ab542e8b45f2/html5/thumbnails/10.jpg)
HDFS Architecture
![Page 11: HDFS-HC2: Analysis of Data Placement Strategy based on Computing Power of Nodes on Heterogeneous Hadoop Clusters](https://reader035.fdocuments.net/reader035/viewer/2022062514/55905bb61a28ab542e8b45f2/html5/thumbnails/11.jpg)
HDFS Federated Namenodes
![Page 12: HDFS-HC2: Analysis of Data Placement Strategy based on Computing Power of Nodes on Heterogeneous Hadoop Clusters](https://reader035.fdocuments.net/reader035/viewer/2022062514/55905bb61a28ab542e8b45f2/html5/thumbnails/12.jpg)
HDFS Federated Namenodes
● Scalability
● Performance
● Isolation - overload
![Page 13: HDFS-HC2: Analysis of Data Placement Strategy based on Computing Power of Nodes on Heterogeneous Hadoop Clusters](https://reader035.fdocuments.net/reader035/viewer/2022062514/55905bb61a28ab542e8b45f2/html5/thumbnails/13.jpg)
Motivation
![Page 14: HDFS-HC2: Analysis of Data Placement Strategy based on Computing Power of Nodes on Heterogeneous Hadoop Clusters](https://reader035.fdocuments.net/reader035/viewer/2022062514/55905bb61a28ab542e8b45f2/html5/thumbnails/14.jpg)
Software Description
● Hadoop 2.3.0
● Maven
● Eclipse
● Protocol Buffers
![Page 15: HDFS-HC2: Analysis of Data Placement Strategy based on Computing Power of Nodes on Heterogeneous Hadoop Clusters](https://reader035.fdocuments.net/reader035/viewer/2022062514/55905bb61a28ab542e8b45f2/html5/thumbnails/15.jpg)
Hardware Description
![Page 16: HDFS-HC2: Analysis of Data Placement Strategy based on Computing Power of Nodes on Heterogeneous Hadoop Clusters](https://reader035.fdocuments.net/reader035/viewer/2022062514/55905bb61a28ab542e8b45f2/html5/thumbnails/16.jpg)
Design
Run WordCount and Grep Applications on individual nodes
![Page 17: HDFS-HC2: Analysis of Data Placement Strategy based on Computing Power of Nodes on Heterogeneous Hadoop Clusters](https://reader035.fdocuments.net/reader035/viewer/2022062514/55905bb61a28ab542e8b45f2/html5/thumbnails/17.jpg)
Design
Calculate Computing Power of Individual Nodes for a specific application
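The slide does not spell out the calculation, so the following is only a minimal sketch under stated assumptions: each node runs the same application on the same amount of input, its response time is recorded, and the computing ratio is taken as that time relative to the fastest node. The node names, timings, and both function names are hypothetical and are not taken from the presentation.

```python
def computing_ratios(response_times):
    """Map node -> its response time divided by the fastest node's time.

    Assumption: the fastest node gets ratio 1.0 and slower nodes get
    proportionally larger values.
    """
    fastest = min(response_times.values())
    return {node: t / fastest for node, t in response_times.items()}


def data_shares(response_times):
    """Map node -> fraction of the input, proportional to 1 / response time."""
    speeds = {node: 1.0 / t for node, t in response_times.items()}
    total = sum(speeds.values())
    return {node: s / total for node, s in speeds.items()}


# Hypothetical response times (seconds) for one application, e.g. WordCount,
# measured on the same input size on every node.
wordcount_times = {"node_a": 100.0, "node_b": 200.0, "node_c": 400.0}

ratios = computing_ratios(wordcount_times)
shares = data_shares(wordcount_times)
```

Under these assumptions a node that takes twice as long as the fastest one would be assigned half as much data, so that all nodes would finish at roughly the same time.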
![Page 18: HDFS-HC2: Analysis of Data Placement Strategy based on Computing Power of Nodes on Heterogeneous Hadoop Clusters](https://reader035.fdocuments.net/reader035/viewer/2022062514/55905bb61a28ab542e8b45f2/html5/thumbnails/18.jpg)
Design
● Evaluate Hadoop Distribution by running grep and wordcount together on all nodes
● Run the CRBalancer to balance the nodes
● Finally, re-run the applications to note the ramifications of the data placement strategy.
![Page 19: HDFS-HC2: Analysis of Data Placement Strategy based on Computing Power of Nodes on Heterogeneous Hadoop Clusters](https://reader035.fdocuments.net/reader035/viewer/2022062514/55905bb61a28ab542e8b45f2/html5/thumbnails/19.jpg)
Design - Algorithm
CRBalancer Strategy
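The algorithm itself appears only in the slide image, so what follows is a hedged sketch of one way a computing-ratio-based balancing pass could be planned, not the CRBalancer implementation. It assumes each node has an integer weight for its relative computing power, that the target block count is proportional to that weight, and that replication and rack constraints can be ignored. The names `target_blocks`, `plan_moves`, and the sample numbers are hypothetical.

```python
def target_blocks(current, weights):
    """Split the total block count across nodes in proportion to `weights`."""
    total = sum(current.values())
    weight_sum = sum(weights.values())
    targets = {node: total * weights[node] // weight_sum for node in current}
    # Hand out the blocks lost to rounding down, heaviest node first.
    leftover = total - sum(targets.values())
    by_weight = sorted(current, key=lambda n: weights[n], reverse=True)
    for node in by_weight[:leftover]:
        targets[node] += 1
    return targets


def plan_moves(current, targets):
    """Return (source, destination, blocks) moves from over- to under-loaded nodes."""
    surplus = {n: current[n] - targets[n] for n in current if current[n] > targets[n]}
    deficit = {n: targets[n] - current[n] for n in current if current[n] < targets[n]}
    moves = []
    for src in sorted(surplus):
        for dst in sorted(deficit):
            if surplus[src] == 0:
                break
            count = min(surplus[src], deficit[dst])
            if count > 0:
                moves.append((src, dst, count))
                surplus[src] -= count
                deficit[dst] -= count
    return moves


# Hypothetical cluster: blocks spread evenly, but node_a is the most powerful.
current = {"node_a": 40, "node_b": 40, "node_c": 40}
weights = {"node_a": 5, "node_b": 3, "node_c": 2}

targets = target_blocks(current, weights)
moves = plan_moves(current, targets)
```

In this example the plan shifts blocks from the two slower nodes to the fastest one until each node's share matches its weight. A real balancer would also have to respect replica placement rules, which this sketch leaves out.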
![Page 20: HDFS-HC2: Analysis of Data Placement Strategy based on Computing Power of Nodes on Heterogeneous Hadoop Clusters](https://reader035.fdocuments.net/reader035/viewer/2022062514/55905bb61a28ab542e8b45f2/html5/thumbnails/20.jpg)
Implementation
● CRBalancer
● CRBalancingPolicy
● CRNamenodeConnector
![Page 21: HDFS-HC2: Analysis of Data Placement Strategy based on Computing Power of Nodes on Heterogeneous Hadoop Clusters](https://reader035.fdocuments.net/reader035/viewer/2022062514/55905bb61a28ab542e8b45f2/html5/thumbnails/21.jpg)
Results - WordCount
![Page 22: HDFS-HC2: Analysis of Data Placement Strategy based on Computing Power of Nodes on Heterogeneous Hadoop Clusters](https://reader035.fdocuments.net/reader035/viewer/2022062514/55905bb61a28ab542e8b45f2/html5/thumbnails/22.jpg)
Results - Grep
![Page 23: HDFS-HC2: Analysis of Data Placement Strategy based on Computing Power of Nodes on Heterogeneous Hadoop Clusters](https://reader035.fdocuments.net/reader035/viewer/2022062514/55905bb61a28ab542e8b45f2/html5/thumbnails/23.jpg)
Questions?