Scheduling scheme for Hadoop clusters
A RESEARCH ON SCHEDULING SCHEME FOR HADOOP CLUSTERS
Guided by: Neetha K N, Dept of CSE
Presented by: Amjith B, S7 CSE
AREAS OF SEMINAR
Hadoop
MapReduce and HDFS
[Figure: Hadoop cluster made up of Rack 1, Rack 2, ..., Rack n, each holding Node 1, Node 2, ..., Node n]
TERMINOLOGY REVIEW
INTRODUCTION
• Hadoop is an open-source software framework for distributed processing of large datasets across large clusters of computers
• 2 components: the MapReduce engine and the distributed file system (HDFS)
COMPONENTS
• MapReduce engine: programming model developed by Google; computation component of Hadoop; consists of Map and Reduce functions
• HDFS: storage component of Hadoop; splits the data into blocks and distributes them; fault tolerant and self-healing
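The Map and Reduce functions named above can be illustrated with a word count. This is a minimal single-process sketch in plain Python, not Hadoop's API: the names map_fn, shuffle, reduce_fn and word_count are made up for illustration, and the grouping step stands in for the work the framework does between the two phases.

```python
from collections import defaultdict


def map_fn(line):
    """Map: emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle(pairs):
    """Group intermediate values by key (done by the framework in a real cluster)."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_fn(word, counts):
    """Reduce: combine all values for one key into a single result."""
    return word, sum(counts)


def word_count(lines):
    """Run map, shuffle and reduce in sequence over a list of lines."""
    intermediate = [pair for line in lines for pair in map_fn(line)]
    grouped = shuffle(intermediate)
    return dict(reduce_fn(word, counts) for word, counts in grouped.items())
```

In a real cluster the map calls run in parallel on the nodes that hold the input blocks, which is why the choice of node for each task matters to the scheduler.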
• MapReduce nodes: JobTracker, TaskTracker
• HDFS nodes: NameNode, DataNode
• HDFS nodes
• NameNode – Maintains metadata information about files (1 per cluster).
• DataNode – Handles all data allocation and replication and is installed on each slave node (1 to many per cluster).
• MapReduce nodes
• JobTracker – Schedules job execution and keeps track of cluster-wide job status (1 per cluster).
• TaskTracker – Receives tasks from the JobTracker. Runs on compute nodes in conjunction with the DataNode (1 to many per cluster).
LITERATURE SURVEY
SYSTEM | FEATURES | DISADVANTAGES | REFERENCE
Hadoop FIFO scheduling | Schedules jobs on the FIFO principle | Cannot assign priorities to jobs | REF [6]
Facebook’s Fair scheduler | Even allocation of resources | No preemption support for large tasks | REF [4]
Yahoo’s Capacity scheduler | FIFO scheduler based on priority | Problem in assigning priorities | REF [6]
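The difference between FIFO and even allocation can be seen in a small slot-allocation sketch. It is a simplified illustration only: the function names and the (job name, pending tasks) input format are assumptions, and the round-robin loop is a stand-in for fair sharing, not the actual logic of the Hadoop or Facebook schedulers.

```python
def fifo_allocate(jobs, slots):
    """FIFO: the earliest job takes as many slots as it needs, then the next job."""
    alloc = {}
    for name, pending in jobs:
        given = min(pending, slots)
        alloc[name] = given
        slots -= given
    return alloc


def fair_allocate(jobs, slots):
    """Even allocation: hand out one slot per job per round until slots run out."""
    demand = dict(jobs)
    alloc = {name: 0 for name, _ in jobs}
    while slots > 0:
        progressed = False
        for name in alloc:
            if slots > 0 and alloc[name] < demand[name]:
                alloc[name] += 1
                slots -= 1
                progressed = True
        if not progressed:  # every job already has all the slots it asked for
            break
    return alloc
```

With three jobs that need 10, 3 and 5 tasks and 9 free slots, the FIFO version gives all 9 slots to the first job, while the even version gives 3 slots to each.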
EXISTING SYSTEM
EXISTING SYSTEM (disadvantages)
• Underutilization of CPU processes
• Not flexible
• Interaction between the master node and the slave nodes
PROPOSED SYSTEM
• Analyze the system for CPU and I/O underutilization
• Use a predictive scheduler for predicting the appropriate TaskTracker
• Couple the scheduler with a prefetching mechanism to improve the system performance
PREDICTIVE SCHEDULER
• Flexible task scheduler
• Predicts the most appropriate TaskTrackers to assign future tasks to
• Allows DataNodes to explore underutilization of disk bandwidth
• Seeks stragglers and predicts candidate data blocks
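One way to picture the prediction step is to estimate when each TaskTracker will finish its current task and line up data blocks for the ones that will be free soonest. The sketch below is an assumption made for illustration, not the scheduler described in these slides: the TrackerStatus fields, the linear extrapolation and the function names are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class TrackerStatus:
    name: str
    progress: float   # fraction of the current task completed, 0.0 to 1.0
    elapsed_s: float  # seconds the current task has been running


def estimated_remaining(tracker):
    """Linear extrapolation: remaining = elapsed * (1 - progress) / progress."""
    if tracker.progress <= 0.0:
        return float("inf")  # no progress reported yet, cannot predict
    return tracker.elapsed_s * (1.0 - tracker.progress) / tracker.progress


def predict_next_free(trackers):
    """Return the tracker expected to finish its current task first."""
    return min(trackers, key=estimated_remaining)


def plan_prefetch(trackers, pending_blocks):
    """Pair trackers, soonest-free first, with the blocks of pending tasks."""
    order = sorted(trackers, key=estimated_remaining)
    return [(t.name, block) for t, block in zip(order, pending_blocks)]
```

A tracker whose estimate is far above the others would be a straggler candidate under this model; the threshold for that decision is left open here.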
PREFETCHING MODULE
• Integrates with the predictive scheduler
• Multiple worker threads
• Monitors the status of worker threads and coordinates the prefetching process
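The worker-thread arrangement can be sketched with a thread pool and a coordinating loop that records the status of each fetch. This is a rough illustration using Python's standard library, not Hadoop code: load_block is a hypothetical stand-in for reading a block from disk, and the block IDs are made up.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed


def load_block(block_id):
    """Hypothetical stand-in for reading one data block from disk into memory."""
    return f"data-for-{block_id}".encode()


def prefetch(block_ids, workers=4):
    """Fetch blocks on worker threads; the caller's thread monitors their status."""
    cache, status = {}, {}
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = {pool.submit(load_block, b): b for b in block_ids}
        for future in as_completed(futures):
            block = futures[future]
            try:
                cache[block] = future.result()
                status[block] = "done"
            except Exception:
                status[block] = "failed"  # leave the block to a normal read later
    return cache, status
```

The status map is what a coordinator would consult before handing a task to a TaskTracker: a block marked "done" is already in memory, while a failed one falls back to an ordinary read.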
STEPS FOR LAUNCHING TASKS
1. Copying the job JAR from HDFS to the TaskTracker's local filesystem
2. Creation of a local working directory for the task
3. Creation of a TaskRunner instance to run the task
ISSUES IN PREFETCHING MODULE
• When to prefetch
• What to prefetch
• How much to prefetch
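The three questions can be turned into a toy policy to make them concrete. Everything here is an assumption for illustration, since the slides do not give the rules: "when" is answered by comparing the running task's remaining time with the time needed to fetch a block, and "what" and "how much" by taking queued blocks in order while they fit in free memory.

```python
def should_prefetch(task_remaining_s, fetch_time_s, margin=1.5):
    """When: start once the running task will end within margin * fetch time."""
    return task_remaining_s <= margin * fetch_time_s


def blocks_to_prefetch(queued_blocks, free_bytes, block_size):
    """What and how much: take queued blocks in order while they fit in memory."""
    if block_size <= 0:
        raise ValueError("block_size must be positive")
    count = min(len(queued_blocks), free_bytes // block_size)
    return queued_blocks[:count]
```

Prefetching too early ties up memory that running tasks could use, and prefetching too late leaves the next task waiting on disk, which is why the margin is a tunable parameter in this sketch.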
ADVANTAGES
• Avoidance of I/O stalls
• Maximising CPU utilisation
• Helps the smooth functioning of Hadoop
• Flexible
COMPARISON
EXISTING SYSTEM | PROPOSED SYSTEM
Low I/O performance | High I/O performance
CPU underutilised | Proper utilisation
Less flexible | Additional overhead of prefetching to master
FUTURE SCOPE
• Hadoop on Demand (HOD)
• A scheduler in a heterogeneous environment
REFERENCES
• 1. J. Dean and S. Ghemawat. MapReduce: Simplified data processing on large clusters. In OSDI ’04, pages 137–150, 2004.
• 2. M. Zaharia, A. Konwinski, A. Joseph, R. Katz, and I. Stoica. Improving MapReduce performance in heterogeneous environments. In OSDI ’08: 8th USENIX Symposium on Operating Systems Design and Implementation, October 2008.
• 3. R. H. Patterson, G. A. Gibson, E. Ginting, D. Stodolsky, and J. Zelenka. Informed prefetching and caching. SIGOPS Oper. Syst. Rev., 29:79–95, December 1995.
• 4. Sangwon Seo, Ingook Jang, Kyungchang Woo, Inkyo Kim, et al. HPMR: Prefetching and pre-shuffling in shared MapReduce computation environment. In Proceedings of 11th IEEE International Conference on Cluster Computing, pages 16–20. ACM, 2009.
• 5. Tom White. Hadoop: The Definitive Guide. O’Reilly, 2009.
• 6. Mark Yong, Nitin Garegrat, and Shiwali Mohan. Towards a Resource Aware Scheduler in Hadoop.
THANK YOU!!!!!!
QUESTIONS??