Tips from Hadoop experts for beginners

Getting started with Hadoop? Tips from Hadoop professionals to help kick-start your career


9 Hadoop expert tips for Hadoop Learners

Transcript of Tips from Hadoop experts for beginners

Page 1: Tips from Hadoop experts for beginners

Getting started with Hadoop? Tips from Hadoop professionals to help kick-start your career

Page 2: Tips from Hadoop experts for beginners

“I would like to share my experience with you.

1. I think practice is more important than theory, so get started quickly, for example with the Cloudera QuickStart VM.

2. Start with the basics of installing and configuring Hadoop from the command line. Once you are familiar with that, you can move to a GUI such as Ambari or Cloudera Manager.”

Jin Zhan, Senior Engineer, Square Enix, Japan

Page 3: Tips from Hadoop experts for beginners

“Here are some tips, based on things people should know but which I have seen them get wrong. You probably know them already, and there are more than two!

1. You must increase ulimits

http://blog.cloudera.com/blog/2009/03/configuration-parameters-what-can-you-just-ignore/

Mark H. Butler, Software Engineer at Pataniqa Ltd, Preston, United Kingdom
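To make the ulimit tip concrete, here is a minimal sketch that reads the open-file limit of the current process with Python's standard `resource` module (Unix only). The threshold `RECOMMENDED_NOFILE` and the helper `limit_ok` are hypothetical names and values chosen for illustration; the right number depends on your distribution and workload.

```python
import resource

# Hypothetical target value, not an official recommendation.
RECOMMENDED_NOFILE = 32768


def limit_ok(soft_limit, minimum=RECOMMENDED_NOFILE):
    """Return True if the soft limit is unlimited or at least `minimum`."""
    if soft_limit == resource.RLIM_INFINITY:
        return True
    return soft_limit >= minimum


if __name__ == "__main__":
    # getrlimit returns a (soft, hard) tuple for the given resource.
    soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    print(f"open files: soft={soft} hard={hard}")
    if not limit_ok(soft):
        print(f"soft limit is below {RECOMMENDED_NOFILE}; consider raising it")
```

Run it as the same user that runs the Hadoop daemons, since limits are set per user and per process.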

Page 4: Tips from Hadoop experts for beginners

2. Installing a NoSQL database? Use the YCSB benchmark to check it is working correctly

https://github.com/brianfrankcooper/YCSB/wiki

Mark H. Butler, Software Engineer at Pataniqa Ltd, Preston, United Kingdom

Page 5: Tips from Hadoop experts for beginners

3. Consider using compression (although there are tradeoffs!)

http://comphadoop.weebly.com/

http://blog.erdemagaoglu.com/post/4605524309/lzo-vs-snappy-vs-lzf-vs-zlib-a-comparison-of

http://www.slideshare.net/Hadoop_Summit/kamat-singh-june27425pmroom210cv2

http://blog.cloudera.com/blog/2011/09/snappy-and-hadoop/

http://www.cloudera.com/blog/2009/11/17/hadoop-at-twitter-part-1-splittable-lzo-compression/

https://github.com/twitter/hadoop-lzo

Mark H. Butler, Software Engineer at Pataniqa Ltd, Preston, United Kingdom
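The compression tradeoff in tip 3 (smaller output versus more CPU time) can be felt on a laptop before touching a cluster. The sketch below is only an illustration: it uses the codecs in Python's standard library (zlib, bz2, lzma) as stand-ins, not Snappy or LZO, and the sample log line and the helper `compare_codecs` are made up for the example.

```python
import bz2
import lzma
import time
import zlib


def compare_codecs(data):
    """Compress `data` with each stdlib codec; report size ratio and time."""
    codecs = {
        "zlib": (zlib.compress, zlib.decompress),
        "bz2": (bz2.compress, bz2.decompress),
        "lzma": (lzma.compress, lzma.decompress),
    }
    results = {}
    for name, (compress, decompress) in codecs.items():
        start = time.perf_counter()
        packed = compress(data)
        elapsed = time.perf_counter() - start
        # Sanity check: the round trip must give back the original bytes.
        assert decompress(packed) == data
        results[name] = {"ratio": len(packed) / len(data), "seconds": elapsed}
    return results


# Hypothetical, highly repetitive sample; real data will compress differently.
sample = b"2014-01-01 INFO datanode heartbeat ok\n" * 20000
for name, stats in compare_codecs(sample).items():
    print(f"{name}: ratio={stats['ratio']:.4f} time={stats['seconds']:.4f}s")
```

Measuring on a sample of your own data matters more than any published comparison, because ratios and timings vary widely with the input.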

Page 6: Tips from Hadoop experts for beginners

4. Don't install a Hadoop cluster manually. There are many tools that automate it, e.g. Puppet, Chef, Ansible, Vagrant

http://blog.godatadriven.com/bare-metal-hadoop-provisioning-ansible-cobbler.html

http://chimpler.wordpress.com/2013/01/20/deploying-hadoop-on-ec2-with-whirr/

http://java.dzone.com/articles/setting-hadoop-virtual-cluster

http://www.diversit.eu/2012/05/setting-up-hadoop-cluster-using-puppet.html

http://www.rpark.com/2013/02/using-chef-to-build-out-hadoop-cluster.html

Mark H. Butler, Software Engineer at Pataniqa Ltd, Preston, United Kingdom

Page 7: Tips from Hadoop experts for beginners

5. Java and Scala are great, but don't overlook Python. It's handy for prototyping one-off MapReduce jobs, as you do not need a cluster to test them

http://www.michael-noll.com/tutorials/writing-an-hadoop-mapreduce-program-in-python/

Hope that helps!”

Mark H. Butler, Software Engineer at Pataniqa Ltd, Preston, United Kingdom
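As an illustration of tip 5, here is a minimal word-count sketch written as a mapper and a reducer in plain Python and run locally, with no cluster involved. The function names (`mapper`, `reducer`, `run_local`) are hypothetical; the local sort stands in for the shuffle-and-sort step that Hadoop performs between the map and reduce phases.

```python
from itertools import groupby
from operator import itemgetter


def mapper(lines):
    """Emit a (word, 1) pair for every word in every input line."""
    for line in lines:
        for word in line.split():
            yield word.lower(), 1


def reducer(pairs):
    """Sum the counts per word; `pairs` must already be sorted by word."""
    for word, group in groupby(pairs, key=itemgetter(0)):
        yield word, sum(count for _, count in group)


def run_local(lines):
    """Simulate map -> sort -> reduce in a single process."""
    shuffled = sorted(mapper(lines), key=itemgetter(0))
    return dict(reducer(shuffled))


counts = run_local(["Hadoop is big", "big data is big"])
print(counts)
```

Once the logic behaves as expected locally, the same mapper and reducer logic can be adapted to read from standard input and write to standard output, which is the style the tutorial linked above walks through.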

Page 8: Tips from Hadoop experts for beginners

“Technically speaking, MapReduce is the base: Map = Select and Reduce = Group By. So if you know what you want and how you want to summarize it, then Hadoop is meant for you.”

Piyush Jindal, Software Engineer at Target, Bengaluru, Karnataka, India
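The "Map = Select, Reduce = Group By" analogy can be sketched in a few lines of plain Python. The records, field names, and the functions `map_phase` and `reduce_phase` below are invented for illustration; the pair of functions roughly corresponds to a query like `SELECT city, SUM(amount) FROM sales GROUP BY city`.

```python
from collections import defaultdict

# Hypothetical input records.
sales = [
    {"city": "Bengaluru", "amount": 120},
    {"city": "Tokyo", "amount": 80},
    {"city": "Bengaluru", "amount": 30},
    {"city": "Preston", "amount": 50},
]


def map_phase(records):
    """The 'select' step: pick the key and the value out of each record."""
    for record in records:
        yield record["city"], record["amount"]


def reduce_phase(pairs):
    """The 'group by' step: aggregate all values that share a key."""
    totals = defaultdict(int)
    for key, value in pairs:
        totals[key] += value
    return dict(totals)


totals = reduce_phase(map_phase(sales))
print(totals)
```

On a real cluster the grouping happens across machines, but the shape of the problem is the same: decide what to select, then decide how to summarize it.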

Page 9: Tips from Hadoop experts for beginners

“Tips:

1. Good knowledge of data structures and the insight to analyze data are a must.

2. Core Java and the Collections framework are a must.

3. SQL and PL/SQL knowledge for solving complex scenarios will help a lot.

These are the stepping stones for approaching a problem in big data and for providing a solution as well.”

Somanath Nanda, Cloudera Certified Developer for Hadoop, Cognizant Technology Solutions, Bengaluru, Karnataka, India

Page 10: Tips from Hadoop experts for beginners

“1. Audit your data to identify what might be useful but unexploited. 2. Study new technologies; they are moving rapidly.”

Merv Adrian, Vice President at Gartner, San Francisco Bay Area

Page 11: Tips from Hadoop experts for beginners

“There are some good examples in this whitepaper (note: registration required): http://www.mongodb.com/lp/big-data”

Mat Keep, Principal Product Marketing Manager at MongoDB Inc., Hawkinge, Kent, United Kingdom

Page 12: Tips from Hadoop experts for beginners

“Here are some tips, in no specific order:

1. The best value from Hadoop comes from a combination of software and hardware designed for your specific needs.

2. The hardware configuration of your cluster is very important. If your workload is I/O bound, then disk specs matter most; if it is CPU bound, then faster CPUs are better; and if the application is memory bound, then servers with more memory are needed.

Mohit Saxena, Vice President - Technology, Founder, InMobi - A Global Mobile Ad Network, Bengaluru Area, India

Page 13: Tips from Hadoop experts for beginners

3. Network connectivity between nodes is extremely important. NICs of at least 1 gigabit are a must in a Hadoop cluster so that inter-node communication does not become a bottleneck, as it can be a huge drag.

4. Plan the size of your storage and disk controllers according to the reads per second you want to achieve from each server.

5. Ganglia is a fairly good monitoring tool for Hadoop, and it can point out bottlenecks.”

Mohit Saxena, Vice President - Technology, Founder, InMobi - A Global Mobile Ad Network, Bengaluru Area, India
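The sizing advice in tips 3 and 4 comes down to simple arithmetic, sketched below. All figures are hypothetical placeholders (a target of 600 MB/s of reads per server and 100 MB/s per disk); substitute measurements from your own hardware. The only fixed fact used is the unit conversion: 1 Gbit/s is 125 MB/s in decimal units, before protocol overhead.

```python
import math


def disks_needed(target_read_mb_s, per_disk_mb_s):
    """Smallest number of disks whose combined throughput meets the target."""
    return math.ceil(target_read_mb_s / per_disk_mb_s)


def nic_capacity_mb_s(gigabits):
    """Raw NIC capacity in MB/s: 1 Gbit/s = 1000 Mbit/s = 125 MB/s."""
    return gigabits * 1000 / 8


# Hypothetical planning inputs, not measured values.
target_read_mb_s = 600
per_disk_mb_s = 100

disks = disks_needed(target_read_mb_s, per_disk_mb_s)
nic = nic_capacity_mb_s(1)
print(f"disks needed: {disks}")
print(f"1 Gbit NIC capacity: {nic} MB/s")
if disks * per_disk_mb_s > nic:
    print("local disks can outrun a single 1 Gbit NIC; the network may be the limit")
```

With these placeholder numbers, the disks in one server can deliver several times what a single 1 gigabit link can carry, which is why the network is treated as a minimum requirement rather than an afterthought.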

Page 14: Tips from Hadoop experts for beginners

For more information on the best Hadoop courses for your career, check out the link below:

http://www.dezyre.com/Big-Data-and-Hadoop/19