Big Data with Hadoop Setup on Ubuntu 12.04


Big Data: set up and configure Hadoop on Ubuntu 12.04

Transcript of Big Data with Hadoop Setup on Ubuntu 12.04

Page 1: Big data with hadoop Setup on Ubuntu 12.04

Mandakini Kumari

Big Data With Hadoop Setup

Page 2: Big data with hadoop Setup on Ubuntu 12.04

Agenda
1. Big Data?
2. Limitations of the Existing System
3. Advantages of Hadoop
4. Disadvantages of Hadoop
5. Hadoop Ecosystem & Components
6. Prerequisites for Hadoop 1.x
7. Install Hadoop 1.x

Page 3: Big data with hadoop Setup on Ubuntu 12.04

1.1 Characteristics of Big Data

Page 4: Big data with hadoop Setup on Ubuntu 12.04

1.2 Every 60 seconds on the Internet

Page 5: Big data with hadoop Setup on Ubuntu 12.04

2.1 Limitations of the Existing Data Analytics Architecture

Page 6: Big data with hadoop Setup on Ubuntu 12.04

3.1 Advantages of Hadoop
• Hadoop provides storage and computational capabilities together, whereas in an RDBMS computation happens in the CPU and data must travel over the bus from the hard disk to the CPU.
• Fault-tolerant hardware is expensive, whereas Hadoop is designed to run on cheap commodity hardware.
• Instead of complicated data replication and failure handling, Hadoop automatically handles data replication and node failure.
• HDFS (storage) is optimized for high throughput.
• The large block sizes of HDFS help with large files (GBs to PBs).
• HDFS provides high scalability and availability by means of data replication and fault tolerance.
• Extremely scalable.
• The MapReduce (MR) framework allows parallel work over huge data sets.
• Jobs are scheduled for remote execution on the slave/datanodes, allowing parallel and fast job execution.
• MR deals with the business logic and HDFS with storage, independently of each other.

Page 7: Big data with hadoop Setup on Ubuntu 12.04

3.2 Advantages of Hadoop

Page 8: Big data with hadoop Setup on Ubuntu 12.04

3.3 Advantages of Hadoop

Page 9: Big data with hadoop Setup on Ubuntu 12.04

4.1 Disadvantages of Hadoop
• HDFS is inefficient at handling many small files (a mitigation sketch follows below).
• Hadoop 1.x has a single point of failure at the NN (NameNode).
• Problems arise when a cluster grows beyond about 4,000 nodes, because all metadata is stored in the RAM of the single NN.
• Hadoop 2.x does not have this single point of failure.
• Security is a major concern: Hadoop 1.x does offer a security model, but it is disabled by default because of its high complexity.
• Hadoop 1.x does not offer storage- or network-level encryption, which is a big concern for government-sector application data.
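
The small-file weakness mentioned above is commonly worked around with Hadoop Archives (HAR), which pack many small files into one archive so the NN has far fewer objects to track; a minimal sketch, where /user/hduser/logs and /user/hduser/archived are placeholder paths:

# pack the small files under /user/hduser/logs into a single HAR archive
bin/hadoop archive -archiveName logs.har -p /user/hduser logs /user/hduser/archived
# the packed files remain readable through the har:// scheme
bin/hadoop fs -ls har:///user/hduser/archived/logs.har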

Page 10: Big data with hadoop Setup on Ubuntu 12.04

5.1 HADOOP ECOSYSTEM

Page 11: Big data with hadoop Setup on Ubuntu 12.04

5.2 ADVANTAGES OF HDFS

Page 12: Big data with hadoop Setup on Ubuntu 12.04

5.3 NAMENODE: HADOOP COMPONENT

• It is the master, running on high-end hardware.
• It stores all metadata in main memory, i.e. RAM.
• Types of metadata: the list of files, the blocks for each file, and the DN (DataNode) for each block (the fsck sketch below shows how to list these).
• File attributes: access time, replication factor.
• The JobTracker reports to the NN after a job completes.
• Receives a heartbeat from each DN.
• Transaction log: records file creations, deletions, etc.
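
The metadata described above can be inspected from the command line once the cluster set up in the later slides is running; a minimal sketch, assuming the Hadoop install directory is the working directory:

# walks the namespace and prints every file, its blocks, and the DNs holding each block
bin/hadoop fsck / -files -blocks -locations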

Page 13: Big data with hadoop Setup on Ubuntu 12.04

5.4 DATANODE: HADOOP COMPONENT
• A slave running on commodity hardware.
• File writes to a DN are preferred as a sequential process; writing in parallel would cause issues in data replication.
• File reads from DNs happen in parallel.
• Provides the actual storage.
• Responsible for serving read/write requests from clients.
• Heartbeat: the NN receives a heartbeat from each DN every few seconds; if heartbeats stop, the DN's data is re-replicated to another DataNode (see the dfsadmin sketch below).
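
The heartbeat-driven liveness described above can be checked with the HDFS admin report; a small sketch, assuming a running cluster and the Hadoop install directory as the working directory:

# prints configured/used capacity and the list of live and dead DataNodes
bin/hadoop dfsadmin -report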

Page 14: Big data with hadoop Setup on Ubuntu 12.04

5.5 SECONDARY NAMENODE: HADOOP COMPONENT

• Not a hot standby for the NameNode (NN).
• If the NN fails, only read operations can be performed; no blocks are replicated or deleted.
• If the NN fails, the system goes into safe mode.
• The Secondary NameNode connects to the NN every hour and takes a backup of the NN metadata.
• The saved metadata can be used to rebuild a failed NameNode (a recovery sketch follows below).
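
Rebuilding a failed NN from the Secondary NameNode's saved metadata is, roughly, the sequence below; a hedged sketch for Hadoop 1.x, in which fs.checkpoint.dir is assumed to point at a copy of the SNN's checkpoint directory:

# copy the SNN checkpoint directory onto the new NN host, point fs.checkpoint.dir at it,
# start with an empty dfs.name.dir, then import the checkpoint into the new namespace
bin/hadoop namenode -importCheckpoint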

Page 15: Big data with hadoop Setup on Ubuntu 12.04

5.6 MAPREDUCE (BUSINESS LOGIC) ENGINE
• The TaskTracker (TT) is the slave.
• A TT acts like a worker that executes tasks.
• The JobTracker (master) acts like a manager that splits a job into tasks (see the job-client sketch below).
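
The manager/worker split between the JobTracker and the TaskTrackers can be watched from the job client; a minimal sketch (the job ID shown is a placeholder):

bin/hadoop job -list                              # jobs currently tracked by the JobTracker
bin/hadoop job -status job_201401010000_0001      # map/reduce completion of that job's tasks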

Page 16: Big data with hadoop Setup on Ubuntu 12.04

5.7 HDFS: HADOOP COMPONENT

Page 17: Big data with hadoop Setup on Ubuntu 12.04

5.8 FAULT TOLERANCE: REPLICATION AND RACK AWARENESS

Page 18: Big data with hadoop Setup on Ubuntu 12.04

6. Hadoop Installation: Prerequisites
1. Ubuntu Linux 12.04.3 LTS

2. Installing Java v1.5+

3. Adding dedicated Hadoop system user.

4. Configuring SSH access.

5. Disabling IPv6.

For PuTTY users: sudo apt-get install openssh-server
Run command: sudo apt-get update

Page 19: Big data with hadoop Setup on Ubuntu 12.04

6.1 Install Java v1.5+

6.1.1) Download the latest Oracle Java Linux version:
wget https://edelivery.oracle.com/otn-pub/java/jdk/7u45-b18/jdk-7u45-linux-x64.tar.gz
OR, to avoid passing a username and password, use:
wget --no-cookies --no-check-certificate --header "Cookie: gpw_e24=http%3A%2F%2Fwww.oracle.com" https://edelivery.oracle.com/otn-pub/java/jdk/7u45-b18/jdk-7u45-linux-x64.tar.gz

6.1.2) Copy the Java archive into the /usr/local/java directory:
sudo cp -r jdk-7u45-linux-x64.tar.gz /usr/local/java
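
On a fresh Ubuntu 12.04 install, /usr/local/java usually does not exist yet, so the copy above would fail; creating it first is a small extra step not shown on the slide:

sudo mkdir -p /usr/local/java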

6.1.3) Change the directory to /usr/local/java: cd /usr/local/java

6.1.4) Unpack the Java binaries in /usr/local/java:
sudo tar xvzf jdk-7u45-linux-x64.tar.gz

6.1.5) Edit the system PATH file /etc/profile:
sudo nano /etc/profile or sudo gedit /etc/profile

Page 20: Big data with hadoop Setup on Ubuntu 12.04

6.1 Install Java v1.5+

6.1.6) At the end of the /etc/profile file, add the following system variables to your system path:
JAVA_HOME=/usr/local/java/jdk1.7.0_45
PATH=$PATH:$HOME/bin:$JAVA_HOME/bin
export JAVA_HOME
export PATH

6.1.7) Inform your Ubuntu Linux system where your Oracle Java JDK/JRE is located:
sudo update-alternatives --install "/usr/bin/javac" "javac" "/usr/local/java/jdk1.7.0_45/bin/javac" 1
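
The slide registers only javac; the java binary can be registered and selected the same way (paths assume the JDK unpacked in step 6.1.4):

sudo update-alternatives --install "/usr/bin/java" "java" "/usr/local/java/jdk1.7.0_45/bin/java" 1
sudo update-alternatives --set java /usr/local/java/jdk1.7.0_45/bin/java
sudo update-alternatives --set javac /usr/local/java/jdk1.7.0_45/bin/javac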

6.1.8) Reload system wide PATH /etc/profile: . /etc/profile

6.1.9) Test Java: java -version

Page 21: Big data with hadoop Setup on Ubuntu 12.04

6.2 Add dedicated Hadoop system user

6.2.1) Adding group: sudo addgroup Hadoop

6.2.2) Creating a user and adding the user to the group:
sudo adduser --ingroup Hadoop hduser
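
Several later steps run sudo commands while logged in as hduser; if that is intended, the new user can be added to Ubuntu's sudo group (an optional step not on the slide):

sudo adduser hduser sudo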

Page 22: Big data with hadoop Setup on Ubuntu 12.04

6.3 Generate an SSH key for the hduser user

6.3.1) Log in as hduser (switch users with sudo).

6.3.2) Run this key generation command: ssh-keygen -t rsa -P ""

6.3.3) It will ask for the file name in which to save the key; just press Enter so that it generates the key under '/home/hduser/.ssh'.

6.3.4) Enable SSH access to your local machine with this newly created key:
cat $HOME/.ssh/id_rsa.pub >> $HOME/.ssh/authorized_keys

6.3.5) Test the SSH setup by connecting to your local machine as the hduser user:
ssh hduser@localhost
This will add localhost permanently to the list of known hosts.
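
If the login still prompts for a password, overly open permissions on the key files are the usual cause; tightening them follows standard OpenSSH requirements (not part of the original slides):

chmod 700 $HOME/.ssh
chmod 600 $HOME/.ssh/authorized_keys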

Page 23: Big data with hadoop Setup on Ubuntu 12.04

6.4 Disabling IPv6

6.4.1) We need to disable IPv6 because Ubuntu uses the 0.0.0.0 address for various Hadoop configurations. Run the command: sudo gedit /etc/sysctl.conf

Add the following lines to the end of the file and reboot the machine to apply the configuration correctly.

# disable ipv6
net.ipv6.conf.all.disable_ipv6 = 1
net.ipv6.conf.default.disable_ipv6 = 1
net.ipv6.conf.lo.disable_ipv6 = 1
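
The change can also be applied and checked without waiting for the reboot; a value of 1 from the second command means IPv6 is disabled:

sudo sysctl -p
cat /proc/sys/net/ipv6/conf/all/disable_ipv6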

Page 24: Big data with hadoop Setup on Ubuntu 12.04

Install Hadoop 1.2
Ubuntu Linux 12.04.3 LTS
Hadoop 1.2.1, released August 2013

Download and extract Hadoop:

Command: wget http://archive.apache.org/dist/hadoop/core/hadoop-1.2.0/hadoop-1.2.0.tar.gz

Command: tar -xvf hadoop-1.2.0.tar.gz
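
Later slides refer to paths such as hadoop/conf/...; assuming the tarball extracted into a hadoop-1.2.0 directory, a symlink (or rename) keeps those paths valid, and handing the tree to hduser avoids permission problems:

ln -s hadoop-1.2.0 hadoop                   # or: mv hadoop-1.2.0 hadoop
sudo chown -R hduser:Hadoop hadoop-1.2.0    # user and group as created in step 6.2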

Page 25: Big data with hadoop Setup on Ubuntu 12.04

Edit core-site.xml

Command: sudo gedit hadoop/conf/core-site.xml

Add the following property inside the <configuration> element:
<property>
  <name>fs.default.name</name>
  <value>hdfs://localhost:8020</value>
</property>

Page 26: Big data with hadoop Setup on Ubuntu 12.04

Edit hdfs-site.xml

Command: sudo gedit hadoop/conf/hdfs-site.xml

<property>
  <name>dfs.replication</name>
  <value>1</value>
</property>
<property>
  <name>dfs.permissions</name>
  <value>false</value>
</property>

Page 27: Big data with hadoop Setup on Ubuntu 12.04

Edit mapred-site.xml

Command: sudo gedit hadoop/conf/mapred-site.xml

<property>
  <name>mapred.job.tracker</name>
  <value>localhost:8021</value>
</property>

Page 28: Big data with hadoop Setup on Ubuntu 12.04

Get your IP address

Command: ifconfig

Command: sudo gedit /etc/hosts
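
The slide does not show what to add to /etc/hosts; for a single-node setup an entry like the following is typical, where 192.168.1.100 and hadoop-box are placeholders for the address reported by ifconfig and the machine's hostname:

127.0.0.1       localhost
192.168.1.100   hadoop-box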

Page 29: Big data with hadoop Setup on Ubuntu 12.04

CREATE AN SSH KEY
• Command: ssh-keygen -t rsa -P ""
• Move the key to the authorized keys:
• Command: cat $HOME/.ssh/id_rsa.pub >> $HOME/.ssh/authorized_keys

Page 30: Big data with hadoop Setup on Ubuntu 12.04

Configuration

•Reboot the system

• Add JAVA_HOME to the hadoop-env.sh file:

Command: sudo gedit hadoop/conf/hadoop-env.sh

Type: export JAVA_HOME=/usr/local/java/jdk1.7.0_45 (the JDK installed in step 6.1)

Page 31: Big data with hadoop Setup on Ubuntu 12.04

JAVA_HOME

Page 32: Big data with hadoop Setup on Ubuntu 12.04

Hadoop Commands
Format the NameNode
Command: bin/hadoop namenode -format
Start the NameNode and DataNode
Command: bin/start-dfs.sh
Start the TaskTracker and JobTracker
Command: bin/start-mapred.sh
To check if Hadoop started correctly
Command: jps
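
On a healthy single-node cluster, jps should list NameNode, DataNode, SecondaryNameNode, JobTracker and TaskTracker (plus Jps itself). A quick smoke test with the example jar shipped in the tarball is sketched below; the jar name assumes the hadoop-1.2.0 download from page 24, and the HDFS paths are placeholders:

bin/hadoop fs -mkdir /input
bin/hadoop fs -put conf/*.xml /input
bin/hadoop jar hadoop-examples-1.2.0.jar wordcount /input /output
bin/hadoop fs -cat /output/part-* | head
# stop the daemons when finished
bin/stop-mapred.sh
bin/stop-dfs.sh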

Page 33: Big data with hadoop Setup on Ubuntu 12.04

Thank you

CONTACT ME @
http://in.linkedin.com/pub/mandakini-kumari/18/93/935
http://www.slideshare.net/mandakinikumari

References:
http://bigdatahandler.com/2013/10/24/what-is-apache-hadoop/
edureka.in