Posted on 16-Jul-2015
Slide 1 © 2015 BlueCamphor Technologies (P) Ltd. www.skillspeed.com
Big Data Analytics using Hive
Slide 2
Scope of this Presentation: Big Data Analytics via Hive
ᗍ Introduction to Big Data and Hadoop
ᗍ Understanding Hive and its Concepts
ᗍ Hive Architecture, Hive Meta Store and Hive Use-Cases
ᗍ BIG Data Analytics via Hive
ᗍ BIG Data & Hadoop Job Trends
ᗍ Webinar Session by Skillspeed
Get Started with BIG Data & Hadoop
Slide 3
Big Data and its Challenges
Slide 4
Big Data and its Challenges
Big data is the term for a collection of data sets so large and complex that they become difficult to process using on-hand database management tools or traditional data processing applications.
Systems and enterprises generate huge amounts of data, ranging from terabytes to petabytes.
Managing data at this scale with traditional tools is very difficult.
Slide 5
Who Generates Big Data?
Have you ever wondered how Google, Facebook or LinkedIn manage to store and use such huge volumes of data?
Today, managing unstructured, high-volume data is a major challenge.
Slide 6
Hadoop can be used to process and analyze large data sets.
Before going further, let’s understand what Hadoop is.
Slide 7
Hadoop and its Characteristics
Apache Hadoop is a framework that allows the distributed processing of large data sets across clusters of commodity computers using a simple programming model.
It is an open-source data management technology with scale-out storage and distributed processing.
Hadoop Characteristics
Flexible
Reliable
Economical
Scalable
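The "simple programming model" mentioned above is MapReduce: a map function turns each input record into key/value pairs, the framework groups (shuffles) those pairs by key, and a reduce function combines the values for each key. The Python sketch below simulates the three phases for a word count inside a single process. It only illustrates the model; it is not the Hadoop Java API, and the function names are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


def map_phase(line: str) -> Iterator[Tuple[str, int]]:
    """Mapper: emit (word, 1) for every word in one input line."""
    for word in line.lower().split():
        yield word, 1


def shuffle(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Shuffle/sort: group every value emitted for the same key, sorted by key."""
    groups: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return dict(sorted(groups.items()))


def reduce_phase(key: str, values: List[int]) -> Tuple[str, int]:
    """Reducer: combine all values for one key into a single result."""
    return key, sum(values)


def word_count(lines: Iterable[str]) -> Dict[str, int]:
    # On a real cluster each input split is mapped on a different node;
    # in this sketch every phase simply runs in one process.
    mapped = (pair for line in lines for pair in map_phase(line))
    return dict(reduce_phase(k, vs) for k, vs in shuffle(mapped).items())


if __name__ == "__main__":
    print(word_count(["Hive runs on Hadoop", "Hadoop stores data in HDFS"]))
```

With the two sample lines, `hadoop` is counted twice and every other word once. On a cluster, mappers and reducers run as separate tasks on different nodes, and the shuffle moves the grouped data across the network.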
Slide 8
Hadoop Ecosystem
Flume: imports unstructured or semi-structured data
Sqoop: imports or exports structured data
Apache Oozie: workflow
HDFS: Hadoop Distributed File System
Pig Latin: data analysis
Hive: DW (data warehouse) system
MapReduce Framework
HBase
Other YARN frameworks (MPI, Giraph)
YARN: cluster resource management
Slide 9
Origin of Hive
ᗍ Hive originated as an internal project at Facebook
ᗍ It was later adopted by Apache as an open-source project
ᗍ Facebook deals with massive amounts of data (petabyte scale) and needs to run more than 75k ad-hoc queries on it
ᗍ Because the data is collected from many servers and is diverse in nature, no RDBMS was a suitable solution
ᗍ MapReduce was a natural choice, but it had its own limitations: every query had to be written as a MapReduce program, typically in Java
Slide 10
What is Hive?
ᗍ It is a query engine built on top of MapReduce
ᗍ It serves as the data warehousing tool of the Hadoop ecosystem
ᗍ It is used for data analysis
ᗍ It is primarily targeted at users with an SQL background
ᗍ It provides HiveQL, a query language very similar to SQL
ᗍ It is used for managing and querying structured data
ᗍ The complexity of Hadoop is hidden from end users
ᗍ Knowledge of Java and the Hadoop API is optional for core users
ᗍ It was developed by Facebook and contributed to the open-source community
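HiveQL looks like SQL, but on the classic MapReduce engine Hive compiles each query into map and reduce stages. As a hedged illustration, a query such as `SELECT dept, COUNT(*) FROM employees GROUP BY dept` can be evaluated with a mapper that emits the grouping column as the key and a reducer that counts the rows per key. The `employees` table, its columns and the Python functions below are hypothetical; Hive's real compiler generates an optimized plan (for example with map-side partial aggregation) rather than this code.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple

# Hypothetical rows of an `employees` table: (name, dept)
Row = Tuple[str, str]


def mapper(row: Row) -> Iterator[Tuple[str, int]]:
    # GROUP BY dept: the grouping column becomes the map output key.
    _name, dept = row
    yield dept, 1


def reducer(dept: str, counts: List[int]) -> Tuple[str, int]:
    # COUNT(*): add up the 1s emitted for this key.
    return dept, sum(counts)


def group_by_count(rows: Iterable[Row]) -> Dict[str, int]:
    """Evaluate SELECT dept, COUNT(*) FROM employees GROUP BY dept."""
    shuffled: Dict[str, List[int]] = defaultdict(list)
    for row in rows:
        for key, value in mapper(row):
            shuffled[key].append(value)  # shuffle: group values by key
    return dict(reducer(k, v) for k, v in sorted(shuffled.items()))
```

For example, the rows `("asha", "sales")`, `("ben", "hr")` and `("chen", "sales")` produce `{"hr": 1, "sales": 2}`. The HiveQL user writes only the one-line query; generating and running the map and reduce stages is Hive's job.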
Slide 11
Hive Use Cases
Ad-hoc analysis of the underlying data
Hypothesis testing of the underlying data
Big Data testing of huge data sets
Analysis of the processed data
Slide 12
Hive Components
Driver
Shell
Metastore
Compiler
Execution Engine
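These components cooperate on every query: the Shell hands the query to the Driver, the Compiler checks table and column names against the schemas held in the Metastore and produces a plan, and the Execution Engine runs that plan over the data. The sketch below models that flow for a simple projection such as `SELECT dept FROM employees`. The `employees` table, the dictionaries standing in for the Metastore and HDFS, and all function names are hypothetical, and the query is passed already parsed to keep the example short.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple

# Hypothetical in-memory stand-ins for the Metastore (schemas) and HDFS (rows).
METASTORE: Dict[str, List[str]] = {"employees": ["name", "dept"]}
STORAGE: Dict[str, List[Tuple[str, ...]]] = {
    "employees": [("asha", "sales"), ("ben", "hr")],
}


@dataclass
class Plan:
    table: str
    column_indexes: List[int]


def compile_query(columns: List[str], table: str) -> Plan:
    """Compiler: resolve names against the Metastore and build a plan."""
    if table not in METASTORE:
        raise ValueError(f"unknown table {table!r}")
    schema = METASTORE[table]
    for col in columns:
        if col not in schema:
            raise ValueError(f"unknown column {col!r} in table {table!r}")
    return Plan(table, [schema.index(c) for c in columns])


def execute(plan: Plan) -> List[Tuple[str, ...]]:
    """Execution engine: run the plan against the stored rows."""
    return [tuple(row[i] for i in plan.column_indexes) for row in STORAGE[plan.table]]


def driver(columns: List[str], table: str) -> List[Tuple[str, ...]]:
    """Driver: receives the query from the shell, compiles it, then executes it."""
    return execute(compile_query(columns, table))
```

`driver(["dept"], "employees")` returns `[("sales",), ("hr",)]`, while a reference to a column that the Metastore does not know about fails at compile time, before any data is read.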
Slide 13
Hive Architecture
Interfaces: JDBC/ODBC; Browse, Query, DDL
Metastore, accessed through the Thrift API
HiveQL processing: Parser, Planner, Optimizer, Execution
User-defined MapReduce scripts
File formats: TextFile, SequenceFile, RCFile
Map Reduce and HDFS
UDF/UDAF: Substr, Sum, Average
SerDe: CSV, Thrift, Regex
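Two of the pluggable pieces in this architecture are SerDes, which turn stored bytes such as CSV lines into row values when the data is read, and UDAFs such as Average. An aggregate like Average cannot simply average the per-mapper averages, so a Hive UDAF evaluator keeps a partial state and moves through steps along the lines of iterate, terminatePartial, merge and terminate. The Python sketch below mimics that lifecycle for AVG over a hypothetical `name,salary` CSV file split across two mappers. It is not Hive's Java `GenericUDAFEvaluator` API, and the CSV parsing is a toy stand-in for a real SerDe.

```python
from dataclasses import dataclass
from typing import List, Optional, Sequence, Tuple


@dataclass
class AvgBuffer:
    """Aggregation buffer: running sum and number of non-NULL rows."""
    total: float = 0.0
    count: int = 0


class AverageEvaluator:
    """Mimics the lifecycle of a UDAF evaluator for AVG (illustrative only)."""

    def new_buffer(self) -> AvgBuffer:
        return AvgBuffer()

    def iterate(self, buf: AvgBuffer, value: Optional[float]) -> None:
        # Map side: fold one raw value into the buffer; NULLs are skipped.
        if value is not None:
            buf.total += value
            buf.count += 1

    def terminate_partial(self, buf: AvgBuffer) -> Tuple[float, int]:
        # Partial result sent from mapper to reducer: (sum, count), not an average.
        return buf.total, buf.count

    def merge(self, buf: AvgBuffer, partial: Tuple[float, int]) -> None:
        # Reduce side: combine (sum, count) pairs coming from many mappers.
        buf.total += partial[0]
        buf.count += partial[1]

    def terminate(self, buf: AvgBuffer) -> Optional[float]:
        # Final result; AVG over zero non-NULL rows is NULL (None).
        return buf.total / buf.count if buf.count else None


def parse_csv_salary(line: str) -> Optional[float]:
    """Toy CSV 'SerDe': read the second field of 'name,salary' as a number."""
    field = line.split(",")[1].strip()
    return float(field) if field else None


def distributed_avg(splits: Sequence[List[str]]) -> Optional[float]:
    ev = AverageEvaluator()
    partials: List[Tuple[float, int]] = []
    for split in splits:  # one simulated mapper per input split
        buf = ev.new_buffer()
        for line in split:
            ev.iterate(buf, parse_csv_salary(line))
        partials.append(ev.terminate_partial(buf))
    final = ev.new_buffer()  # a single simulated reducer
    for partial in partials:
        ev.merge(final, partial)
    return ev.terminate(final)
```

With the splits `[["asha,100", "ben,200"], ["chen,300"]]` the result is `200.0`, the true average of 600 over 3 rows. Averaging the two per-split averages (150 and 300) would give 225 instead, which is why the partial result carries both the sum and the count.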
Slide 14
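Hive Metastore
Embedded Metastore: the Driver, the Metastore and an embedded Derby database all run inside the Hive Service JVM
Local Metastore: each Hive Service JVM runs its own Driver and Metastore, which connect to a shared MySQL database
Remote Metastore: the Drivers in the Hive Service JVMs connect to separate Metastore Server JVMs, which in turn connect to MySQL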
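The three deployment modes differ mainly in a few `hive-site.xml` properties. The sketch below builds example settings for each mode and renders them in the Hadoop `<configuration>`/`<property>` layout using Python's standard `xml.etree.ElementTree`. The property names `javax.jdo.option.ConnectionURL`, `javax.jdo.option.ConnectionDriverName` and `hive.metastore.uris` are standard Hive settings, but the host names, database name and exact JDBC driver class depend on your installation and connector version, so treat the values as placeholders.

```python
import xml.etree.ElementTree as ET
from typing import Dict


def metastore_properties(mode: str) -> Dict[str, str]:
    """Return example hive-site.xml properties for one metastore mode.

    Host names and database names below are placeholders.
    """
    if mode == "embedded":
        # Derby runs inside the Hive service JVM; only one session can use it at a time.
        return {
            "javax.jdo.option.ConnectionURL":
                "jdbc:derby:;databaseName=metastore_db;create=true",
            "javax.jdo.option.ConnectionDriverName":
                "org.apache.derby.jdbc.EmbeddedDriver",
        }
    if mode == "local":
        # Metastore code runs in the Hive JVM but stores metadata in external MySQL.
        return {
            "javax.jdo.option.ConnectionURL":
                "jdbc:mysql://db-host/metastore?createDatabaseIfNotExist=true",
            "javax.jdo.option.ConnectionDriverName": "com.mysql.jdbc.Driver",
        }
    if mode == "remote":
        # Client-side setting: talk Thrift to a separate metastore server
        # (9083 is the conventional port). The server itself uses JDBC settings.
        return {"hive.metastore.uris": "thrift://metastore-host:9083"}
    raise ValueError(f"unknown metastore mode: {mode!r}")


def to_hive_site_xml(props: Dict[str, str]) -> str:
    """Render properties in the Hadoop <configuration>/<property> layout."""
    root = ET.Element("configuration")
    for name, value in props.items():
        prop = ET.SubElement(root, "property")
        ET.SubElement(prop, "name").text = name
        ET.SubElement(prop, "value").text = value
    return ET.tostring(root, encoding="unicode")
```

For instance, `to_hive_site_xml(metastore_properties("remote"))` yields a single `<property>` whose name is `hive.metastore.uris`. Moving from embedded to local or remote mode is what allows several users to share one metastore.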
Slide 15
Job Trends – Hadoop
Slide 16
Why SkillSpeed?
Course Curriculum from Industry Experts
Instructor-Led Live Virtual Sessions
Lifetime Access to Course Content via LMS
100% Placement Assistance
24x7 Support
Slide 17
Course Topics
Module 1: Introduction to Big Data and Hadoop
Module 2: HDFS Internals, Hadoop Configurations and Data Loading
Module 3: Introduction to Map Reduce
Module 4: Advanced Map Reduce Concepts
Module 5: Introduction to Pig
Module 6: Advanced Pig and Introduction to Hive
Module 7: Advanced Hive Concepts
Module 8: Extending Hive and HBase Introduction
Module 9: Advanced HBase and Oozie Introduction
Module 10: Project Set-up Discussion
Slide 18
Corporate Partners
Slide 19
Contact Us
Lines open 24/7
To know more about the course, please contact:
IND +91-90660-20904
USA 1866-607-6547 (Toll Free)
Or reach us at sales@skillspeed.com