Big data analytics - Hive
WDABT 2016 – BHARATHIAR UNIVERSITY
Dr. V. Bhuvaneswari, Asst. Professor, Dept. of Comp. Appl., Bharathiar University, WDABT 2016
Why HIVE
• Structured data
• Large data sets
• MapReduce
• Parallel distribution
• Query data
Features of Hive
HIVE Architecture
• User Interface: Web UI, Hive command line, HDInsight
• Meta Store
• HiveQL Process Engine
• Execution Engine
• HDFS or HBase storage system
• Embedded Metastore
• Local Metastore
• Remote Metastore
Hive File formats
• Text Files - Delimited by parameters
• Sequence Files - Less data
• RC Files - Analytic processing
• ORC Files - Optimized file format in binary format
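The file format of a table is chosen with the STORED AS clause when the table is created. The following is a minimal sketch of that clause for the four formats listed above; the table and column names are hypothetical and not part of the original slides.

```sql
-- Hypothetical tables; only the STORED AS clause differs between them.
CREATE TABLE IF NOT EXISTS sales_text (product STRING, price INT)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
STORED AS TEXTFILE;

CREATE TABLE IF NOT EXISTS sales_seq (product STRING, price INT) STORED AS SEQUENCEFILE;
CREATE TABLE IF NOT EXISTS sales_rc  (product STRING, price INT) STORED AS RCFILE;
CREATE TABLE IF NOT EXISTS sales_orc (product STRING, price INT) STORED AS ORC;
```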
HIVE Query Language (HQL)
Hive query language offers:
• Create databases
• Create, manage and partition tables
• Relational, arithmetic and logical operators to evaluate functions
• Support for DDL and DML
DDL (Data Definition Language) Statements
The DDL commands are listed below
• Create, Alter, Drop database
• Create, Alter, Drop, Truncate table
• Create, Alter with Partitioning and Bucketing
• Create Views
• Show
• Describe
DML (Data Manipulation Language) Statements
• Loading files
• Inserting data into Hive tables from queries
Database Operations
Syntax
CREATE DATABASE IF NOT EXISTS db_name
COMMENT 'db_name Details'
WITH DBPROPERTIES ('creator' = 'name');
Example
CREATE DATABASE IF NOT EXISTS LIBDETS
COMMENT 'LIBRARY DETAILS'
WITH DBPROPERTIES ('creator' = 'KIRUTHI');
Database Operations
Syntax
SHOW DATABASES; -- displays the available databases
Example
SHOW DATABASES;
Syntax
DESCRIBE DATABASE db_name; -- displays the database comment and location
DESCRIBE DATABASE EXTENDED db_name; -- also displays the DBPROPERTIES
Example
DESCRIBE DATABASE LIBDETS;
DESCRIBE DATABASE EXTENDED LIBDETS;
ALTER Database
Syntax
-- Alter database properties
ALTER DATABASE db_name
SET DBPROPERTIES ('edited-by' = 'name');
Example
ALTER DATABASE LIBDETS
SET DBPROPERTIES ('edited-by' = 'KANI');
USE, DROP Database
Syntax
USE db_name; -- make db_name the current working database
Example
USE LIBDETS;
Syntax
DROP DATABASE db_name; -- delete the database
Example
DROP DATABASE LIBDETS;
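By default (RESTRICT), DROP DATABASE fails when the database still contains tables. A minimal sketch of the optional clauses, reusing the LIBDETS database from the example above:

```sql
-- IF EXISTS suppresses the error when the database does not exist.
DROP DATABASE IF EXISTS LIBDETS;

-- CASCADE drops the tables inside the database first, then the database itself.
DROP DATABASE IF EXISTS LIBDETS CASCADE;
```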
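TABLES
Hive supports two types of tables
• Managed Table - Data is stored in the Hive warehouse folder; dropping the table deletes both the data and the metadata
• External Table - Data stays in the specified location; dropping the table removes only the table metadata, so the data is retained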
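One way to check which of the two kinds an existing table is, sketched here under the assumption that a table named LIBTBL (the name used in this deck's examples) already exists:

```sql
-- The "Table Type" row of the output reads MANAGED_TABLE or EXTERNAL_TABLE,
-- and the "Location" row shows where the data files are kept.
DESCRIBE FORMATTED LIBTBL;
```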
Creating Managed Table
Syntax
CREATE TABLE IF NOT EXISTS tb_name (
column_name data_type,
column_name data_type,
column_name data_type)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t';
Example
CREATE TABLE IF NOT EXISTS LIBTBL (
Member_Code INT,
Member_Name STRING,
Designation STRING,
Dept_code INT,
dept_name STRING,
group_name STRING,
course_name STRING,
title STRING)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t';
Creating External Table
Syntax
CREATE EXTERNAL TABLE IF NOT EXISTS tb_name (
column_name datatype,
column_name datatype,
column_name datatype)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
LOCATION 'directory_path'; -- a directory that holds the data files, not a single file
Example
CREATE EXTERNAL TABLE IF NOT EXISTS LIBTBL (
Member_Code INT,
Member_Name STRING,
Designation STRING,
Dept_code INT,
course_code INT,
dept_name STRING,
group_name STRING,
course_name STRING,
title STRING)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
LOCATION '/home/livrith/Desktop'; -- the directory containing Book2.csv
Loading Data into Table
Syntax
LOAD DATA LOCAL INPATH 'local_file_or_directory_path' OVERWRITE INTO TABLE tb_name;
-- LOCAL reads from the local file system; without LOCAL the path refers to HDFS
Example
LOAD DATA LOCAL INPATH '/home/kiruthika/Documents/Book2.csv' OVERWRITE INTO TABLE LIBTBL;
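With a table loaded, the Show, Describe and Create Views commands from the DDL list can be tried against it. This is a minimal sketch assuming the LIBTBL table from the examples; the view name LIB_MEMBERS_V is hypothetical.

```sql
SHOW TABLES;       -- lists the tables in the current database
DESCRIBE LIBTBL;   -- lists the column names and types

-- Hypothetical view exposing two columns of LIBTBL.
CREATE VIEW IF NOT EXISTS LIB_MEMBERS_V AS
SELECT Member_Code, Designation FROM LIBTBL;

-- Dropping the view leaves the underlying table untouched.
DROP VIEW IF EXISTS LIB_MEMBERS_V;
```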
Select clause
Syntax
SELECT [ALL | DISTINCT] select_expr, select_expr, ...
FROM tb_name
[WHERE where_condition]
[GROUP BY column_name]
[HAVING having_condition]
[ORDER BY column_name]
[CLUSTER BY column_name | [DISTRIBUTE BY column_name] [SORT BY column_name]]
[LIMIT number];
Example 1
SELECT * FROM LIBTBL;
Example 2
SELECT Member_Name, Designation FROM LIBTBL;
Select - where
Example
SELECT * FROM LIBUDET
WHERE group_name = 'TEACHING' OR group_name = 'student' AND Dept_name >= '18';
-- AND binds tighter than OR, so the Dept_name test applies only to the 'student' rows
Select - pattern matching with LIKE
Syntax
SELECT column1, column2, column3 FROM tb_name WHERE column_name LIKE '%alp%';
Example
SELECT PRODUCT, STATE, CITY FROM SALESDETS WHERE City LIKE '%O%';
Group by
Example
SELECT PRODUCT, COUNT(PRODUCT) AS C1, STATE, COUNTRY
FROM SALESDETS
GROUP BY PRODUCT, STATE, COUNTRY; -- every non-aggregated column must appear in GROUP BY
Order by: sorts all rows globally, using only one reducer
Example
SELECT PRODUCT, STATE, PRICE, COUNTRY
FROM SALESDETS
ORDER BY COUNTRY;
Sort by: sorts the rows within each reducer, so the output is ordered per reducer rather than globally
Example
SELECT PRODUCT, STATE, COUNTRY FROM SALESDETS SORT BY COUNTRY LIMIT 10;
Having: filters the groups produced by Group By
Example
SELECT PRODUCT, COUNT(PRODUCT) AS C1, STATE, COUNTRY
FROM SALESDETS
GROUP BY PRODUCT, STATE, COUNTRY
HAVING COUNT(PRODUCT) > 5;
Limit
Example
SELECT PRODUCT, STATE, PRICE, COUNTRY FROM SALESDETS LIMIT 10;
Distribute by: distributes rows among reducers
Syntax
SELECT column_name1, column_name2, column_name3 FROM tb_name DISTRIBUTE BY column_name SORT BY column_name ASC, column_name ASC LIMIT count;
Example
SELECT PRODUCT, PRICE, STATE FROM SALESDETS DISTRIBUTE BY STATE SORT BY STATE ASC, PRODUCT ASC LIMIT 50;
Cluster by: does the job of both Distribute by and Sort by on the same column
Example
SELECT PRODUCT,PRICE,STATE FROM SALESDETS
CLUSTER BY STATE LIMIT 50;
Difference in Execution of Order By , Sort By, Distribute By, Cluster By
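The four clauses can be compared side by side on the same query. The following is a minimal sketch against the SALESDETS table used in the examples; the comments summarise how each clause is executed.

```sql
-- ORDER BY: total order over the whole result; all rows pass through one reducer.
SELECT PRODUCT, PRICE, STATE FROM SALESDETS ORDER BY STATE;

-- SORT BY: each reducer sorts only its own rows; there is no global order.
SELECT PRODUCT, PRICE, STATE FROM SALESDETS SORT BY STATE;

-- DISTRIBUTE BY: rows with the same STATE go to the same reducer, unsorted.
SELECT PRODUCT, PRICE, STATE FROM SALESDETS DISTRIBUTE BY STATE;

-- CLUSTER BY STATE is shorthand for DISTRIBUTE BY STATE SORT BY STATE.
SELECT PRODUCT, PRICE, STATE FROM SALESDETS CLUSTER BY STATE;
```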
Data Aggregation
• COUNT
• AVG, with or without DISTINCT
• MIN, with or without DISTINCT
• MAX, with or without DISTINCT
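A minimal sketch of these aggregate functions in one query, assuming the SALESDETS table has the PRICE, STATE and COUNTRY columns used in the other examples; the output aliases are hypothetical.

```sql
SELECT COUNTRY,
       COUNT(*)              AS num_sales,   -- rows per country
       COUNT(DISTINCT STATE) AS num_states,  -- distinct states per country
       AVG(PRICE)            AS avg_price,
       MIN(PRICE)            AS min_price,
       MAX(PRICE)            AS max_price
FROM SALESDETS
GROUP BY COUNTRY;
```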
Partitions
Without partitions, Hive reads the entire dataset from the warehouse even when a filter condition selects only particular values of a column. This becomes a bottleneck in MapReduce jobs and involves a huge amount of I/O.
Partitioning breaks a larger dataset into smaller chunks based on column values.
Hive supports two types of partition
• Static partition
• Dynamic partition
Creating partition table
Syntax
CREATE TABLE tb_name (column1 datatype, column2 datatype, column3 datatype)
COMMENT 'Details of the dataset'
PARTITIONED BY (column_name STRING)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ',';
Example
CREATE TABLE MY_TABLE1 (Member_Name STRING, dept_name STRING, group_name STRING, course_name STRING, title STRING)
COMMENT 'User information'
PARTITIONED BY (Designation STRING)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ',';
Load data into static partition table
Syntax
LOAD DATA LOCAL INPATH 'file_path' OVERWRITE INTO TABLE tb_name;
Example
-- MY_TABLE2 serves as the source table for the dynamic-partition insert
LOAD DATA LOCAL INPATH '/home/livrith/Desktop/mytab.csv' OVERWRITE INTO TABLE MY_TABLE2;
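To load a file into one specific partition of a partitioned table, the partition is named in the statement itself. A minimal sketch, assuming the MY_TABLE1 table partitioned by Designation; the file path and the value 'Professor' are hypothetical.

```sql
-- The file holds only the non-partition columns; the partition value
-- comes from the PARTITION clause, not from the file contents.
LOAD DATA LOCAL INPATH '/path/to/professors.csv'
OVERWRITE INTO TABLE MY_TABLE1
PARTITION (Designation = 'Professor');
```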
Set dynamic partition
The following settings have to be modified to execute dynamic partitions.
SET hive.exec.dynamic.partition = true;
SET hive.exec.dynamic.partition.mode = nonstrict;
Insert data - Dynamic partition table
Syntax
INSERT OVERWRITE TABLE 1st_tb_name PARTITION (partition_column)
SELECT column_name1, column_name2, column_name3, partition_column FROM 2nd_tb_name;
-- the partition field should be the last column in the SELECT list
Example
INSERT OVERWRITE TABLE MY_TABLE1 PARTITION (Designation)
SELECT Member_Name, dept_name, group_name, course_name, title, Designation FROM MY_TABLE2;
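Once the insert has run, the partitions Hive created can be listed, and a filter on the partition column lets Hive read only the matching partition. A minimal sketch against MY_TABLE1; the value 'Professor' is a hypothetical Designation.

```sql
SHOW PARTITIONS MY_TABLE1;   -- one line per partition, in the form designation=<value>

-- Only the files under the matching partition directory are scanned.
SELECT Member_Name, dept_name
FROM MY_TABLE1
WHERE Designation = 'Professor';
```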
Bucketing
Bucketing is similar to partitioning.
Each bucket is a file.
Bucketing assigns rows to a fixed number of buckets by hashing the values of a specified column, whereas partitioning divides the data into separate directories, one per distinct value of the partition column.
Table creation
Syntax
CREATE TABLE IF NOT EXISTS tb_name (column1 datatype, column2 datatype, column3 datatype)
CLUSTERED BY (column_name) INTO 3 BUCKETS
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t';
Example
CREATE TABLE SALES_BUC1 (Transaction_date TIMESTAMP, Product STRING, Price INT, Payment_Type STRING, Name STRING, City STRING, State STRING, Country STRING, Account_Created TIMESTAMP)
CLUSTERED BY (Price) INTO 3 BUCKETS
ROW FORMAT DELIMITED FIELDS TERMINATED BY ',';
Load data into table
Syntax
FROM 1st_tb_name
INSERT OVERWRITE TABLE 2nd_tb_name
SELECT column_name1, column_name2, column_name3;
Example
FROM SALESDETS
INSERT OVERWRITE TABLE SALES_BUC1
SELECT Transaction_date, Product, Price, Payment_Type, Name, City, State, Country, Account_Created;
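On Hive releases before 2.0, an insert writes rows into the declared number of buckets only when bucketing enforcement is switched on; from Hive 2.0 onward the property no longer exists and bucketing is always enforced. A minimal sketch for the older releases, reusing the statement above:

```sql
-- Needed on Hive 0.x/1.x only; omit this line on Hive 2.0 and later,
-- where the property was removed.
SET hive.enforce.bucketing = true;

FROM SALESDETS
INSERT OVERWRITE TABLE SALES_BUC1
SELECT Transaction_date, Product, Price, Payment_Type, Name, City, State, Country, Account_Created;
```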
Select from bucket table
Syntax 1
SELECT DISTINCT column_name FROM tb_name TABLESAMPLE (BUCKET 1 OUT OF 3 ON column_name);
Example
SELECT DISTINCT Price FROM SALES_BUC1 TABLESAMPLE (BUCKET 1 OUT OF 3 ON Price);
Syntax 2
SELECT DISTINCT column_name FROM tb_name TABLESAMPLE (BUCKET 1 OUT OF 2 ON column_name);
Example
SELECT DISTINCT Price FROM SALES_BUC1 TABLESAMPLE (BUCKET 1 OUT OF 2 ON Price);
Sampling
• Sampling is used in Hive to populate a small dataset from an existing large dataset. Sampling selects records randomly to create the small dataset.
Syntax
SELECT COUNT(*) FROM tb_name TABLESAMPLE (BUCKET 1 OUT OF 3 ON column_name);
Example
In the example given below, a sample is created from the table SALES_BUC using one of the 3 available buckets.
SELECT COUNT(*) FROM SALES_BUC TABLESAMPLE (BUCKET 1 OUT OF 3 ON Price);
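TABLESAMPLE also accepts rand() in place of a column, and a percentage in place of a bucket specification. A minimal sketch of both variants, assuming the SALES_BUC1 table created earlier; the counts returned vary from run to run.

```sql
-- Bucket sampling on rand(): rows are assigned to 3 buckets at random,
-- so roughly one third of the rows are counted.
SELECT COUNT(*) FROM SALES_BUC1 TABLESAMPLE (BUCKET 1 OUT OF 3 ON rand());

-- Block sampling: reads roughly 10 percent of the table's data.
SELECT COUNT(*) FROM SALES_BUC1 TABLESAMPLE (10 PERCENT);
```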
• Apache HBase is an open-source, distributed, versioned, non-relational database modeled after Google's Bigtable.
• Apache HBase provides Bigtable-like capabilities on top of Hadoop and HDFS.
NoSQL Databases
• NoSQL - Not only SQL, non-relational/non-SQL databases
• Schema-less
• Ideology: BASE - Basically Available, Soft state, Eventual consistency
• Can support only two properties at a time: availability, replication
NoSQL Types
• Key-value store - Amazon S3, Riak
• Document-based store - CouchDB, MongoDB
• Column-based store - HBase, Cassandra
• Graph-based store - Neo4j, OrientDB
HBASE is Not
• Table with only one primary key (row key)
• No join operations
• Limited atomicity and transaction support
• Not manipulated by SQL
HBase components
• Master - Manages load balancing and scripting
• Region server - Serves the range of tables assigned by the master
• Region server uses Memstore, similar to cache memory
• ZooKeeper - Clients communicate via ZooKeeper for read/write operations in region servers; it stores node details and provides services for synchronization and maintenance