Query Engines for Hive: MR, Spark, Tez with LLAP – Considerations!
Hive 101: Hive Query Language
-
Upload
ryan-franks -
Category
Documents
-
view
173 -
download
4
description
Transcript of Hive 101: Hive Query Language
IN-0021
Hive 101: Hive Query Language2014-08-21
Jeff Clouse
® © 2013 Inmar, Inc. All Rights Reserved. © 2014 Inmar, Inc. All Rights Reserved.2
Agenda
• What is Hive• HUE• HQL
– Select– Operators– Functions– Joins– Sub Queries – Union
• Hive best practices
® © 2013 Inmar, Inc. All Rights Reserved. © 2014 Inmar, Inc. All Rights Reserved.3
What is Hive
• High level implementation of MapReduce
• Language is Hive Query Language - HQL
• HQL is a subset of ANSI SQL with extensions
• Metadata is stored in MySQL
• Semantics are very much like Oracle and MySQL
• There are no Updates
® © 2013 Inmar, Inc. All Rights Reserved. © 2014 Inmar, Inc. All Rights Reserved.4
What is Hive
• Hive tables
• External Tables
• Warehouse Tables
• Drops in HIVE External tables delete metadata
• Drops in the HIVE warehouse really delete
® © 2013 Inmar, Inc. All Rights Reserved. © 2014 Inmar, Inc. All Rights Reserved.5
HUE
• Hadoop User Experience• Provides web access to Hive
® © 2013 Inmar, Inc. All Rights Reserved. © 2014 Inmar, Inc. All Rights Reserved.7
HQL Select Syntax
• Select– Select * From t1
• Distinct– Select Distinct col1 From t1
• Where– Select * From t1 where col1 = ‘US’
• Limit– Select * From t1 limit 5
• Group By– Select col1, sum(col2) as Total From t1 group by col1
• Order By– Select col1, sum(col2) as Total From t1 group by col1 order by col1
• Having– Select col1, sum(col2) as Total From t1 group by col1 having sum(col2) > 50
® © 2013 Inmar, Inc. All Rights Reserved. © 2014 Inmar, Inc. All Rights Reserved.8
HQL Predicate Operators
• = Equals• <=> Equals or both sides are NULL• <>, != Not equal• < Less Than• <= Less than or equal to• > Greater than• >= Greater than or equal to• [not] between Value is equal to or between two values• is [not] NULL Check Value for NULL• like Value is like another value. Wildcards are %
and _
® © 2013 Inmar, Inc. All Rights Reserved. © 2014 Inmar, Inc. All Rights Reserved.9
HQL Arithmetic Operators
• A - B Subtract B from A• A * B Multiply A and B• A / B Divide A by B• A + B Add A and B• A % B The remainder resulting from A/B
• A & B Bitwise and of A and B• A | B Bitwise or of A and B • A ^ B Bitwise xor of A and B• ~A Bitwise negation of A
® © 2013 Inmar, Inc. All Rights Reserved. © 2014 Inmar, Inc. All Rights Reserved.10
HQL Logical Operators
• A and B, A && B Boolean and of A and B• A or B, A || B Boolean or of A and B• NOT A, !A Boolean negation of A• A [NOT] IN (B,…) A is in [or not] a set of values
® © 2013 Inmar, Inc. All Rights Reserved. © 2014 Inmar, Inc. All Rights Reserved.11
HQL Functions
• Round(A)• Round(A,2)• Floor(A)• Ceiling(A)• Rand()
• Year(date)• Month(date)• Datediff(date1, date2)• Date_add(startdate,
days)
• Length(A)• Upper(A)• Concat(A, B, …)• Substring(A, start ,len)• Trim(A)
• Sum(A)• Count(*)• Min(A)• Max(A)
® © 2013 Inmar, Inc. All Rights Reserved. © 2014 Inmar, Inc. All Rights Reserved.12
HQL Joins
• Join– Select * from table1 t1 join table2 t2 on t1.key = t2.key
• Only returns records from both tables
• Outer Joins– Left
• Select * from table1 t1 left join table2 t2 on t1.key = t2.key– Returns all rows from the left table, t1, and matching rows from the right table. Missing
rows from the right table will be populated with NULL
– Right• Select * from table1 t1 right join table2 t2 on t1.key = t2.key
– Returns all rows from the right table, t2, and matching rows from the left table. Missing rows from the left table will be populated with NULL
– Full • Select * from table1 t1 full outer join table2 t2 on t1.key = t2.key
– Returns all rows from both tables. Missing rows from either table will be populated with NULL
® © 2013 Inmar, Inc. All Rights Reserved. © 2014 Inmar, Inc. All Rights Reserved.13
HQL SubQueries and Union
• Used to combine multiple result sets• Only UNION ALL is supported currently• The number and name of columns returned by each select statement must
be the same.Select *from (
Select col1, col2from t1UNION ALLselect col1, col2from t2
) unionResults• Sub-queries are only supported in the from clause• Support for sub-queries in the where clause will be limited to IN and
EXISTS in Hive 0.13
® © 2013 Inmar, Inc. All Rights Reserved. © 2014 Inmar, Inc. All Rights Reserved.17
Hive best practices• Smallest to largest tables for joins• Data Layout
– Partition large tables– Use the partition in your where clause
® © 2013 Inmar, Inc. All Rights Reserved. © 2014 Inmar, Inc. All Rights Reserved.18
Partitioning – by Month
Jan Feb Dec
Trans
F0100
F0101
F0103
F0102
F0200
F0201
F0203
F0202
F1200
F1201
F1203
F1202
Table
Partitioned by Month
Files withinthe partitions
F0105
F0104 F1204
® © 2013 Inmar, Inc. All Rights Reserved. © 2014 Inmar, Inc. All Rights Reserved.19
Hive best practices• Smallest to largest tables for joins• Data Layout
– Partition large tables– Use the partition in your where clause– Bucketing
® © 2013 Inmar, Inc. All Rights Reserved. © 2014 Inmar, Inc. All Rights Reserved.20
Bucketing– by Basket_id
TransTables
Files containing Rows with same hash for Bucket_Id
Trans_item
® © 2013 Inmar, Inc. All Rights Reserved. © 2014 Inmar, Inc. All Rights Reserved.21
Hive best practices• Smallest to largest tables for joins• Data Layout
– Partition large tables– Use the partition in your where clause– Bucketing
• Data Sampling– Bucket TABLESAMPLE(bucket 30 out of 64 on basket_id)– Block TABLESAMPLE(1 PERCENT)
• Parallel Processing– set hive.exec.parallel=true;
® © 2014 Inmar, Inc. All Rights Reserved.
Questions?