Hive 101: Hive Query Language

IN-0021 Hive 101: Hive Query Language 2014-08-21 Jeff Clouse

® © 2013 Inmar, Inc. All Rights Reserved. © 2014 Inmar, Inc. All Rights Reserved.

Agenda

• What is Hive
• HUE
• HQL
  – Select
  – Operators
  – Functions
  – Joins
  – Sub Queries
  – Union
• Hive best practices

What is Hive

• A high-level layer on top of MapReduce: queries run as MapReduce jobs
• The language is Hive Query Language (HQL)
• HQL is a subset of ANSI SQL with extensions
• Metadata is stored in MySQL
• Semantics are very much like Oracle and MySQL
• There are no updates

What is Hive

• Hive tables
  – External tables
  – Warehouse tables
• Dropping an external table deletes only the metadata; the data files remain
• Dropping a warehouse table really deletes the data as well as the metadata
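As a rough illustration of the difference, the sketch below creates one table of each kind and then drops both. The table names, columns, delimiter, and HDFS path are hypothetical and do not come from the slides.

```sql
-- Hypothetical external table: Hive records only the metadata;
-- the files under LOCATION live outside the Hive warehouse.
CREATE EXTERNAL TABLE trans_ext (
  basket_id BIGINT,
  amount    DOUBLE
)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
LOCATION '/data/example/trans';   -- assumed path

-- Hypothetical warehouse table: Hive owns the data files.
CREATE TABLE trans_wh (
  basket_id BIGINT,
  amount    DOUBLE
);

DROP TABLE trans_ext;  -- removes the metadata; files under /data/example/trans remain
DROP TABLE trans_wh;   -- removes the metadata and the data files
```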

HUE

• Hadoop User Experience
• Provides web access to Hive

HQL Select Syntax

• Select
  – Select * From t1
• Distinct
  – Select Distinct col1 From t1
• Where
  – Select * From t1 where col1 = 'US'
• Limit
  – Select * From t1 limit 5
• Group By
  – Select col1, sum(col2) as Total From t1 group by col1
• Order By
  – Select col1, sum(col2) as Total From t1 group by col1 order by col1
• Having
  – Select col1, sum(col2) as Total From t1 group by col1 having sum(col2) > 50
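The clauses above can be combined in one statement. The sketch below reuses the slide's t1, col1, and col2 to show the order in which the clauses are written; the filter values are only illustrative.

```sql
SELECT col1,
       sum(col2) AS Total
FROM t1
WHERE col1 <> 'US'        -- filters rows before grouping
GROUP BY col1
HAVING sum(col2) > 50     -- filters groups after aggregation
ORDER BY col1
LIMIT 5;                  -- applied last, to the ordered result
```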

HQL Predicate Operators

• =              Equals
• <=>            Equals, or both sides are NULL
• <>, !=         Not equal
• <              Less than
• <=             Less than or equal to
• >              Greater than
• >=             Greater than or equal to
• [not] between  Value is equal to or between two values
• is [not] NULL  Check value for NULL
• like           Value matches a pattern. Wildcards are % and _
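A few of these operators behave in ways that are easy to misread, so here is a small sketch against the slides' t1 table. The column col3 is a hypothetical nullable column added for illustration.

```sql
-- With =, a comparison involving NULL evaluates to NULL,
-- so rows where col1 and col3 are both NULL are not returned.
SELECT * FROM t1 WHERE col1 = col3;

-- With <=>, rows where both sides are NULL are returned as well.
SELECT * FROM t1 WHERE col1 <=> col3;

-- BETWEEN includes both end values.
SELECT * FROM t1 WHERE col2 BETWEEN 10 AND 50;

-- _ matches exactly one character; % matches any run of characters, including none.
SELECT * FROM t1 WHERE col1 LIKE 'U_';   -- matches 'US' and 'UK', not 'USA'
SELECT * FROM t1 WHERE col1 LIKE 'U%';   -- matches 'U', 'US', and 'USA'
```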

HQL Arithmetic Operators

• A - B   Subtract B from A
• A * B   Multiply A and B
• A / B   Divide A by B
• A + B   Add A and B
• A % B   The remainder resulting from A/B

• A & B   Bitwise and of A and B
• A | B   Bitwise or of A and B
• A ^ B   Bitwise xor of A and B
• ~A      Bitwise negation of A
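A short sketch of these operators in a select list, assuming col2 in the slides' t1 table is an integer column (the slides do not state its type):

```sql
SELECT col2 + 1,
       col2 * 2,
       col2 / 2,   -- division generally produces a DOUBLE, even for integer inputs
       col2 % 2,   -- remainder: 0 for even values, 1 for odd non-negative values
       col2 & 1    -- bitwise AND with 1 keeps only the lowest bit
FROM t1;
```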

HQL Logical Operators

• A and B, A && B      Boolean and of A and B
• A or B, A || B       Boolean or of A and B
• NOT A, !A            Boolean negation of A
• A [NOT] IN (B,…)     A is [or is not] in a set of values
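As an illustration, the two queries below apply the same filter to the slides' t1 table, first with OR and NOT and then with IN. The values 'US', 'CA', and 10 are arbitrary examples.

```sql
-- Filter written with OR and NOT
SELECT *
FROM t1
WHERE (col1 = 'US' OR col1 = 'CA')
  AND NOT (col2 < 10);

-- The same filter written with IN
SELECT *
FROM t1
WHERE col1 IN ('US', 'CA')
  AND col2 >= 10;
```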

HQL Functions

• Round(A)
• Round(A, 2)
• Floor(A)
• Ceiling(A)
• Rand()

• Year(date)
• Month(date)
• Datediff(date1, date2)
• Date_add(startdate, days)

• Length(A)
• Upper(A)
• Concat(A, B, …)
• Substring(A, start, len)
• Trim(A)

• Sum(A)
• Count(*)
• Min(A)
• Max(A)
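The sketch below shows several of these functions together. The first query uses the slides' t1 table; the second assumes a hypothetical orders table with order_date and ship_date columns, which is not part of the original deck.

```sql
-- String and aggregate functions on t1
SELECT upper(trim(col1))   AS region,
       round(sum(col2), 2) AS total,
       count(*)            AS row_count,
       min(col2)           AS smallest,
       max(col2)           AS largest
FROM t1
GROUP BY upper(trim(col1));

-- Date functions on a hypothetical orders table
SELECT year(order_date)                AS order_year,
       month(order_date)               AS order_month,
       datediff(ship_date, order_date) AS days_to_ship,  -- first date minus second, in days
       date_add(order_date, 30)        AS due_date
FROM orders;
```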

HQL Joins

• Join
  – Select * from table1 t1 join table2 t2 on t1.key = t2.key
  – Returns only rows that have a match in both tables
• Outer Joins
  – Left
    • Select * from table1 t1 left join table2 t2 on t1.key = t2.key
    • Returns all rows from the left table, t1, and matching rows from the right table. Where the right table has no matching row, its columns are populated with NULL
  – Right
    • Select * from table1 t1 right join table2 t2 on t1.key = t2.key
    • Returns all rows from the right table, t2, and matching rows from the left table. Where the left table has no matching row, its columns are populated with NULL
  – Full
    • Select * from table1 t1 full outer join table2 t2 on t1.key = t2.key
    • Returns all rows from both tables. Where either table has no matching row, its columns are populated with NULL
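To make the four join types concrete, the sketch below assumes hypothetical contents for the two tables: table1 holds keys 1 and 2, and table2 holds keys 2 and 3. The comments describe the rows each query would return under that assumption.

```sql
-- Inner join: only key 2 exists in both tables.
-- Rows: (2, 2)
SELECT t1.key, t2.key
FROM table1 t1 JOIN table2 t2 ON t1.key = t2.key;

-- Left outer join: every row of table1 is kept.
-- Rows: (1, NULL), (2, 2)
SELECT t1.key, t2.key
FROM table1 t1 LEFT JOIN table2 t2 ON t1.key = t2.key;

-- Right outer join: every row of table2 is kept.
-- Rows: (2, 2), (NULL, 3)
SELECT t1.key, t2.key
FROM table1 t1 RIGHT JOIN table2 t2 ON t1.key = t2.key;

-- Full outer join: every row of both tables is kept.
-- Rows: (1, NULL), (2, 2), (NULL, 3)
SELECT t1.key, t2.key
FROM table1 t1 FULL OUTER JOIN table2 t2 ON t1.key = t2.key;
```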

HQL SubQueries and Union

• Used to combine multiple result sets
• Only UNION ALL is supported currently
• The number and name of columns returned by each select statement must be the same.

    Select *
    from (
      Select col1, col2
      from t1
      UNION ALL
      select col1, col2
      from t2
    ) unionResults

• Sub-queries are only supported in the from clause
• Support for sub-queries in the where clause will be limited to IN and EXISTS in Hive 0.13
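For reference, the sketch below shows what the IN and EXISTS forms look like on the slides' t1 and t2 tables, followed by a LEFT SEMI JOIN that expresses the same filter on versions without where-clause sub-queries. Treat it as an illustration, not a tested recipe for a particular Hive release.

```sql
-- Hive 0.13 and later: sub-query in the where clause with IN
SELECT *
FROM t1
WHERE t1.col1 IN (SELECT col1 FROM t2);

-- Hive 0.13 and later: correlated sub-query with EXISTS
SELECT *
FROM t1
WHERE EXISTS (SELECT 1 FROM t2 WHERE t2.col1 = t1.col1);

-- Earlier versions: the same filter as a LEFT SEMI JOIN
SELECT t1.*
FROM t1 LEFT SEMI JOIN t2 ON (t1.col1 = t2.col1);
```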

Hive best practices

• List tables from smallest to largest in joins
• Data Layout
  – Partition large tables
  – Use the partition in your where clause

Partitioning – by Month

[Figure: the Trans table partitioned by month (Jan, Feb, ..., Dec), with files within the partitions, such as F0100 to F0105, F0200 to F0203, and F1200 to F1204.]
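A minimal sketch of the month-partitioned layout described on this slide. The table and column names (trans, trans_staging, trans_month, trans_date, amount) are hypothetical.

```sql
-- Each distinct trans_month value becomes its own directory of files.
CREATE TABLE trans (
  basket_id BIGINT,
  amount    DOUBLE
)
PARTITIONED BY (trans_month STRING);

-- Load one partition from a hypothetical staging table.
INSERT OVERWRITE TABLE trans PARTITION (trans_month = '01')
SELECT basket_id, amount
FROM trans_staging
WHERE month(trans_date) = 1;

-- Filtering on the partition column lets Hive read only that partition's files.
SELECT sum(amount)
FROM trans
WHERE trans_month = '01';
```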

Hive best practices

• List tables from smallest to largest in joins
• Data Layout
  – Partition large tables
  – Use the partition in your where clause
  – Bucketing

Bucketing – by Basket_id

[Figure: the Trans and Trans_item tables, each stored as files containing rows with the same hash for Basket_id.]
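A sketch of how a table bucketed on basket_id might be declared and loaded. The table names, columns, and the bucket count of 64 are assumptions, and the hive.enforce.bucketing setting applies to Hive releases of this presentation's era; check the behavior on your version.

```sql
-- Rows are assigned to one of 64 files by hashing basket_id.
CREATE TABLE trans_bucketed (
  basket_id BIGINT,
  amount    DOUBLE
)
CLUSTERED BY (basket_id) INTO 64 BUCKETS;

-- Ask Hive to honor the bucket definition when writing the table.
SET hive.enforce.bucketing = true;

-- trans_staging is a hypothetical source table.
INSERT OVERWRITE TABLE trans_bucketed
SELECT basket_id, amount
FROM trans_staging;
```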

Hive best practices

• List tables from smallest to largest in joins
• Data Layout
  – Partition large tables
  – Use the partition in your where clause
  – Bucketing
• Data Sampling
  – Bucket: TABLESAMPLE(bucket 30 out of 64 on basket_id)
  – Block: TABLESAMPLE(1 PERCENT)
• Parallel Processing
  – set hive.exec.parallel=true;
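The sampling clauses and the parallel setting above might be used as follows. The table name trans is hypothetical and is assumed to be bucketed into 64 buckets on basket_id.

```sql
-- Bucket sampling: read only bucket 30 of the 64 buckets.
SELECT *
FROM trans TABLESAMPLE(BUCKET 30 OUT OF 64 ON basket_id) s;

-- Block sampling: read roughly 1 percent of the table's data.
SELECT *
FROM trans TABLESAMPLE(1 PERCENT) s;

-- Allow independent stages of a query to run at the same time.
SET hive.exec.parallel=true;
```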

Questions?