Integration of Apache Hive and HBase

Enis Soztutar, enis [at] apache [dot] org, @enissoz
© Hortonworks Inc. 2011. Architecting the Future of Big Data

Page 1: Integration of Apache Hive and HBase (eecs.csuohio.edu/~sschung/cis612/Integration-of-Apache-Hive-and-H…)

Page 2: Agenda

• Overview of Hive and HBase
• Hive + HBase Features and Improvements
• Future of Hive and HBase
• Q&A

Page 3: Apache Hive Overview

• Apache Hive is a data warehouse system for Hadoop
• SQL-like query language called HiveQL
• Built for PB scale data
• Main purpose is analysis and ad hoc querying
• Database / table / partition / bucket – DDL operations
• SQL types + complex types (ARRAY, MAP, etc.)
• Very extensible
• Not for: small data sets, low latency queries, OLTP

Page 4: Apache Hive Architecture

[Diagram: Hive architecture. Components shown: Metastore, RDBMS, Hive Thrift Server, Driver, CLI, JDBC/ODBC, Hive Web Interface, Parser, Planner, Optimizer, Execution, MS Client, HDFS, MapReduce.]

Page 5: Overview of Apache HBase

• Apache HBase is the Hadoop database
• Modeled after Google’s Bigtable
• A sparse, distributed, persistent multi-dimensional sorted map
• The map is indexed by a row key, column key, and a timestamp
• Each value in the map is an un-interpreted array of bytes
• Low latency random data access

Page 6: Overview of Apache HBase

• Logical view:

[Figure: logical view of a table. From: Bigtable: A Distributed Storage System for Structured Data, Chang, et al.]

Page 7: Apache HBase Architecture

[Diagram: HBase architecture. Components shown: Client, Zookeeper, HMaster, three Region servers each hosting several Regions, HDFS.]

Page 8: Hive + HBase Features and Improvements

Page 9: Hive + HBase Motivation

• Hive and HBase have different characteristics:
  – High latency vs. low latency
  – Structured vs. unstructured
  – Analysts vs. programmers
• Hive data warehouses on Hadoop are high latency
  – Long ETL times
  – Access to real time data
• Analyzing HBase data with MapReduce requires custom coding
• Hive and SQL are already known by many analysts

Page 10: Use Case 1: HBase as ETL Data Sink

[Figure from: HUG - Hive/HBase Integration or, MaybeSQL? April 2010, John Sichi, Facebook. http://www.slideshare.net/hadoopusergroup/hive-h-basehadoopapr2010]

Page 11: Use Case 2: HBase as Data Source

[Figure from: HUG - Hive/HBase Integration or, MaybeSQL? April 2010, John Sichi, Facebook. http://www.slideshare.net/hadoopusergroup/hive-h-basehadoopapr2010]

Page 12: Use Case 3: Low Latency Warehouse

[Figure from: HUG - Hive/HBase Integration or, MaybeSQL? April 2010, John Sichi, Facebook. http://www.slideshare.net/hadoopusergroup/hive-h-basehadoopapr2010]

Page 13: Example: Hive + HBase (HBase table)

    hbase(main):001:0> create 'short_urls', {NAME => 'u'}, {NAME => 's'}
    hbase(main):014:0> scan 'short_urls'
    ROW              COLUMN+CELL
    bit.ly/aaaa      column=s:hits, value=100
    bit.ly/aaaa      column=u:url, value=hbase.apache.org/
    bit.ly/abcd      column=s:hits, value=123
    bit.ly/abcd      column=u:url, value=example.com/foo

Page 14: Example: Hive + HBase (Hive table)

    CREATE TABLE short_urls(
      short_url string,
      url string,
      hit_count int
    )
    STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
    -- No whitespace between mapping entries: it would become part of the column name
    WITH SERDEPROPERTIES ("hbase.columns.mapping" = ":key,u:url,s:hits")
    TBLPROPERTIES ("hbase.table.name" = "short_urls");

Page 15: Storage Handler

• Hive defines the HiveStorageHandler interface for different storage backends: HBase / Cassandra / MongoDB / etc.
• Storage Handler has hooks for
  – Getting input / output formats
  – Metadata operations: CREATE TABLE, DROP TABLE, etc.
• Storage Handler is a table level concept
  – Does not support Hive partitions and buckets

Page 16: Apache Hive + HBase Architecture

[Diagram: the Hive architecture from Page 4 with an HBase StorageHandler added. Components shown: Metastore, RDBMS, Hive Thrift Server, Driver, CLI, Hive Web Interface, Parser, Planner, Optimizer, Execution, MS Client, HDFS, MapReduce, HBase, StorageHandler.]

Page 17: Hive + HBase Integration

• For Input/OutputFormat, getSplits(), etc., the underlying HBase classes are used
• Column selection and certain filters can be pushed down
• HBase tables can be used with other (Hadoop native) tables and SQL constructs
• Hive DDL operations are converted to HBase DDL operations via the client hook
  – All operations are performed by the client
  – No two phase commit

Page 18: Schema / Type Mapping

Page 19: Schema Mapping

• Hive table + columns + column types <=> HBase table + column families (+ column qualifiers)
• Every field in the Hive table is mapped, in order, to either
  – The table key (using :key as selector)
  – A column family (cf:) -> MAP fields in Hive
  – A column (cf:cq)
• The Hive table does not need to include all columns in HBase

    CREATE TABLE short_urls(
      short_url string,
      url string,
      hit_count int,
      props map<string,string>
    )
    STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
    WITH SERDEPROPERTIES ("hbase.columns.mapping" = ":key,u:url,s:hits,p:");

Page 20: Type Mapping

• Recently added to Hive (0.9.0)
• Previously all types were being converted to strings in HBase
• Hive has:
  – Primitive types: INT, STRING, BINARY, DATE, etc.
  – ARRAY<Type>
  – MAP<PrimitiveType, Type>
  – STRUCT<a:INT, b:STRING, c:STRING>
• HBase does not have types
  – Bytes.toBytes()
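The practical difference between storing a number as a string and storing it as raw bytes can be sketched in a few lines of Python. This is only an illustration, not Hive or HBase code: `encode_as_string` and `encode_as_binary` are hypothetical helpers, and the binary layout assumes the 4-byte big-endian form that `Bytes.toBytes(int)` produces. Because HBase compares keys and values as plain byte arrays, the encoding decides the sort order.

```python
import struct


def encode_as_string(n: int) -> bytes:
    """Hypothetical helper: store the number as its decimal text."""
    return str(n).encode("utf-8")


def encode_as_binary(n: int) -> bytes:
    """Hypothetical helper: 4-byte big-endian int (assumed Bytes.toBytes(int) layout)."""
    return struct.pack(">i", n)


values = [99, 100, 1000]

# Sort the numbers the way a byte-wise comparator would see them.
by_string = sorted(values, key=encode_as_string)
by_binary = sorted(values, key=encode_as_binary)

print("string order:", by_string)
print("binary order:", by_binary)

# Negative numbers have the sign bit set, so they compare as larger byte arrays.
print("signed order:", sorted([-1, 1], key=encode_as_binary))
```

String encoding orders the values lexicographically, while the binary encoding keeps non-negative integers in numeric order. Negative values still sort after positive ones under a byte-wise comparison, which is one way to read the "sortable signed numeric types" item on the future-work slide.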

Page 21: Type Mapping

• Table level property
    "hbase.table.default.storage.type" = "binary"
• Type mapping can be given per column after #
  – Any prefix of "binary", e.g. u:url#b
  – Any prefix of "string", e.g. u:url#s
  – The dash char "-", e.g. u:url#-

    CREATE TABLE short_urls(
      short_url string,
      url string,
      hit_count int,
      props map<string,string>
    )
    STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
    WITH SERDEPROPERTIES ("hbase.columns.mapping" = ":key#b,u:url#b,s:hits#b,p:#s");
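The mapping syntax can be made concrete with a small parser, sketched here in Python. It is not the Hive SerDe: `parse_mapping` and `storage_type` are hypothetical names. The sketch assumes that a missing suffix or "-" falls back to the table-level default, and that this default is "string" unless overridden.

```python
def storage_type(suffix, default="string"):
    """Resolve the text after '#': a prefix of 'binary' or 'string', else the default.

    Assumption: an empty suffix or '-' means "use the table-level default".
    """
    if not suffix or suffix == "-":
        return default
    for name in ("binary", "string"):
        if name.startswith(suffix):
            return name
    raise ValueError(f"unknown storage type suffix: {suffix!r}")


def parse_mapping(mapping, default="string"):
    """Split an hbase.columns.mapping value into (kind, family, qualifier, type) tuples."""
    entries = []
    for spec in mapping.split(","):
        column, _, suffix = spec.partition("#")
        if column == ":key":
            kind, family, qualifier = "key", None, None
        else:
            family, _, qualifier = column.partition(":")
            # 'p:' has no qualifier, so it maps a whole column family (a Hive MAP).
            kind = "column" if qualifier else "family"
            qualifier = qualifier or None
        entries.append((kind, family, qualifier, storage_type(suffix, default)))
    return entries


for entry in parse_mapping(":key#b,u:url#b,s:hits#b,p:#s"):
    print(entry)
```

The entries come back in the same order as the mapping string, which matches the rule that Hive fields are mapped in order.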

Page 22: Type Mapping

• If the type is not a primitive or Map, it is converted to a JSON string and serialized
• Still a few rough edges for schema and type mapping:
  – No Hive BINARY support in HBase mapping
  – No mapping of HBase timestamp (can only provide put timestamp)
  – No arbitrary mapping of Structs / Arrays into HBase schema

Page 23: Bulk Load

• Steps to bulk load:
  – Sample source data for range partitioning
  – Save sampling results to a file
  – Run CLUSTER BY query using HiveHFileOutputFormat and TotalOrderPartitioner
  – Import HFiles into HBase table
• Ideal setup should be

    SET hive.hbase.bulk=true
    INSERT OVERWRITE TABLE web_table SELECT ….
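The first steps above rely on range partitioning: a sample of row keys is used to pick boundary keys, so that each output file covers one contiguous, non-overlapping key range. A minimal Python sketch of that idea follows. It does not use Hive, HFiles, or TotalOrderPartitioner; `split_points` and `partition_for` are hypothetical helpers, and the sample keys are made up.

```python
import bisect


def split_points(sample_keys, num_partitions):
    """Pick num_partitions - 1 boundary keys from a sample of row keys."""
    ordered = sorted(sample_keys)
    step = len(ordered) / num_partitions
    return [ordered[int(step * i)] for i in range(1, num_partitions)]


def partition_for(key, boundaries):
    """Index of the key range that a row key falls into."""
    return bisect.bisect_right(boundaries, key)


# Made-up sample of row keys in the style of the short_urls table.
sample = ["bit.ly/zzzz", "bit.ly/aaaa", "bit.ly/mmmm", "bit.ly/abcd"]
boundaries = split_points(sample, num_partitions=2)

print("boundaries:", boundaries)
for key in sorted(sample):
    print(key, "->", partition_for(key, boundaries))
```

Every key routed to partition 0 sorts before every key routed to partition 1, so sorted outputs from the partitions can be concatenated into one globally sorted sequence.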

Page 24: Filter Pushdown

Page 25: Filter Pushdown

• Idea is to pass down filter expressions to the storage layer to minimize scanned data
• To access indexes at HDFS or HBase
• Example:

    CREATE EXTERNAL TABLE users (userid BIGINT, email STRING, … )
    STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
    WITH SERDEPROPERTIES ("hbase.columns.mapping" = ":key,…")

    SELECT ... FROM users WHERE userid > 1000000 and email LIKE '%@gmail.com';

    -> scan.setStartRow(Bytes.toBytes(1000000))

Page 26: Filter Decomposition

• Optimizer pushes down the predicates to the query plan
• Storage handlers can negotiate with the Hive optimizer to decompose the filter
    x > 3 AND upper(y) = 'XYZ'
• Handle x > 3, send upper(y) = 'XYZ' as residual for Hive
• Works with:
    key = 3, key > 3, etc.
    key > 3 AND key < 100
• Only works against constant expressions
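The decomposition described above can be illustrated with a short Python sketch. It is a simplified model, not Hive's optimizer interface: `Predicate`, `PUSHABLE_OPS`, and `decompose` are hypothetical names, and the filter is assumed to be already split into AND-ed comparisons against constants.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Predicate:
    column: str    # column or expression text, e.g. "x" or "upper(y)"
    op: str        # comparison operator
    value: object  # constant operand


# Assumption: only simple comparisons on the row key can be turned into a scan range.
PUSHABLE_OPS = {"=", ">", ">=", "<", "<="}


def decompose(conjuncts, key_column="key"):
    """Split AND-ed predicates into (pushed to storage, residual left for Hive)."""
    pushed, residual = [], []
    for predicate in conjuncts:
        if predicate.column == key_column and predicate.op in PUSHABLE_OPS:
            pushed.append(predicate)
        else:
            residual.append(predicate)
    return pushed, residual


# The slide's example, with x assumed to be the row key.
conjuncts = [Predicate("x", ">", 3), Predicate("upper(y)", "=", "XYZ")]
pushed, residual = decompose(conjuncts, key_column="x")

print("pushed:  ", pushed)
print("residual:", residual)
```

In this model the storage layer narrows the scan using the pushed predicates, and Hive still evaluates the residual ones on every row that comes back.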

Page 27: Security Aspects

Towards fully secure deployments

Page 28: Security – Big Picture

• Security becomes more important to support enterprise level and multi-tenant applications
• 5 different components to ensure / impose security
  – HDFS
  – MapReduce
  – HBase
  – Zookeeper
  – Hive
• Each component has:
  – Authentication
  – Authorization

Page 29: HBase Security – Closer look

• Released with HBase 0.92
• Fully optional module, disabled by default
• Needs an underlying secure Hadoop release
• SecureRPCEngine: optional engine enforcing SASL authentication
  – Kerberos
  – DIGEST-MD5 based tokens
  – TokenProvider coprocessor
• Access control is implemented as a Coprocessor: AccessController
• Stores and distributes ACL data via Zookeeper
  – Sensitive data is only accessible by HBase daemons
  – Client does not need to authenticate to zk

Page 30: Hive Security – Closer look

• Hive has different deployment options; security considerations should take the different deployments into account
• Authentication is only supported at Metastore, not on HiveServer, web interface, JDBC
• Authorization is enforced at the query layer (Driver)
• Pluggable authorization providers. Default one stores global/table/partition/column permissions in Metastore

    GRANT ALTER ON TABLE web_table TO USER bob;
    CREATE ROLE db_reader;
    GRANT SELECT, SHOW_DATABASE ON DATABASE mydb TO ROLE db_reader;

Page 31: Hive Deployment Option 1

[Diagram: Hive deployment option 1. Components shown: Client, Metastore, RDBMS, Driver, CLI, Parser, Planner, Optimizer, Execution, MS Client, HDFS, MapReduce, HBase. Authentication and Authorization are marked on the diagram with the labels A12n/A11N (authentication/authorization) and A/A.]

Page 32: Hive Deployment Option 2

[Diagram: Hive deployment option 2. Components shown: Client, Metastore, RDBMS, Driver, CLI, Parser, Planner, Optimizer, Execution, MS Client, HDFS, MapReduce, HBase. Authentication and Authorization are marked on the diagram with the labels A12n/A11N (authentication/authorization) and A/A.]

Page 33: Hive Deployment Option 3

[Diagram: Hive deployment option 3. Components shown: Client, Metastore, RDBMS, Hive Thrift Server, Driver, CLI, JDBC/ODBC, Hive Web Interface, Parser, Planner, Optimizer, Execution, MS Client, HDFS, MapReduce, HBase. Authentication and Authorization are marked on the diagram with the labels A12n/A11N (authentication/authorization) and A/A.]

Page 34: Hive + HBase + Hadoop Security

• Regardless of Hive’s own security, for Hive to work on secure Hadoop and HBase, we should:
  – Obtain delegation tokens for Hadoop and HBase jobs
  – Obey the storage level (HDFS, HBase) permission checks
  – In HiveServer deployments, authenticate and impersonate the user
• Delegation tokens for Hadoop are already working
• Support for obtaining HBase delegation tokens is released in Hive 0.9.0

Page 35: Future of Hive + HBase

• Improve on schema / type mapping
• Fully secure Hive deployment options
• HBase bulk import improvements
• Sortable signed numeric types in HBase
• Filter pushdown: non key column filters
• Hive random access support for HBase
  – https://cwiki.apache.org/HCATALOG/random-access-framework.html

Page 36: References

• Security
  – https://issues.apache.org/jira/browse/HIVE-2764
  – https://issues.apache.org/jira/browse/HBASE-5371
  – https://issues.apache.org/jira/browse/HCATALOG-245
  – https://issues.apache.org/jira/browse/HCATALOG-260
  – https://issues.apache.org/jira/browse/HCATALOG-244
  – https://cwiki.apache.org/confluence/display/HCATALOG/Hcat+Security+Design
• Type mapping / Filter Pushdown
  – https://issues.apache.org/jira/browse/HIVE-1634
  – https://issues.apache.org/jira/browse/HIVE-1226
  – https://issues.apache.org/jira/browse/HIVE-1643
  – https://issues.apache.org/jira/browse/HIVE-2815

Page 37: Other Resources

• Hadoop Summit
  – June 13-14
  – San Jose, California
  – www.Hadoopsummit.org
• Hadoop Training and Certification
  – Developing Solutions Using Apache Hadoop
  – Administering Apache Hadoop
  – Online classes available US, India, EMEA
  – http://hortonworks.com/training/

© Hortonworks Inc. 2012

Page 38: Thanks

Questions?