June 2014 HUG : Hive On Tez - Benchmarked at Yahoo Scale

Benchmarking Hive at Yahoo Scale PRESENTED BY Mithun Radhakrishnan ⎪ June 18, 2014 Hadoop User Group




About myself

HCatalog Committer, Hive contributor
› Metastore, Notifications, HCatalog APIs
› Integration with Oozie, Data Ingestion
Other odds and ends
› DistCp

[email protected]

Hadoop User Group, 201406181830, Yahoo Sunnyvale


About this talk

Introduction to “Yahoo Scale”
The use-case in Yahoo
The Benchmark
The Setup
The Observations (and, possibly, lessons)
Fisticuffs


The Y!Grid

16 Hadoop Clusters in YGrid
› 32500 Nodes
› 750K jobs a day
Hadoop 0.23.10.x, 2.4.x
Large Datasets
› Daily, hourly, minute-level frequencies
› Terabytes of data, 1000s of files, per dataset instance
Pig 0.11
Hive 0.10 / HCatalog 0.5
› => Hive 0.12


Data Processing Use cases


Pig for Data Pipelines
› Imperative paradigm
› ~45% Hadoop Jobs on Production Clusters
• M/R + Oozie = 41%
Hive for Ad hoc queries
› SQL
› Relatively smaller number of jobs
• *Major* Uptick
Use HCatalog for Inter-op


Hive is Currently the Fastest Growing Product on the Grid

[Chart: monthly totals from Mar-13 to May-14. Left axis: All Grid Jobs (in millions), 0 to 30,000,000. Right axis: Hive Jobs (% of all jobs), 0.0% to 10.0%. Series: All Jobs; Hive (% of all jobs).]

2.4 million Hive jobs


Business Intelligence Tools

{Tableau, MicroStrategy, Excel, … }
Challenges:
› Security
• ACLs, Authentication, Encryption over the wire, Full-disk Encryption
› Bandwidth
• Transporting results over ODBC
› Query Latency
• Query execution time
• Cost of query “optimizations”
• “Bad” queries


The Benchmark

TPC-h
› Industry standard (tpc.org/tpch)
› 22 queries
› dbgen -s 1000 -S 3
• Parallelizable
Reynold Xin’s excellent work:
› https://github.com/rxin
› Transliterated queries to suit Hive 0.9


Relational Diagram


PART (P_), SF*200,000 rows: PARTKEY, NAME, MFGR, BRAND, TYPE, SIZE, CONTAINER, RETAILPRICE, COMMENT
PARTSUPP (PS_), SF*800,000 rows: PARTKEY, SUPPKEY, AVAILQTY, SUPPLYCOST, COMMENT
SUPPLIER (S_), SF*10,000 rows: SUPPKEY, NAME, ADDRESS, NATIONKEY, PHONE, ACCTBAL, COMMENT
LINEITEM (L_), SF*6,000,000 rows: ORDERKEY, PARTKEY, SUPPKEY, LINENUMBER, QUANTITY, EXTENDEDPRICE, DISCOUNT, TAX, RETURNFLAG, LINESTATUS, SHIPDATE, COMMITDATE, RECEIPTDATE, SHIPINSTRUCT, SHIPMODE, COMMENT
ORDERS (O_), SF*1,500,000 rows: ORDERKEY, CUSTKEY, ORDERSTATUS, TOTALPRICE, ORDERDATE, ORDER-PRIORITY, CLERK, SHIP-PRIORITY, COMMENT
CUSTOMER (C_), SF*150,000 rows: CUSTKEY, NAME, ADDRESS, NATIONKEY, PHONE, ACCTBAL, MKTSEGMENT, COMMENT
NATION (N_), 25 rows: NATIONKEY, NAME, REGIONKEY, COMMENT
REGION (R_), 5 rows: REGIONKEY, NAME, COMMENT
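
The cardinalities in the diagram scale linearly with the scale factor (SF), except NATION and REGION, which are fixed. A minimal Python sketch of what that implies for the dataset sizes used in this talk, assuming the usual TPC-H convention that SF=1 is roughly 1 GB of raw text (so `dbgen -s 1000` is the 1 TB case). The row counts are the nominal multipliers from the slide, and `table_rows` is a name made up for illustration:

```python
# Nominal rows per unit of scale factor (SF), as listed on the slide.
ROWS_PER_SF = {
    "PART": 200_000,
    "PARTSUPP": 800_000,
    "LINEITEM": 6_000_000,
    "ORDERS": 1_500_000,
    "CUSTOMER": 150_000,
    "SUPPLIER": 10_000,
}

# NATION and REGION do not scale with SF.
FIXED_ROWS = {"NATION": 25, "REGION": 5}


def table_rows(scale_factor):
    """Return the nominal row count of every TPC-H table at a given SF."""
    rows = {name: per_sf * scale_factor for name, per_sf in ROWS_PER_SF.items()}
    rows.update(FIXED_ROWS)
    return rows


if __name__ == "__main__":
    # Assumption: SF 100 / 1000 / 10000 stand for the 100 GB / 1 TB / 10 TB runs.
    for sf in (100, 1000, 10000):
        rows = table_rows(sf)
        print(f"SF={sf}: LINEITEM={rows['LINEITEM']:,} ORDERS={rows['ORDERS']:,}")
```

At SF=1000 the fact table LINEITEM nominally holds six billion rows, which is why it dominates both generation and query time.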


The Setup

› 350 Node cluster
• Xeon boxen: 2 Slots with E5530s => 16 CPUs
• 24GB memory
– NUMA enabled
• 6 SATA drives, 2TB, 7200 RPM Seagates
• RHEL 6.4
• JRE 1.7 (-d64)
• Hadoop 0.23.7+/2.3+, Security turned off
• Tez 0.3.x
• 128MB HDFS block-size
› Downscale tests: 100 Node cluster
• hdfs-balancer.sh


The Prep

Data generation:
› Text data: dbgen on MapReduce
› Transcode to RCFile and ORC: Hive on MR
• insert overwrite table orc_table partition( … ) select * from text_table;
› Partitioning:
• Only for 1TB, 10TB cases
• Perils of dynamic partitioning
› ORC File:
• 64MB stripes, ZLIB Compression


Observations


100 GB

› 18x speedup over Hive 0.10 (Textfile)
• Between 6-50x
› 11.8x speedup over Hive 0.10 (RCFile)
• Between 5-30x
› Average query time: 28 seconds
• Down from 530 seconds (Hive 0.10 Text)
› 85% queries completed in under a minute


1 TB

› 6.2x speedup over Hive 0.10 (RCFile)
• Between 2.5-17x
› Average query time: 172 seconds
• Between 5-947 seconds
• Down from 729 seconds (Hive 0.10 RCFile)
› 61% queries completed in under 2 minutes
› 81% queries completed in under 4 minutes


10 TB

› 6.2x speedup over Hive 0.10 (RCFile)
• Between 1.6-10x
› Average query time: 908 seconds (426 seconds excluding outliers)
• Down from 2129 seconds with Hive 0.10 RCFile
– (1712 seconds excluding outliers)
› 61% queries completed in under 5 minutes
› 71% queries completed in under 10 minutes
› Q6 still completes in 12 seconds!
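
The headline speedups on the three result slides are not simply the ratio of the average query times quoted next to them. One plausible reading (an assumption, since the slides do not say how the figure was computed) is that the headline is averaged over per-query speedups. The Python sketch below computes both quantities. The averages in `SLIDE_AVERAGES` come from the slides; the per-query timings in the last few lines are invented purely to show how the two measures can diverge.

```python
def ratio_of_means(old_times, new_times):
    """Speedup computed from the two average query times."""
    old_mean = sum(old_times) / len(old_times)
    new_mean = sum(new_times) / len(new_times)
    return old_mean / new_mean


def mean_of_ratios(old_times, new_times):
    """Speedup computed per query, then averaged across queries."""
    ratios = [old / new for old, new in zip(old_times, new_times)]
    return sum(ratios) / len(ratios)


# Average query times in seconds (old, new), taken from the slides.
SLIDE_AVERAGES = {
    "100 GB (vs Hive 0.10 Text)": (530, 28),
    "1 TB (vs Hive 0.10 RCFile)": (729, 172),
    "10 TB (vs Hive 0.10 RCFile)": (2129, 908),
    "10 TB, excluding outliers": (1712, 426),
}

if __name__ == "__main__":
    for label, (old, new) in SLIDE_AVERAGES.items():
        print(f"{label}: ratio of averages = {old / new:.1f}x")

    # Hypothetical per-query timings (seconds), not from the benchmark.
    sample_old = [100.0, 200.0, 900.0]
    sample_new = [10.0, 50.0, 600.0]
    print(f"ratio of means: {ratio_of_means(sample_old, sample_new):.2f}x")
    print(f"mean of ratios: {mean_of_ratios(sample_old, sample_new):.2f}x")
```

A single long-running query pulls the ratio of means down while barely moving the mean of ratios, which is worth keeping in mind when comparing the headline numbers with the averages.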


Explaining the speed-ups

Hadoop 2.x, et al.
Tez
› (Arbitrary DAG)-based Execution Engine
› “Playing the gaps” between M&R
• Temporary data and the HDFS
› Feedback loop
› Smart scheduling
› Container re-use
› Pipelined job start-up
Hive
› Statistics
› “Vector-ized” Execution
ORC
› PPD (predicate push-down)


ORC File Layout

Data is composed of multiple streams per column

Index allows for skipping rows (default to every 10,000 rows), keeping position in each stream, and min-max for each column

Footer contains directory of stream locations, and the encoding for each column

Integer columns are serialized using run-length encoding

String columns are serialized using dictionary for column values, and the same run length encoding

Stripe footer is used to find the requested column’s data streams and adjacent stream reads are merged
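
To make the two column encodings concrete, here is a minimal Python sketch of plain run-length encoding and dictionary encoding. It illustrates the idea only: it is not ORC's actual on-disk format (ORC's integer run-length encoding is more elaborate), and all function names are made up for the example.

```python
def rle_encode(values):
    """Collapse consecutive repeats into (value, run_length) pairs."""
    runs = []
    for value in values:
        if runs and runs[-1][0] == value:
            runs[-1][1] += 1
        else:
            runs.append([value, 1])
    return [tuple(run) for run in runs]


def rle_decode(runs):
    """Expand (value, run_length) pairs back into the original sequence."""
    out = []
    for value, count in runs:
        out.extend([value] * count)
    return out


def dictionary_encode(strings):
    """Return (dictionary, run-length-encoded dictionary indexes)."""
    # Sorted here only to make the output deterministic.
    dictionary = sorted(set(strings))
    index_of = {s: i for i, s in enumerate(dictionary)}
    return dictionary, rle_encode([index_of[s] for s in strings])


def dictionary_decode(dictionary, runs):
    """Invert dictionary_encode."""
    return [dictionary[i] for i in rle_decode(runs)]


if __name__ == "__main__":
    states = ["CA", "CA", "CA", "NY", "NY", "CA"]
    dictionary, runs = dictionary_encode(states)
    print(dictionary, runs)
    assert dictionary_decode(dictionary, runs) == states
```

Low-cardinality string columns such as a state code benefit twice: each distinct value is stored once in the dictionary, and repeated indexes collapse into short runs.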


ORC Usage

CREATE TABLE addresses (
  name string,
  street string,
  city string,
  state string,
  zip int
)
STORED AS orc
LOCATION '/path/to/addresses'
TBLPROPERTIES ("orc.compress" = "ZLIB");

ALTER TABLE ... [PARTITION partition_spec] SET FILEFORMAT orc

SET hive.default.fileformat = orc
SET hive.exec.orc.memory.pool = 0.50 (ORC writer is allowed 50% of JVM heap size by default)

ROW FORMAT SERDE 'org.apache.hadoop.hive.ql.io.orc.OrcSerde'
STORED AS INPUTFORMAT 'org.apache.hadoop.hive.ql.io.orc.OrcInputFormat'
OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat';

Key | Default | Comments
orc.compress | ZLIB | high-level compression (one of NONE, ZLIB, SNAPPY)
orc.compress.size | 262,144 (256 KB) | number of bytes in each compression chunk
orc.stripe.size | 67,108,864 (64 MB) | number of bytes in each stripe. Each ORC stripe is processed in one map task (try 32 MB to cut down on disk I/O)
orc.row.index.stride | 10,000 | number of rows between index entries (must be >= 1,000). A larger stride-size increases the probability of not being able to skip the stride, for a predicate.
orc.create.index | true | whether to create row indexes. This is for predicate push-down. If data is frequently accessed/filtered on a certain column, then sorting on the column and using index-filters makes column filters work faster
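
The remarks on `orc.row.index.stride` and on sorting a frequently filtered column can be sketched in Python. This is a toy model, not ORC's reader: it keeps only a (min, max) pair per stride and reports which strides a range predicate still has to read. `build_index` and `strides_to_read` are hypothetical names, and the zip-code data is randomly generated.

```python
import random


def build_index(values, stride):
    """Record (min, max) for each group of `stride` consecutive rows."""
    return [
        (min(values[i:i + stride]), max(values[i:i + stride]))
        for i in range(0, len(values), stride)
    ]


def strides_to_read(index, low, high):
    """Positions of strides whose [min, max] range overlaps [low, high]."""
    return [
        n for n, (stride_min, stride_max) in enumerate(index)
        if stride_max >= low and stride_min <= high
    ]


if __name__ == "__main__":
    rng = random.Random(0)
    zips = [rng.randrange(10_000, 100_000) for _ in range(100_000)]
    for label, column in (("unsorted", zips), ("sorted", sorted(zips))):
        index = build_index(column, stride=10_000)
        hits = strides_to_read(index, 94_000, 94_999)
        print(f"{label}: read {len(hits)} of {len(index)} strides")
```

On unsorted data nearly every stride's min-max range tends to overlap the predicate, so little can be skipped; once the column is sorted, each stride covers a narrow range and most strides fall outside it.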


Configuring ORC

set hive.merge.mapredfiles=true
set hive.merge.mapfiles=true
set orc.stripe.size=67108864
› 64 MB: half the HDFS block-size
• Tangent: nStripes vs nBlocks
• Tangent: DistCp
set orc.compress=???
› Depends on size and distribution
› Snappy compression hasn’t been explored
YMMV
› Experiment
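
The “half the HDFS block-size” remark and the nStripes-versus-nBlocks tangent come down to simple arithmetic. A small Python sketch, assuming the 128 MB block size from the setup slide and ignoring ORC footers and the fact that real stripes vary in size; the helper names are illustrative only:

```python
MB = 1024 * 1024
HDFS_BLOCK_SIZE = 128 * MB   # block size from the benchmark setup
ORC_STRIPE_SIZE = 64 * MB    # 67,108,864 bytes


def ceil_div(a, b):
    """Integer ceiling division, avoiding floating-point rounding."""
    return -(-a // b)


def stripes_per_block(block_size=HDFS_BLOCK_SIZE, stripe_size=ORC_STRIPE_SIZE):
    """How many whole stripes fit in one HDFS block."""
    return block_size // stripe_size


def layout(file_size, block_size=HDFS_BLOCK_SIZE, stripe_size=ORC_STRIPE_SIZE):
    """Return (number of HDFS blocks, number of ORC stripes) for a file."""
    return ceil_div(file_size, block_size), ceil_div(file_size, stripe_size)


if __name__ == "__main__":
    print("stripes per block:", stripes_per_block())
    blocks, stripes = layout(1024 * MB)
    print(f"1 GB file: {blocks} blocks, {stripes} stripes")
```

With these settings a stripe never needs to straddle more than one block boundary, and a file has about twice as many stripes as blocks; changing either size shifts that ratio, which is the part worth experimenting with.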


Conclusions


Y!Grid sticking with Hive

Familiarity
› Existing ecosystem
Community
Scale
Multitenant
Coming down the pike
› CBO
› In-memory caching solutions atop HDFS
• RAMfs a la Tachyon?


We’re not done yet

SQL compliance
Scaling up metastore performance
Better BI Tool integration
Faster transport
› HiveServer2 result-sets


References

The YDN blog post:
› http://yahoodevelopers.tumblr.com/post/85930551108/yahoo-betting-on-apache-hive-tez-and-yarn
Code:
› https://github.com/mythrocks/hivebench (TPC-h scripts, datagen, transcode utils)
› https://github.com/t3rmin4t0r/tpch-gen (Parallel TPC-h gen)
› https://github.com/rxin/TPC-H-Hive (TPC-h scripts for Hive)
› https://issues.apache.org/jira/browse/HIVE-600 (Yuntao’s initial TPC-h JIRA)


Thank You

@[email protected]

We are hiring!

Reach out to us at [email protected].


I’m glad you asked.


Sharky comments

Testing with Shark 0.7.x and Shark 0.8
› Compatible with Hive Metastore 0.9
› 100GB datasets: Admirable performance
› 1TB/10TB: Tests did not run completely
• Failures, especially in 10TB cases
• Hangs while shuffling data
• Scaled back to 100 nodes -> More tests ran through, but not completely
› nReducers: Not inferred
Miscellany
› Security
› Multi-tenancy
› Compatibility
