BY VAIBHAV NACHANKAR, ARVIND DWARAKANATH: Evaluation of HBase Read/Write (A study of HBase and its benchmarks)


Page 1:

BY VAIBHAV NACHANKAR

ARVIND DWARAKANATH

Evaluation of HBase Read/Write (A study of HBase and its benchmarks)

Page 2:

Recap of HBase

HBase is an open-source, distributed, column-oriented data store organized as a sorted map.

It is the Hadoop database and sits on top of HDFS.

HBase supports reliable storage of, and efficient access to, huge amounts of structured data.
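The "sorted map" idea can be illustrated with a toy in-memory table. This is only a sketch of the data model, not the HBase client API: the class name `SortedMapTable` and the sample rows are hypothetical, and timestamps/versions are left out for brevity. Rows are kept sorted by key, so a range scan is a contiguous slice (start row inclusive, stop row exclusive).

```python
import bisect


class SortedMapTable:
    """Toy stand-in for an HBase-style table: a map kept sorted by row key."""

    def __init__(self):
        self._keys = []  # row keys, always in sorted order
        self._rows = {}  # row key -> {"family:qualifier": value}

    def put(self, row, column, value):
        # Insert the row key in sorted position the first time we see it.
        if row not in self._rows:
            bisect.insort(self._keys, row)
            self._rows[row] = {}
        self._rows[row][column] = value

    def get(self, row):
        # Return a copy of the row's columns, or an empty dict if absent.
        return dict(self._rows.get(row, {}))

    def scan(self, start_row, stop_row):
        # Yield rows with start_row <= key < stop_row, in key order.
        lo = bisect.bisect_left(self._keys, start_row)
        hi = bisect.bisect_left(self._keys, stop_row)
        for key in self._keys[lo:hi]:
            yield key, dict(self._rows[key])


table = SortedMapTable()
table.put("user3", "info:name", "Carol")
table.put("user1", "info:name", "Alice")
table.put("user2", "info:name", "Bob")
rows_in_range = [row for row, _ in table.scan("user1", "user3")]
```

Because the keys are sorted, the scan returns rows in key order regardless of the order in which they were written.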

Page 3:

HBase Architecture

Page 4:

Recap of HBase (contd.)

Modeled after BigTable.

Map/reduce with Hadoop.

Optimizations for real-time queries.

No single point of failure.

Random access performance is like MySQL.

Application: Facebook Messaging Database.

Page 5:

HBase Benchmark Techniques

‘Hadoop Hbase-0.20.2 Performance Evaluation’ by D. Carstoiu, A. Cernian, A. Olteanu. University of Bucharest.

STRATEGY: Uses random reads and writes to test and benchmark Hadoop with HBase.
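A random read/write benchmark of this kind can be sketched roughly as follows. This is not the harness from the cited paper: the function name `benchmark_random_ops`, the key format and the 100-byte value are assumptions for illustration, and a plain Python dict stands in for the HBase table.

```python
import random
import time


def benchmark_random_ops(store, num_keys, num_ops, seed=0):
    """Time num_ops random writes, then num_ops random reads, against store."""
    rng = random.Random(seed)
    keys = [f"row{i:06d}" for i in range(num_keys)]
    value = "x" * 100  # hypothetical 100-byte payload

    # Random writes: each operation overwrites a randomly chosen row.
    start = time.perf_counter()
    for _ in range(num_ops):
        store[rng.choice(keys)] = value
    write_secs = time.perf_counter() - start

    # Random reads: count how many lookups find a previously written row.
    start = time.perf_counter()
    hits = 0
    for _ in range(num_ops):
        if store.get(rng.choice(keys)) is not None:
            hits += 1
    read_secs = time.perf_counter() - start

    return {"write_secs": write_secs, "read_secs": read_secs, "read_hits": hits}


result = benchmark_random_ops({}, num_keys=100, num_ops=1000)
```

Against a real cluster, the dict would be replaced by calls to the table client, and each run would be repeated several times so that the readings can be averaged.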

Page 6:

HBase Benchmark Techniques (contd.)

‘Hadoop Hbase-0.20.2 Performance Evaluation’ by Kareem Dana at Duke University. It presents a varied set of test cases for exercising HBase.

STRATEGY: Tests column families, columns, sorting, and interspersed reads and writes.

Page 7:

Yahoo! Cloud Serving Benchmark (YCSB)

‘Benchmarking Cloud Serving Systems with YCSB’ by Brian F. Cooper, Adam Silberstein, Erwin Tam, Raghu Ramakrishnan, Russell Sears.

This paper/project is designed to benchmark existing and newer cloud storage technologies.

So far, the benchmark has been run against HBase, Cassandra, MongoDB, Project Voldemort and SQL.

Page 8:

YCSB

The benchmark tool is driven by workload files, which users can customize.

For example, you can specify a 50/50 read/write mix, a 95/5 read/write mix, and so on.

The code for the project is available on GitHub.

https://github.com/brianfrankcooper/YCSB.git
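To illustrate what a read/write mix means in practice, the sketch below draws a sequence of operations according to given proportions. It is not YCSB code: the function name `generate_operations` and the operation labels are assumptions for illustration.

```python
import random
from collections import Counter


def generate_operations(proportions, count, seed=0):
    """Draw `count` operation names, weighted by their proportions."""
    rng = random.Random(seed)
    names = list(proportions)
    weights = [proportions[name] for name in names]
    return rng.choices(names, weights=weights, k=count)


# A 95/5 read/update mix over 10,000 operations.
ops = generate_operations({"read": 0.95, "update": 0.05}, count=10000)
mix = Counter(ops)
```

The observed counts in `mix` only approximate the requested proportions, which is why a benchmark run needs enough operations for the mix to be representative.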

Page 9:

Example of a Workload

# Yahoo! Cloud System Benchmark
# Workload A: Update heavy workload
#   Application example: Session store recording recent actions
#
#   Read/update ratio: 50/50
#   Default data size: 1 KB records (10 fields, 100 bytes each, plus key)
#   Request distribution: zipfian

recordcount=1000
operationcount=1000
workload=com.yahoo.ycsb.workloads.CoreWorkload

readallfields=true

readproportion=0.5
updateproportion=0.5
scanproportion=0
insertproportion=0
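A workload file like this is a plain key=value properties file with `#` comments. The following minimal sketch reads one and checks that the operation proportions add up to 1. The helper name `parse_workload` is hypothetical, and this is not how YCSB itself loads the file.

```python
def parse_workload(text):
    """Parse key=value lines, skipping blank lines and '#' comments."""
    props = {}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#"):
            continue
        key, _, value = line.partition("=")
        props[key.strip()] = value.strip()
    return props


WORKLOAD_A = """\
# Workload A: Update heavy workload
recordcount=1000
operationcount=1000
workload=com.yahoo.ycsb.workloads.CoreWorkload
readallfields=true
readproportion=0.5
updateproportion=0.5
scanproportion=0
insertproportion=0
"""

props = parse_workload(WORKLOAD_A)
# The four operation proportions should sum to 1.0 for a well-formed mix.
total = sum(
    float(props[name])
    for name in ("readproportion", "updateproportion",
                 "scanproportion", "insertproportion")
)
```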

Page 10:

Example of a Workload

# Yahoo! Cloud System Benchmark
# Workload B: Read mostly workload
#   Application example: photo tagging; add a tag is an update, but most operations are to read tags
#
#   Read/update ratio: 95/5
#   Default data size: 1 KB records (10 fields, 100 bytes each, plus key)
#   Request distribution: zipfian

recordcount=1000
operationcount=1000
workload=com.yahoo.ycsb.workloads.CoreWorkload

readallfields=true

readproportion=0.95
updateproportion=0.05
scanproportion=0
insertproportion=0
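Both workloads use a zipfian request distribution, meaning a few records receive most of the requests. The sketch below shows the general idea with a simple weighted sampler. It is not YCSB's generator: the function name `zipfian_keys` and the exponent value of 0.99 are assumptions for illustration.

```python
import random
from collections import Counter


def zipfian_keys(record_count, operation_count, exponent=0.99, seed=0):
    """Pick record indexes so that rank r is chosen with weight 1 / r**exponent."""
    rng = random.Random(seed)
    weights = [1.0 / (rank ** exponent) for rank in range(1, record_count + 1)]
    return rng.choices(range(record_count), weights=weights, k=operation_count)


# Same sizes as the workload file: 1000 records, 1000 operations.
keys = zipfian_keys(record_count=1000, operation_count=1000)
counts = Counter(keys)
hottest_key, hottest_hits = counts.most_common(1)[0]
```

With such a skew, the most popular record is requested far more often than a record near the tail, which stresses caching behaviour differently from a uniform distribution.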

Page 11:

Our Project

Install HBase and get Hadoop to interface with it.

Study benchmark techniques.

Build a suite of test programs and get it to run on Hadoop/HBase.

Include basic get, put and scan operations.

Extend the Word Count map-reduce job to write its output to HBase.

Compare with Brisk (Hadoop on Cassandra).
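The Word Count extension can be pictured as a map phase, a reduce phase, and a final step that puts each count into a table. The sketch below runs in plain Python with a dict standing in for the HBase table. It is not the project's Hadoop code, and the function names and the `count:total` column name are hypothetical.

```python
from collections import defaultdict


def map_phase(lines):
    """Emit a (word, 1) pair for every word in every input line."""
    for line in lines:
        for word in line.lower().split():
            yield word, 1


def reduce_phase(pairs):
    """Sum the counts for each word."""
    totals = defaultdict(int)
    for word, count in pairs:
        totals[word] += count
    return dict(totals)


def store_counts(counts, table):
    """Put one row per word; the word is the row key."""
    for word, count in counts.items():
        table[word] = {"count:total": count}


table = {}  # stand-in for an HBase table
store_counts(reduce_phase(map_phase(["to be or not to be", "to do"])), table)
```

Using the word as the row key means a later get on that word returns its count directly, and a scan returns the words in sorted order.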

Page 12:

About Brisk

Cassandra is a NoSQL, BigTable-based database.

DataStax built Brisk to interface Hadoop with Cassandra.

Hadoop + Cassandra = Brisk!!

Page 13:

Brisk Architecture

Page 14:

Challenges Faced

Configuring HBase is a tedious job, and not for the weak of will!

Successive HBase releases do not keep their APIs consistent, so we ran into a lot of ‘deprecated API’ messages.

Hadoop/HBase version compatibility has to be verified before proceeding with installation.

Page 15:

Challenges Faced (contd.)

There is very little documentation on installing HBase.

There is even less for Brisk!

Page 16:

Performance for Word Count (2 nodes/2 cores each)

[Chart: 1 mapper/3 reducers. X-axis: number of readings (1 to 5); Y-axis: time in secs.]

Average = 45.484

Page 17:

Performance for Word Count (contd.)

[Chart: 2 mappers/3 reducers. X-axis: number of readings (1 to 5); Y-axis: time in secs.]

Average = 49.664

Page 18:

Performance for Word Count (contd.)

[Chart: 2 mappers/2 reducers. X-axis: number of readings (1 to 5); Y-axis: time in secs.]

Average = 43.7008

Page 19:

Performance for a simple get/put/scan (2 nodes/2 cores each)

[Chart: get, scan and put. X-axis: number of readings (1 to 5); Y-axis: time in secs.]

The averages for get, scan and put are 1.84, 1.6266 and 1.71 secs, respectively.

Page 20:

Performance for Word Count (3 nodes/2 cores each)

[Chart: 1 mapper/3 reducers. X-axis: number of readings (1 to 5); Y-axis: time in secs.]

Average = 34.047

Page 21:

Performance for Word Count (contd.)

[Chart: 2 mappers/3 reducers. X-axis: number of readings (1 to 5); Y-axis: time in secs.]

Average = 36.1012

Page 22:

Performance for Word Count (contd.)

[Chart: 2 mappers/2 reducers. X-axis: number of readings (1 to 5); Y-axis: time in secs.]

Average = 37.4358
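The Word Count averages reported on these slides can be compared across cluster sizes with a little arithmetic. The sketch below only reuses the averages quoted in the slides (in seconds); the dictionary layout and the `speedup` helper are illustrative.

```python
# Average Word Count run times in seconds, keyed by configuration and node count.
averages_secs = {
    "1 mapper / 3 reducers": {2: 45.484, 3: 34.047},
    "2 mappers / 3 reducers": {2: 49.664, 3: 36.1012},
    "2 mappers / 2 reducers": {2: 43.7008, 3: 37.4358},
}


def speedup(two_node_secs, three_node_secs):
    """Ratio of 2-node time to 3-node time; above 1 means 3 nodes were faster."""
    return two_node_secs / three_node_secs


speedups = {
    config: speedup(times[2], times[3])
    for config, times in averages_secs.items()
}

for config, ratio in speedups.items():
    print(f"{config}: {ratio:.2f}x faster on 3 nodes")
```

In all three configurations the ratio is above 1, so the 3-node cluster finished Word Count faster than the 2-node cluster on average.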

Page 23:

Conclusions

Brisk seems a much more promising tool, as it integrates Cassandra and Hadoop without much ado.

HBase/Hadoop APIs have to be made consistent. With standardization, it would be easier to work with them.

HBase reads are faster than writes.

Page 24:

Thank You

Questions?