Rails on HBase

54
Zachary Pinter and Tony Hillerson RailsConf 2011 Rails on HBase

description

Zachary Pinter and I gave this talk at RailsConf 2011

Transcript of Rails on HBase

Page 1: Rails on HBase

Zachary Pinter and Tony HillersonRailsConf 2011

Rails on HBase

Page 2: Rails on HBase

What we will cover

Page 3: Rails on HBase

What is it?

Page 4: Rails on HBase

What are the tradeoffs that HBase

makes?

Page 5: Rails on HBase

Why HBase is probably the wrong choice for your app

Page 6: Rails on HBase

Why HBase might be the right choice for

your app

Page 7: Rails on HBase

How to use HBase with Rails

+

Page 8: Rails on HBase

What we won’t cover

Page 9: Rails on HBase

This is not a deep dive

Page 10: Rails on HBase

This is not a guide to setting up a

production HBase cluster

*See http://cloudera.com

Page 11: Rails on HBase

So...What is HBase?

Page 12: Rails on HBase

HBase is based on Bigtable

Page 13: Rails on HBase

HBase Grew out of Hadoop

Page 14: Rails on HBase

Core Concepts

Page 15: Rails on HBase

Distributed Column Oriented Database

Page 17: Rails on HBase

Column Families

key core:link stats:hits

llb0ga http://google.com 100504

llb260 http://en.oreilly.com/rails2011/public/schedule/detail/17886 2

core column family stats column family

Page 18: Rails on HBase

Column FamiliesMust be defined at creation or added to schema later

Should group data accessed and sized similarly

Page 19: Rails on HBase

ColumnsCan be dynamically added given that the family exists

Stored as a byte array - do what you need with it

Page 20: Rails on HBase

Column Families: Data Locality

Page 21: Rails on HBase

Regions

master

llb0ga...

llbee9

lld92q...

lq1rro

Regionserver Regionserver

Page 22: Rails on HBase

Main OperationsGet

Put

Incr

Delete

Scan

Page 23: Rails on HBase

JRuby based shell

Page 24: Rails on HBase

hbase shell

ruby-1.9.2-p180 :002 > status

1 servers, 0 dead, 1.0000 average load

ruby-1.9.2-p180 :003 > (1..10).each{ status }

1 servers, 0 dead, 3.0000 average load

1 servers, 0 dead, 3.0000 average load

1 servers, 0 dead, 3.0000 average load

1 servers, 0 dead, 3.0000 average load

1 servers, 0 dead, 3.0000 average load

1 servers, 0 dead, 3.0000 average load

1 servers, 0 dead, 3.0000 average load

1 servers, 0 dead, 3.0000 average load

1 servers, 0 dead, 3.0000 average load

1 servers, 0 dead, 3.0000 average load

=> 1..10

ruby-1.9.2-p180 :004 >

Page 25: Rails on HBase

APIREST

Thrift

Apache Avro

Java API

Page 26: Rails on HBase

NoSQL = Tradeoffs

What are the tradeoffs that HBase makes?

Page 27: Rails on HBase

Highly DistributedPros Cons

• Built-in sharding

• No single server holds all the data

• Different servers operate on different

slices of the data

• Makes joins difficult/impossible

• Limited real-time queries/sorting

Page 28: Rails on HBase

Sorted Row KeysBytearray keys

Any kind of data you want

Sorted in byte order

All row access through the key

Page 29: Rails on HBase

Sorted Row KeysPros Cons

• Real-time random lookups by key

• Real-time range queries• Conflates unique identifier with default

sort

• Keys are an integral part of schema

design, requiring extra consideration

Page 30: Rails on HBase

Map / ReducePros Cons

• Scales linearly: more servers = faster

response times

• Operates on huge datasets (billions of

rows)

• Works well with dynamic/non-uniform

schemas

• Scans every row per query *

• Not real-time (batch operation)

Page 31: Rails on HBase

HBase vs. Xhttp://nosql-database.org

http://www.sevenweeks.org/Seven Databases in Seven WeeksEric Redmond

Don’t Miss His Talk!!

Page 32: Rails on HBase

Why HBase is probably the wrong choice for your app

Page 33: Rails on HBase

HBase Limitations

Page 34: Rails on HBase

HBase LimitationsLimited real-time queries (key lookup, range lookup)

Limited sorting (one default sort per table)

Limited transactions

Manual indexing

Joins/Normalization

Storage of large binary data *

Page 35: Rails on HBase

•At least 5 Servers (Zookeeper, Region Server, Name Nodes, Data Nodes, etc)

•Memory Intensive (2-4gb bar minimum per server, depending on server role)

•CPU Intensive

•Rough EC2 Cost:

5 c1.xlarge reserved for 1 year= over $850/month(not counting bandwidth!)

HBase Requirements

Page 36: Rails on HBase

Why HBase might be the right choice for

your app

Page 37: Rails on HBase

Large data setsAbility to aggregate and analyze billions of rows

Starts to shine when relational databases break downIs your db already sharded?Are you using SQL queries that take hours to run?

Page 38: Rails on HBase

Hadoop IntegrationHDFS for storing files

Native Map/ReduceSupports data localityDoesn’t require data to be transferred

Page 39: Rails on HBase

Flexible schemaSupports non-heterogeneous data through column families with mixed key/value pairs

Ability to refactor the data store via map/reduce (beats a multi-day ALTER TABLE command)

Page 40: Rails on HBase

Rails on HBase

Page 41: Rails on HBase

hbase-stargatehttps://github.com/greglu/hbase-stargate

Uses Stargate, HBase REST server

Page 42: Rails on HBase

Rhinohttps://github.com/sqs/rhino/

Thrift API

Seems to be abandoned?

Page 43: Rails on HBase

Massive Recordhttps://github.com/CompanyBook/massive_record

It works!

Uses Thrift API

Page 44: Rails on HBase

Kwisatz: An URL Shortener

Page 45: Rails on HBase

Demo:JRuby Scripts

Page 46: Rails on HBase

Demo:Rails Front-End

Page 47: Rails on HBase

SourceDemo source code

github.com/effectiveui/jruby-hbase-demogithub.com/effectiveui/rails-hbase-massive-record

Page 48: Rails on HBase

Getting Set Up(Mac) brew install hbase

Cloudera VMhttp://bit.ly/dSfPa9

Page 49: Rails on HBase

Recap

Page 50: Rails on HBase

Where Would I Use HBase?

Page 51: Rails on HBase

Where is HBase’s Sweet Spot

Page 52: Rails on HBase

Where Would I Use Rails With HBase?

Page 53: Rails on HBase

Questions?

Page 54: Rails on HBase

Thank You!Zachary: @zpinter

Tony: @thillerson

EffectiveUI.com