MyLife with HBase or HBase three flavors

Description: HBase is a NoSQL column store. What does that mean functionally to a software developer?
- A conceptual view of HBase
- How to use HBase
- What features HBase has
- Benefits of HBase

How are we using HBase here at MyLife? I will describe three projects at MyLife that currently use HBase in production and that I was or am involved with:
- Email content storage
- Connection-Identity mappings
- User stream cache backing

Each of these projects uses HBase in a different way.

Transcript of MyLife with HBase or HBase three flavors

HBase: In brief

I could talk about…

- Operational HBase
- ZooKeeper quorums (image source: aazk.org)
- Compaction (image source: www.wasteprousa.com)
- How HBase is implemented: HDFS, blocks, regions, the META table, etc.
- HBase vs. Cassandra, Redis, MySQL, etc.

However, none of those is my primary focus as a developer. As a developer, I want to talk about what HBase can do for me: how it can make MyLife (pun intended) easier.

“I choose a lazy person to do a hard job. Because a lazy person will find an easy way to do it.” –Bill Gates

So what does HBase do for me, the developer?

TL;DR: IT STORES DATA!

How does HBase store data?

As a map of maps of maps of maps.

A Data Structures Interlude

A simple map:

Key == Last Name, First Name, Middle Initial
Value == Extension

For example:

Example,Dude,X → x555

So now that we know what a map is, what would a map of maps look like? Here is an HBase-like analogy.

An analogy to HBase (a dated one; if someone can think of a current one, please let me know) is a library's index file ordered by ISBN. You look up a book by its ISBN: the ISBN is your key. The value, in this case, is a book that itself contains a list of books!

Key == ISBN
Value == Book that lists other books!

0786704810 → Author, Title, Publisher, Year

HBase: In brief

SortedMap[RowKey,
  SortedMap[ColumnFamilyName,
    SortedMap[Qualifier,
      SortedMap[Timestamp, Value]]]]
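To make the nested type above concrete, here is a small in-memory sketch in Python. It is an illustration only, not how HBase is implemented and not its client API (the real client is Java): `SortedMap`, `put` and `get_latest` are hypothetical helpers, and versions are kept in ascending timestamp order here, whereas HBase returns the newest version first.

```python
import bisect


class SortedMap:
    """Hypothetical minimal sorted map: a dict for lookups plus a sorted key list for ordered scans."""

    def __init__(self):
        self._data = {}
        self._keys = []

    def put(self, key, value):
        # Insert or overwrite, keeping the key list sorted.
        if key not in self._data:
            bisect.insort(self._keys, key)
        self._data[key] = value

    def child(self, key):
        # Return the nested SortedMap stored under key, creating it if needed.
        if key not in self._data:
            self.put(key, SortedMap())
        return self._data[key]

    def get(self, key):
        return self._data.get(key)

    def keys(self):
        return list(self._keys)


# table: SortedMap[row_key, SortedMap[family, SortedMap[qualifier, SortedMap[timestamp, value]]]]
def put(table, row, family, qualifier, timestamp, value):
    table.child(row).child(family).child(qualifier).put(timestamp, value)


def get_latest(table, row, family, qualifier):
    """Return the newest value of one cell, or None if the cell does not exist."""
    node = table
    for key in (row, family, qualifier):
        node = node.get(key)
        if node is None:
            return None
    timestamps = node.keys()  # ascending, so the newest version is last
    return node.get(timestamps[-1]) if timestamps else None


table = SortedMap()
put(table, "row1", "info", "email", 1, "old@example.com")
put(table, "row1", "info", "email", 2, "new@example.com")
put(table, "row0", "info", "name", 1, "Dude")
print(table.keys())                                # ['row0', 'row1']
print(get_latest(table, "row1", "info", "email"))  # new@example.com
```

Because every level is sorted, rows written out of order (`row1` before `row0`) still come back in key order, which is what makes range scans cheap.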

Some quick facts:
- Column families are defined ahead of time, and the table must be disabled before they can be altered.
- Only column families are fixed; everything below that level of the maps is flexible.
- Qualifiers can be added or removed on the fly, along with their versions.
- “The Map” itself (the table) is also defined ahead of time.
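The split between fixed column families and free-form qualifiers can be sketched with a hypothetical `SketchTable` class in Python. It only mimics the rule described above (families declared at creation time, any qualifier accepted within them); it is not the HBase API, where families are declared in the table's schema and writes to an undeclared family are rejected.

```python
class SketchTable:
    """Hypothetical table mimicking HBase's schema rule: families are fixed, qualifiers are free."""

    def __init__(self, families):
        # Column families are declared up front, when the table is created.
        self.families = frozenset(families)
        self.rows = {}  # row_key -> {family: {qualifier: value}}

    def put(self, row, family, qualifier, value):
        if family not in self.families:
            # Writes to an undeclared column family are rejected.
            raise KeyError(f"unknown column family: {family}")
        # Any qualifier can be written without changing the schema.
        self.rows.setdefault(row, {}).setdefault(family, {})[qualifier] = value

    def delete(self, row, family, qualifier):
        # Qualifiers can be removed just as freely; missing cells are ignored.
        self.rows.get(row, {}).get(family, {}).pop(qualifier, None)


t = SketchTable(["profile"])
t.put("row1", "profile", "name", "Dude")
t.put("row1", "profile", "nickname", "The Dude")  # new qualifier, no schema change
t.delete("row1", "profile", "name")

try:
    t.put("row1", "stats", "visits", 3)  # "stats" was never declared
    rejected = False
except KeyError:
    rejected = True

print(t.rows)    # {'row1': {'profile': {'nickname': 'The Dude'}}}
print(rejected)  # True
```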

What does this look like? DEMO TIME!

HBase: Implementations

- The Test Case
- The Ideal Case
- The Awesome Case

HBase: The Test Case

One of the services we provide to our users is a message stream. This stream can include email, so it works like an email client (e.g. Outlook, Mail.app, or the mail app on your phone), storing your email messages so you can get them quickly.

We found ourselves storing hundreds of gigabytes of email content in our Oracle RAC database.

Since this data is only ever accessed by key, it made sense to move it out of Oracle and into HBase.

Key == accountId_providerAccountId_messageId_bodyId

This is a nice key because all the messages for a particular user share the same prefix and therefore sit next to each other. Since HBase keeps row keys sorted, we can use a Scan to grab them all quickly in one pass.
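The row-key design and the prefix scan can be sketched in Python over a sorted list of keys. This is an in-memory illustration, not the HBase client: `make_key` and `prefix_scan` are hypothetical helpers and the IDs are made up. In the Java client the equivalent would be a `Scan` limited to the key prefix (for example with `Scan.setRowPrefixFilter`).

```python
import bisect


def make_key(account_id, provider_account_id, message_id, body_id):
    # Row key layout from the slide: accountId_providerAccountId_messageId_bodyId
    return f"{account_id}_{provider_account_id}_{message_id}_{body_id}"


def prefix_scan(sorted_keys, prefix):
    """Return all keys that start with prefix. sorted_keys must already be sorted."""
    start = bisect.bisect_left(sorted_keys, prefix)  # first key >= prefix
    matches = []
    for key in sorted_keys[start:]:
        if not key.startswith(prefix):
            break  # keys sharing a prefix are contiguous, so we can stop here
        matches.append(key)
    return matches


row_keys = sorted([
    make_key(42, 7, 1001, 1),
    make_key(43, 7, 1003, 1),
    make_key(42, 7, 1002, 1),
    make_key(420, 9, 1, 1),
])

# The trailing "_" keeps account 420 out of account 42's results.
print(prefix_scan(row_keys, "42_"))  # ['42_7_1001_1', '42_7_1002_1']
```

Because the keys are strings, numeric IDs sort lexicographically (`420_9_1_1` sorts before `42_7_1001_1`). That does not break prefix scans, but it is worth keeping in mind if numeric ordering ever matters.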

That’s it!

Advantages vs. the previous solution:
- Faster
- Cheaper
- Less DB load

HBase: The ideal case

Another service we offer our users is the ability to import their social and email connections, so they can have one unified view of all their connections across providers. This lets users manage data by person rather than by account.

This has two main pieces of data:
1. The social profile information
2. The relationship between that profile and an Identity

What makes this ideal for HBase?
1. The profile is sparse data that is only accessed by key!
2. The relationship between a profile and its identity is just a key-value pair, and so is its reverse!

A Data Structures Interlude

Back to the map from earlier:

Key == Last Name, First Name, Middle Initial
Value == Extension
For example: Example,Dude,X → x555

And its reverse:

Key == Extension
Value == Last Name, First Name, Middle Initial
For example: x555 → Example,Dude,X

HBase: The ideal case (dataflow)

1. Get the profile from the provider.
2. Check whether the profile maps to an existing Identity in HBase.
   1. If it doesn't exist, store a version of the profile in HBase with providerId as the key and the profile information as values.
3. Associate the profile with an identity.
   1. Create a row in HBase with identityId_providerId as the key.
4. Update the profile with the identity it is associated with.
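The four steps above can be sketched in Python with two dictionaries standing in for the HBase tables. Everything here is hypothetical: the table layout, the field names, and especially how an `identity_id` is chosen, which the slides do not describe, so it is simply passed in.

```python
profiles = {}    # hypothetical stand-in for the profile table: provider_id -> profile fields
identities = {}  # hypothetical stand-in for the identity table: "identityId_providerId" -> provider_id


def import_profile(provider_id, profile_fields, identity_id):
    """Sketch of the import dataflow; step 1 (fetching from the provider) happened before the call."""
    # Step 2: does this profile already map to an existing identity?
    existing = profiles.get(provider_id)
    if existing is not None and "identity" in existing:
        return existing["identity"]
    # Step 2.1: store the profile, keyed by provider id.
    profiles[provider_id] = dict(profile_fields)
    # Step 3: associate profile with identity via a row keyed identityId_providerId.
    identities[f"{identity_id}_{provider_id}"] = provider_id
    # Step 4: record the identity back on the profile (the reverse direction).
    profiles[provider_id]["identity"] = identity_id
    return identity_id


def profiles_for(identity_id):
    """All provider profiles linked to one identity: a prefix match on the identity table's keys."""
    prefix = f"{identity_id}_"
    return sorted(pid for key, pid in identities.items() if key.startswith(prefix))


import_profile("fb:123", {"name": "Dude"}, "id-1")
import_profile("tw:456", {"name": "The Dude"}, "id-1")
# fb:123 is already linked, so the existing identity wins.
print(import_profile("fb:123", {"name": "Dude"}, "id-2"))  # id-1
print(profiles_for("id-1"))                                 # ['fb:123', 'tw:456']
```

Storing both directions means each question ("which identity owns this profile?" and "which profiles belong to this identity?") is answered by a key lookup or a prefix scan, never by a full table scan.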

Coprocessors! What are coprocessors?

Coprocessors are another feature of HBase; they work somewhat like database triggers. A coprocessor is a piece of logic that runs on the HBase cluster itself (inside the region servers). An observer coprocessor can be attached to an operation such as a put, and is executed every time that operation happens.
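Real coprocessors are Java classes loaded into the region servers (observer hooks such as `RegionObserver.postPut`). The Python sketch below only imitates the trigger-like behavior in memory: `ObservedTable`, `add_post_put_hook` and `index_identity` are hypothetical names. Keeping a reverse identity-to-profile index in sync is one plausible use in the flow above, assumed here for illustration; the slides do not say what MyLife's coprocessor actually did.

```python
class ObservedTable:
    """Hypothetical in-memory table that runs registered hooks after every put,
    imitating an observer coprocessor. Real coprocessors run server-side in Java."""

    def __init__(self):
        self.rows = {}
        self._post_put_hooks = []

    def add_post_put_hook(self, hook):
        self._post_put_hooks.append(hook)

    def put(self, row, values):
        self.rows.setdefault(row, {}).update(values)
        # Like a trigger: every hook sees every put, with no extra call from the client.
        for hook in self._post_put_hooks:
            hook(self, row, values)


reverse_index = {}  # identity_id -> set of profile row keys


def index_identity(table, row, values):
    # Assumed use: when a profile row is linked to an identity, record the reverse mapping too.
    identity = values.get("identity")
    if identity is not None:
        reverse_index.setdefault(identity, set()).add(row)


profile_table = ObservedTable()
profile_table.add_post_put_hook(index_identity)
profile_table.put("fb:123", {"name": "Dude"})
profile_table.put("fb:123", {"identity": "id-1"})
print(reverse_index)  # {'id-1': {'fb:123'}}
```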

HBase: The Awesome Case

User stream availability

Originally this system used local caching to store user stream data, but as the stream grew this became impractical.

The solution here was a distributed cache. Great!

A distributed cache allows us to scale, but unless we have a huge grid, some user streams will still get evicted from the cache. That means when the user visits again, we have to fetch their streams from the source, which is slow…

Enter HBase: from great to awesome!

To fix the latency associated with eviction, we added HBase as a backing store for our distributed cache. This means that records in our cache are periodically written to HBase, and are also written to HBase before being evicted from the cache.
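The cache-plus-backing-store pattern described above (periodic write-behind, write on eviction, read-through on a miss) can be sketched as a small LRU cache in Python. `BackedCache` is hypothetical and a plain dict stands in for the HBase table; a real distributed cache would wire this up through its own backing-store hooks rather than this class.

```python
from collections import OrderedDict


class BackedCache:
    """Hypothetical LRU cache backed by a slower persistent store.

    Reads check the cache first and fall back to the store (read-through).
    Writes land in the cache and are marked dirty; flush() pushes dirty entries
    to the store (call it periodically), and dirty entries are also written
    to the store right before they are evicted.
    """

    def __init__(self, capacity, store):
        if capacity < 1:
            raise ValueError("capacity must be at least 1")
        self.capacity = capacity
        self.store = store            # a dict standing in for an HBase table
        self.entries = OrderedDict()  # least recently used entry first
        self.dirty = set()            # keys changed since they were last written to the store

    def get(self, key):
        if key in self.entries:
            self.entries.move_to_end(key)  # mark as most recently used
            return self.entries[key]
        if key in self.store:              # cache miss: read through from the store
            value = self.store[key]
            self._insert(key, value)
            return value
        return None

    def put(self, key, value):
        self._insert(key, value)
        self.dirty.add(key)

    def flush(self):
        # Periodic write-behind: persist everything that changed.
        for key in self.dirty:
            self.store[key] = self.entries[key]
        self.dirty.clear()

    def _insert(self, key, value):
        self.entries[key] = value
        self.entries.move_to_end(key)
        while len(self.entries) > self.capacity:
            old_key, old_value = self.entries.popitem(last=False)
            if old_key in self.dirty:      # write before evicting so nothing is lost
                self.store[old_key] = old_value
                self.dirty.discard(old_key)


hbase_stand_in = {}
cache = BackedCache(capacity=2, store=hbase_stand_in)
cache.put("user:1", ["post a"])
cache.put("user:2", ["post b"])
cache.put("user:3", ["post c"])  # evicts user:1, writing it to the store first
print(hbase_stand_in)            # {'user:1': ['post a']}
print(cache.get("user:1"))       # ['post a']
```

The last call is served by a read-through from the store, so an evicted stream costs one store read instead of a slow fetch from the original source.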

Distributed cache + HBase == Awesome! Why?
- Persistence: user streams now live in HBase for as long as we want them to.
- Speed: read-throughs from HBase are fast.
- Transparency: as far as the application is concerned, everything is just in the cache.
- Reliability: HBase has been solid, and all the data is stored redundantly.

That’s all folks!

Questions?