HBASE AT BLOOMBERG // THE EVOLUTION OF BLOOMBERG DATA SYSTEMS · MEDIUM DATA NEEDS FOR THE FINANCIAL INDUSTRY · MAY // 07 // 2015

Transcript
Page 1:

HBASE AT BLOOMBERG //

THE EVOLUTION OF BLOOMBERG DATA SYSTEMS

MEDIUM DATA NEEDS FOR THE FINANCIAL INDUSTRY

MAY // 07 // 2015

Page 2:

BLOOMBERG

Leading Data and Analytics provider to the financial industry

Page 3:

DATA IS OUR BUSINESS

Page 4:

DATA FOR GOOD EXCHANGE: GOVERNMENT INNOVATION, PUBLIC HEALTH, ENVIRONMENT, EDUCATION

September 28: Full Workshop at Bloomberg

September 30: Showcase at Strata Hadoop

Call for papers at: bloomberglabs.com/data-science

Page 5:

DATA MANAGEMENT TODAY

• We have a “medium data” problem…

• Speed and availability are paramount

• Hundreds of thousands of users with expensive requests

We’ve built many systems to address

Page 6:

DATA MANAGEMENT CHALLENGES

• Single security analytics on Big Iron

• Replication of Systems and Data

• Complexity kills

Top 500 Supercomputer list, 2013

>96% Linux. 100% of top 40.

Page 7:

DATA MANAGEMENT TOMORROW

• Simplicity and performance

• Benefit from external developments

• Retain our independence

• Details matter

Page 8:

THE PREMISE

• We can apply big data techniques to our medium data problem by addressing gaps in existing open systems

• HBase is a good bet
  • Part of a broader whole
  • The biggest community wins

Page 9:

CHALLENGES

Our requirements from HBase are:

• Read performance – fast with low variability

• High availability

• Operational simplicity

• Efficient use of good hardware

• Expressive power

Bloomberg has been investing in all these aspects of HBase
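One way to make "fast with low variability" measurable is to track a tail percentile alongside the median. The sketch below is illustrative only: the function name, the sample values, and the choice of p99 over p50 as the variability measure are assumptions, not figures or methods from the talk.

```python
import statistics

def latency_summary(samples_ms):
    """Summarize read latencies: median, 99th percentile, and their ratio."""
    ordered = sorted(samples_ms)
    # With n=100, quantiles() returns the 99 percentile cut points.
    cuts = statistics.quantiles(ordered, n=100, method="inclusive")
    p50 = cuts[49]  # 50th percentile
    p99 = cuts[98]  # 99th percentile
    return {"p50": p50, "p99": p99, "tail_ratio": p99 / p50}

# Hypothetical samples: mostly fast reads plus a few slow outliers.
samples = [2.0] * 97 + [40.0, 50.0, 60.0]
summary = latency_summary(samples)
```

A low median with a large tail ratio is the "fast but variable" case that the first requirement rules out.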

Page 10:

WE’VE MADE THAT BET

Page 11:

WE’RE NOT THE ONLY ONES

Google Cloud Bigtable

Page 12:

AIMING HIGHER

We can make things better by working together

Let’s be the gold standard

Page 13:

Page 14:

CALL TO ACTION

Page 15:

FURTHER BOLSTER RELIABILITY

Great strides, such as HBASE-10070, but more to do

• Improved reconciliation of state between Master, META and ZK

• More determinism in Admin/Master operations

Page 16:

BENEFIT FROM MODERN HARDWARE

• 32 cores, 256 GB RAM, SSD: untapped potential

• CPU load max 20%, inadequate throughput

• Multi-RS administratively painful

• Much better story with memory

Page 17:

IMPROVE MULTI-TENANCY

• Mixed workloads challenging
  • interactive vs batch
  • read vs write
  • different read access patterns

• Many solutions in progress

• Administrative simplicity is key

Page 18:

SPARK INTEGRATION

• Analytical frameworks need a distributed database

• Columnar file format != column database

• Integrate with HBase to move towards the universal database

Page 19:

ANALYTICS: EFFICIENCY

• Choice of row and columnar storage engines

• Expose primitives for efficiency:
  • Column pruning
  • Predicate pushdowns
  • Data locality
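To show what the first two primitives mean in practice, here is a toy, in-memory sketch. It is not HBase or Spark code. The `scan` function, the sample table, and the column names are hypothetical, chosen only to show the filter (predicate pushdown) and the projection (column pruning) running where the data lives, so that less data goes back to the caller. Data locality, the third primitive, is about scheduling the computation on the node that holds the data and is not modeled here.

```python
from typing import Callable, Dict, Iterable, Iterator, List

Row = Dict[str, object]

def scan(rows: Iterable[Row],
         columns: List[str],
         predicate: Callable[[Row], bool]) -> Iterator[Row]:
    """Storage-side scan: filter first, then keep only requested columns."""
    for row in rows:
        # Predicate pushdown: rows that fail the filter never leave storage.
        if predicate(row):
            # Column pruning: only the requested columns are returned.
            yield {c: row[c] for c in columns}

# Hypothetical table with one wide column the query does not need.
table = [
    {"ticker": "AAA", "price": 10.0, "volume": 500, "notes": "x" * 1000},
    {"ticker": "BBB", "price": 99.5, "volume": 20, "notes": "y" * 1000},
    {"ticker": "CCC", "price": 101.0, "volume": 75, "notes": "z" * 1000},
]

result = list(scan(table, ["ticker", "price"], lambda r: r["price"] > 50))
```

Without these primitives the analytical framework would have to pull every row and every column, including the wide `notes` field, and discard most of it on the client side.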

Page 20:

THE FUTURE IS BRIGHT

• The state of the “Hadoop Database” union is strong
  – Increasing adoption
  – Strong foundation
  – Great community

• Prominent role in the data & analytics platform of the future

• Let’s go create the future

Page 21:

THANK YOU