State of the Database @HBase http://hbase.apache.org 2015-09-28 Nick Dimiduk (@xefyr) http://n10k.com #apachebigdata

Page 1:

State of the Database
@HBase http://hbase.apache.org
2015-09-28
Nick Dimiduk (@xefyr) http://n10k.com
#apachebigdata

Page 2:

Agenda
o State of the Project
o State of the Software
o State of the Ecosystem
o Latest Releases
o Q & A

Page 3:

Project: Vision
Simple, steady, and powerful: “A first class high performance horizontally scalable data storage engine for Big Data, suitable as the store of record for mission critical data.”

J.G. Keenan, Elementary Theory of Gas Turbines and Jet Propulsion (1946)

Page 4:

State of the Project
o Data access for medium- and high-scale services
o Hundreds of enterprises and startups
o Some of the largest Internet companies in the world
o Running major production workloads since 2011
o Use-cases: messaging, security, measurement/“IoT”, collaboration, digital media, digital advertising, telecommunications, computational biology, clinical informatics/healthcare, insurance

Page 6:

Project: Goals
o Availability: Always more, always faster
o Stability and operability
o Scaling up, scaling down
o Up-to-date with NextGen “commodity” hardware
o Multi-tenancy
o Diversity of ecosystem

Page 7:

State of the Software
o Mature codebase
  o >100 contributors
  o 1.1M lines of code (each active branch)
  o est. 1200+ human-years’ effort
o Cluster sizes from 10 to 1000+ machines (that we know of)
o Runs on HDFS, MapR, Gluster, GPFS, Lustre
o HBase as a Service: AWS/EMR, HDInsight, Qubole, Google (sort-of)

Page 8:

Software: Releases

Page 9:

Software: Active Development
o Smaller regions, more regions
  o Less write amplification
  o 1M+ region clusters
o Stability
  o ProcedureV2
  o Assignment improvements/stability
o Backup, restore tools
  o Built on snapshots, easier operations

Page 10:

Software: Active Development
o Adaption: Workloads
  o HBase as Medium Object Store (MOB)
o Tunable Availability
  o Region replicas
  o TIMELINE consistency
o Coprocessor API stability
o Profile-driven optimization
  o Less GC, more RAM (off-heap)

Page 11:

Software: Active Development
o Multi-tenancy
  o Table groups
  o Quotas
  o Priorities
o Improved machine utilization
  o More RAM (100’s of GB)
  o IOPS
  o All of the CPUs

Page 12:

Ecosystem
o OpenTSDB
o Transaction Managers
  o Themis, Tephra, Omid2, LeanXcale
o Graph engines
  o Titan, Giraph, Zen, S2Graph (+loads of custom solutions)
o Myriad SQL’s
o Other Hadoop components
o Google Cloud Bigtable

Page 13:

Ecosystem: SQL

Page 14:

Ecosystem: Hadoop Components
o YARN-2928 Application Timeline Service
o HIVE-9452 HBase to store Hive metadata
o AMBARI-5707 Ambari Metrics System

Page 15:

Release: 0.94
o Last (final?) release: 0.94.27, 2015-03-26
o “ancient history”
  o No new deployments
  o Existing users highly encouraged to upgrade
o Requires downtime to upgrade 😫 😡 (╯°□°)╯︵ ┻━┻

Page 16:

Release: 0.98
o Last release: 0.98.14, 2015-08-31
o “legacy”
  o Most production deploys (probably)
  o Largest production clusters (probably)
  o New features backported when possible

Page 17:

Release 1.x
o Last release: 1.1.2, 2015-09-01
o “stable”
  o Production deploys moving here
  o Active development
o Rolling upgrade from 0.98.x 😄 😍 ヽ(´ー`)ノ

Page 18:

Release 1.0
o Released 1.0.0, 2015-02-24
o Adopting semantic versioning
  o MAJOR.MINOR.PATCH[-identifier]
  o Patch releases don’t quite follow spec yet
o Client / Server API cleanup
  o Interfaces, builder pattern, @InterfaceAudience
o Region Replicas
  o Trade Consistency, resources for Availability
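As a rough illustration of the MAJOR.MINOR.PATCH[-identifier] scheme named on this slide, the sketch below parses version strings and orders them by their numeric parts. The names `parse_version`, `same_major`, and `release_order` are hypothetical helpers, not part of HBase, and the regular expression is an assumption about what an identifier may contain. Releases such as 0.94.x and 0.98.x predate the adoption of semantic versioning, so parsing them this way is purely syntactic.

```python
import re
from typing import NamedTuple, Optional

# Assumed pattern: MAJOR.MINOR.PATCH with an optional "-identifier" suffix.
VERSION_RE = re.compile(r"^(\d+)\.(\d+)\.(\d+)(?:-([0-9A-Za-z.-]+))?$")


class Version(NamedTuple):
    major: int
    minor: int
    patch: int
    identifier: Optional[str] = None


def parse_version(text: str) -> Version:
    """Split a version string into numeric parts and an optional identifier."""
    match = VERSION_RE.match(text)
    if match is None:
        raise ValueError(f"not a MAJOR.MINOR.PATCH version: {text!r}")
    major, minor, patch, identifier = match.groups()
    return Version(int(major), int(minor), int(patch), identifier)


def same_major(a: str, b: str) -> bool:
    """True when two versions share a MAJOR number (hypothetical helper)."""
    return parse_version(a).major == parse_version(b).major


def release_order(text: str):
    """Sort key on the numeric parts only; the identifier is ignored here."""
    version = parse_version(text)
    return (version.major, version.minor, version.patch)
```

For example, `sorted(releases, key=release_order)` lists release strings from oldest to newest line, and `same_major("1.0.0", "1.1.2")` groups releases within the 1.x line.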

Page 19:

Region Replicas
o Multiple Region Servers host each region
  o Primary + N read replicas (usually 2)
  o Primary is authority on reads and writes
  o Replicas tail replicate edits, offer TIMELINE view
o Client’s choice
  o Read primary only for “classic” strong consistency
  o Fan-out reads for faster, potentially TIMELINE results
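The trade-off on this slide can be sketched with a toy model: writes go to the primary, replicas apply the primary's edits later, and a TIMELINE read may be answered by a replica with possibly stale data. Everything here (`Region`, `Replica`, the `primary_up` flag, the `(value, stale)` return shape) is an assumption made for illustration; it is not HBase code and it simplifies the fan-out behaviour to a plain fallback when the primary is unreachable.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple


@dataclass
class Replica:
    """Read-only copy that applies the primary's edits in log order."""
    data: Dict[str, str] = field(default_factory=dict)
    applied: int = 0  # number of log entries applied so far

    def catch_up(self, log: List[Tuple[str, str]]) -> None:
        for key, value in log[self.applied:]:
            self.data[key] = value
        self.applied = len(log)


@dataclass
class Region:
    """Toy region: one primary plus N read replicas (hypothetical model)."""
    replica_count: int = 2
    primary: Dict[str, str] = field(default_factory=dict)
    log: List[Tuple[str, str]] = field(default_factory=list)
    replicas: List[Replica] = field(default_factory=list, init=False)

    def __post_init__(self) -> None:
        self.replicas = [Replica() for _ in range(self.replica_count)]

    def put(self, key: str, value: str) -> None:
        # All writes go through the primary and are appended to its log.
        self.primary[key] = value
        self.log.append((key, value))

    def get(self, key: str, consistency: str = "STRONG",
            primary_up: bool = True) -> Tuple[Optional[str], bool]:
        """Return (value, stale). Only replica answers are marked stale."""
        if primary_up:
            return self.primary.get(key), False
        if consistency == "STRONG":
            raise RuntimeError("primary unavailable; strong read cannot be served")
        # TIMELINE: answer from the most caught-up replica, flagged as stale.
        best = max(self.replicas, key=lambda replica: replica.applied)
        return best.data.get(key), True
```

A client that reads with `consistency="TIMELINE"` keeps getting answers while the primary is down, at the cost of possibly seeing an older value; a strong read fails instead.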

Page 20:

Release 1.1
o Released 1.1.0, 2015-05-15
o Async RPC client
o Scanner improvements
  o RPC chunking, heartbeat messages, API
o ProcedureV2
  o Improved operational reliability
o RPC throttling
  o Quotas per user, table, namespace
o Compaction throttling, monitoring
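To make the idea of per-user, per-table, and per-namespace request quotas concrete, here is a generic token-bucket sketch. It is not HBase's throttling implementation; the `TokenBucket` and `Throttle` classes, the `(scope, name)` keys, and the requests-per-second unit are all assumptions chosen for illustration. The clock is passed in explicitly so the behaviour is deterministic.

```python
from dataclasses import dataclass
from typing import Dict, Tuple


@dataclass
class TokenBucket:
    rate: float       # tokens (requests) refilled per second
    capacity: float   # maximum burst size
    tokens: float = 0.0
    last: float = 0.0

    def __post_init__(self) -> None:
        self.tokens = self.capacity  # start with a full bucket

    def refill(self, now: float) -> None:
        elapsed = max(0.0, now - self.last)
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        self.last = now


class Throttle:
    """Checks one request against user, table, and namespace limits."""

    def __init__(self, limits: Dict[Tuple[str, str], float]) -> None:
        # limits maps (scope, name) to requests per second,
        # e.g. {("user", "alice"): 2.0}; scopes without an entry are unlimited.
        self.buckets = {key: TokenBucket(rate=rate, capacity=rate)
                        for key, rate in limits.items()}

    def allow(self, now: float, user: str, table: str, namespace: str) -> bool:
        keys = [("user", user), ("table", table), ("namespace", namespace)]
        applicable = [self.buckets[key] for key in keys if key in self.buckets]
        for bucket in applicable:
            bucket.refill(now)
        # Admit only if every applicable bucket has a token, then charge all.
        if all(bucket.tokens >= 1.0 for bucket in applicable):
            for bucket in applicable:
                bucket.tokens -= 1.0
            return True
        return False
```

With `Throttle({("user", "alice"): 2.0, ("table", "t1"): 3.0})`, a request is rejected as soon as either the user's or the table's bucket is empty, and a rejected request consumes no tokens.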

Page 21:

ProcedureV2
o Distributed, fault-tolerant operations
  o Multiple steps on multiple machines
  o Roll-back in case of failure
o Coordination of long-running procedures
  o Compactions, splits, &c.
o Progress tracking
  o Notifications across multiple machines
  o Current status inquiries
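The multi-step, roll-back-on-failure behaviour described here can be sketched in a few lines. This is a single-process toy, not the ProcedureV2 framework: `run_procedure`, the `(name, execute, rollback)` step shape, and the in-memory `journal` list are assumptions for illustration, with the journal standing in for whatever durable record a real implementation would keep for progress tracking.

```python
from typing import Callable, List, Tuple

# A step is (name, execute, rollback); both callables take no arguments.
Step = Tuple[str, Callable[[], None], Callable[[], None]]


def run_procedure(steps: List[Step], journal: List[str]) -> bool:
    """Run steps in order; on failure, undo completed steps in reverse order.

    Every state change is appended to `journal`, so the current status can be
    read back at any time.
    """
    done: List[Step] = []
    for step in steps:
        name, execute, _ = step
        try:
            execute()
        except Exception as exc:
            journal.append(f"FAILED {name}: {exc}")
            for undo_name, _, rollback in reversed(done):
                rollback()
                journal.append(f"ROLLED_BACK {undo_name}")
            return False
        done.append(step)
        journal.append(f"DONE {name}")
    return True
```

A procedure whose third step raises leaves the journal with two `DONE` entries, one `FAILED` entry, and two `ROLLED_BACK` entries in reverse order, and the state touched by the first two steps is restored.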

Page 22:

Branch-1.2
o Next up in 1.x line
  o “any day now”
o Java 8 support
  o formally, thoroughly, officially
o Native CRC checksums
  o perf!
o SyncTable
  o rsync for HBase tables
o Region normalizer
  o Balancer for region size
o Flush-per-store
  o on by default
o ProcV2 all the things!
o (More) Compaction improvements

Page 23:

Region Normalizer
o Anti-entropy for region size
  o Converge towards uniform size
  o Complements balancer working toward uniform distribution
o Managed by Master, runs in the background (like balancer)
o Pluggable normalization strategies (“simple” default)
o Use-cases
  o Merge away regions from expired timeseries data
  o Smooth uneven bulk loads
  o Correct operator initial split guesses
  o Ease upgrades from ancient versions (0.92/1g vs. today/20g)
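One way a normalization strategy could converge region sizes is sketched below: split regions far above the average size and merge adjacent regions that are small together. The thresholds used (more than twice the average for a split, combined size below the average for a merge) and the `normalize` function itself are assumptions for illustration; the slide only says that strategies are pluggable and that a “simple” one is the default.

```python
from typing import List, Tuple

Plan = Tuple[str, ...]  # ("split", region) or ("merge", left, right)


def normalize(regions: List[Tuple[str, float]]) -> List[Plan]:
    """Propose splits and merges for one table.

    `regions` holds (name, size) pairs in key order, so neighbours in the
    list are adjacent regions and may be merged.
    """
    if len(regions) < 2:
        return []
    average = sum(size for _, size in regions) / len(regions)
    plans: List[Plan] = []
    i = 0
    while i < len(regions):
        name, size = regions[i]
        if size > 2 * average:
            # Much larger than average: propose a split.
            plans.append(("split", name))
        elif i + 1 < len(regions):
            next_name, next_size = regions[i + 1]
            if size + next_size < average:
                # Two small neighbours: propose a merge and skip the partner.
                plans.append(("merge", name, next_name))
                i += 1
        i += 1
    return plans
```

Run repeatedly in the background, a strategy of this kind would merge away the near-empty regions left by expired timeseries data and split the oversized ones left by an uneven bulk load.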

Page 24:

Thanks!
@HBase http://hbase.apache.org
2015-09-28
Nick Dimiduk (@xefyr) http://n10k.com
#apachebigdata