Cassandra Summit 2014: Apache Cassandra Best Practices at Ebay

Transcript
Page 1: Cassandra Summit 2014: Apache Cassandra Best Practices at Ebay

Feng Qu, principal database engineer, eBay Inc.

September 11, 2014

Cassandra Best Practices at eBay Inc.

CassandraSummit2014 | #CassandraSummit

Page 2: Cassandra Summit 2014: Apache Cassandra Best Practices at Ebay

Agenda

• eBay Inc. Cassandra footprints
• NoSQL life cycle
• Cassandra best practices
• Q&A

Page 3: Cassandra Summit 2014: Apache Cassandra Best Practices at Ebay

eBay Inc.

Page 4: Cassandra Summit 2014: Apache Cassandra Best Practices at Ebay

eBay Inc. Database Platforms

• We manage thousands of databases powering eBay and PayPal

Page 5: Cassandra Summit 2014: Apache Cassandra Best Practices at Ebay

Why NoSQL?

• Challenges of traditional RDBMS
  • Performance penalty to maintain ACID features
  • Lack of native sharding and replication features
  • Lack of linear scalability
  • Cost of software/hardware
  • Higher cost of commit

• NoSQL used in eBay Inc.
  • Cassandra, Couchbase, MongoDB managed by DBAs
  • HBase, Redis, OpenTSDB managed by developers

Page 6: Cassandra Summit 2014: Apache Cassandra Best Practices at Ebay

Cassandra @ eBay Inc.

• Started in 2011 at eBay and later expanded to PayPal
• Started with Apache Cassandra 0.8, now using Apache Cassandra 2.0 and DataStax Enterprise 4.0
• Over a dozen production clusters on hundreds of servers across 3 data centers
• Choice between a dedicated cluster for large/critical use cases and a multi-tenant cluster for small use cases
• Over 20 billion daily reads/writes to Cassandra
• Cluster size varies from 4 nodes to 80 nodes
• 100TB+ of user data on HDD, local SSD and SSD array
• One cluster is estimated to grow to more than a few PB

Page 7: Cassandra Summit 2014: Apache Cassandra Best Practices at Ebay

NoSQL Life Cycle

• Use Case Analysis
• Data Modeling
• Capacity Planning
• Deployment
• Operation

Page 8: Cassandra Summit 2014: Apache Cassandra Best Practices at Ebay

Data Modeling Phase

• The development team requests a review meeting with a data architect for a new use case
• Once the data architect understands the requirements, they recommend a suitable data store. It could be either an RDBMS or one of the NoSQL products we support
• Both parties work on data modeling together
• The output of the engagement is a set of tickets, for tracking purposes, which capture project information and data configuration for the chosen data store.

Page 9: Cassandra Summit 2014: Apache Cassandra Best Practices at Ebay

Data Modeling Best Practices

• Data modeling for Cassandra is quite different from data modeling for a traditional RDBMS.
  • Model around query patterns, not entities
  • De-normalize to improve read performance
  • Separate read heavy data from write heavy data
  • Store values in column names, as names are already physically sorted
• Former eBay architect Jay Patel published a few technical blogs on Cassandra data modeling.
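The point about storing values in column names can be illustrated with a small in-memory sketch. This is plain Python, not Cassandra client code; the `WideRow` class and the sample item names are hypothetical. It only shows why sorted column names make a range read a single contiguous slice.

```python
import bisect


class WideRow:
    """Toy model of one wide row whose column names are kept sorted."""

    def __init__(self):
        self._names = []    # column names, always in sorted order
        self._values = {}   # column name -> value (often empty)

    def insert(self, name, value=b""):
        # Place a new name at its sorted position; overwrite if it exists.
        if name not in self._values:
            bisect.insort(self._names, name)
        self._values[name] = value

    def slice(self, start, end):
        """Return the column names in [start, end], in sorted order."""
        lo = bisect.bisect_left(self._names, start)
        hi = bisect.bisect_right(self._names, end)
        return self._names[lo:hi]


# Hypothetical example: the row key would be a user id, and each column
# name is (timestamp, item_id), so the data of interest is the name itself.
row = WideRow()
row.insert((1410400000, "item-b"))
row.insert((1410300000, "item-a"))
row.insert((1410500000, "item-c"))

# Range query over a time window, answered from the sorted names alone.
recent = row.slice((1410350000, ""), (1410600000, ""))
print(recent)
```

Because the names are already ordered, the query needs no separate sort step and no lookup of the values.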

Page 10: Cassandra Summit 2014: Apache Cassandra Best Practices at Ebay

Data Modeling Best Practices - indexing

• Secondary index
  + Less overhead as it is built in
  + Data and index are changed atomically
  - Does not scale well with high cardinality data
• Column family as index
  + No hot spot
  - Index is maintained manually by the application
  - Index change is not atomic
• Avoid secondary index and use a column family as index if possible
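A minimal sketch of the "column family as index" trade-off follows. It models two column families as Python dictionaries, which is an assumption made purely for illustration; the names `users_by_id`, `user_ids_by_email`, `save_user` and `find_user_by_email` are hypothetical. It shows the application doing the index maintenance itself, as two separate writes.

```python
# Two toy "column families" modeled as in-memory dictionaries (assumption).
users_by_id = {}        # data column family: user_id -> {"email": ...}
user_ids_by_email = {}  # index column family: email -> user_id


def save_user(user_id, email):
    """Write the data row, then maintain the index row by hand."""
    old = users_by_id.get(user_id)
    users_by_id[user_id] = {"email": email}
    # The data write above and the index writes below are separate
    # operations. A failure between them leaves the index stale, which is
    # the non-atomic behavior noted on the slide.
    if old is not None and old["email"] != email:
        user_ids_by_email.pop(old["email"], None)
    user_ids_by_email[email] = user_id


def find_user_by_email(email):
    """Look up a user through the index, then read the data row."""
    user_id = user_ids_by_email.get(email)
    if user_id is None:
        return None
    return user_id, users_by_id[user_id]


save_user("u1", "a@example.com")
save_user("u1", "b@example.com")   # changing the email moves the index entry
print(find_user_by_email("b@example.com"))
```

In a real deployment the application would also need a way to detect and repair index entries left behind by a partial failure.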

     

Page 11: Cassandra Summit 2014: Apache Cassandra Best Practices at Ebay

Benchmark Testing

• Benchmark testing is key to capacity planning
• Performance baseline with near-real traffic in a production-size environment
  • for different types of hardware
  • for different software releases
  • for different use cases or workloads
• A proactive and repetitive process
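As a rough illustration of what a performance baseline might record, the sketch below summarizes latency samples from one benchmark run into throughput and nearest-rank percentiles. The function names, the choice of percentiles and the sample values are all assumptions, not figures from the talk.

```python
import math


def percentile(sorted_samples, pct):
    """Nearest-rank percentile of an ascending list of samples."""
    if not sorted_samples:
        raise ValueError("no samples")
    rank = max(1, math.ceil(pct / 100.0 * len(sorted_samples)))
    return sorted_samples[rank - 1]


def baseline(latencies_ms, duration_s):
    """Summarize one benchmark run as throughput plus latency percentiles."""
    ordered = sorted(latencies_ms)
    return {
        "ops_per_sec": len(ordered) / duration_s,
        "p50_ms": percentile(ordered, 50),
        "p95_ms": percentile(ordered, 95),
        "p99_ms": percentile(ordered, 99),
    }


# Hypothetical latencies (milliseconds) collected over a 2 second run.
samples = [2.0, 3.0, 2.5, 40.0, 3.5, 2.2, 2.8, 3.1, 2.9, 3.3]
result = baseline(samples, duration_s=2.0)
print(result)
```

Keeping one such summary per hardware type, software release and workload makes later runs directly comparable.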

Page 12: Cassandra Summit 2014: Apache Cassandra Best Practices at Ebay

Capacity Planning Phase

• Is key to avoiding surprises in production
• The concept behind capacity planning is simple, but the mechanics are harder.
  • Business requirements may increase, so we need to forecast how much resource must be added to the system to ensure that the user experience continues uninterrupted
• Input: clearly defined capacity goal coming from business requirements, and performance baseline from benchmark tests
• Output: identify resources to be added, such as memory, CPU, storage, I/O, network
• Always prepare for peak + headroom
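The "peak + headroom" rule can be written as simple arithmetic. The sketch below is a simplification under stated assumptions: it sizes on throughput only, takes the per-node figure from a benchmark baseline, and ignores storage, replication factor and failure tolerance. The function name and all numbers are hypothetical.

```python
import math


def nodes_required(peak_ops_per_sec, per_node_ops_per_sec, headroom=0.3):
    """Nodes needed to serve peak load plus a headroom fraction."""
    if per_node_ops_per_sec <= 0:
        raise ValueError("per-node baseline must be positive")
    target = peak_ops_per_sec * (1.0 + headroom)
    # Round up: a fractional node still means one more server.
    return math.ceil(target / per_node_ops_per_sec)


# Hypothetical inputs: 200,000 ops/s at peak, 10,000 ops/s per node from
# the benchmark baseline, and 25% headroom.
print(nodes_required(200_000, 10_000, headroom=0.25))
```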

Page 13: Cassandra Summit 2014: Apache Cassandra Best Practices at Ebay

Deployment Best Practices

• Software packages with customized optimization
  • kernel, JVM heap, compaction
• Deployment automation for efficiency
• Multi data center deployment for load balancing and disaster recovery
• Vnodes are a must for manageability
• SSD as default storage requires additional OS level tuning

Page 14: Cassandra Summit 2014: Apache Cassandra Best Practices at Ebay

Operation Best Practices

• Collect system and database metrics
• Monitoring and alerting
  • event driven and metrics driven alerts
• Operation runbook
  • Reduce human error
• Performance tuning runbook
  • nodetool tpstats for dropped requests
  • nodetool cfhistograms for latency distribution
• Troubleshooting runbook
  • Document previous incidents as future reference
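A metrics driven alert can be as simple as comparing collected values against thresholds. The sketch below assumes the metrics have already been gathered into a dictionary; the metric names, threshold values and the `evaluate_alerts` function are hypothetical and do not reflect any particular tool's output format.

```python
def evaluate_alerts(metrics, thresholds):
    """Return the sorted names of metrics that exceed their thresholds."""
    return sorted(
        name
        for name, limit in thresholds.items()
        if metrics.get(name, 0) > limit
    )


# Hypothetical thresholds and one hypothetical sample from a node.
thresholds = {
    "dropped_mutations": 0,
    "read_latency_p99_ms": 50,
    "disk_used_pct": 70,
}
metrics = {
    "dropped_mutations": 12,
    "read_latency_p99_ms": 35,
    "disk_used_pct": 82,
}

alerts = evaluate_alerts(metrics, thresholds)
print(alerts)
```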

 

Page 15: Cassandra Summit 2014: Apache Cassandra Best Practices at Ebay

Operation Best Practices

• Routine repair is not really needed if there are no deletes. You still need to run repair after bringing a node back up if it has been down for a while
• Use a CNAME in client configuration to avoid client configuration changes when hardware is replaced with a new IP/name
• Reduce gc_grace to reduce overall data size
• Disable row cache, unless you have <100K rows
• Collect statistics, real-time or historical, to monitor overall system performance
• Disable swap to avoid a slow node
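The effect of gc_grace on data size comes from how long deletion markers (tombstones) must be kept before compaction may drop them. The sketch below is a simplified illustration of that age check only; real compaction applies further conditions. The function name is hypothetical, and 864000 seconds (10 days) is the default value of the `gc_grace_seconds` table property.

```python
DEFAULT_GC_GRACE_SECONDS = 864_000  # 10 days, the default gc_grace_seconds


def tombstone_purgeable(deleted_at, now,
                        gc_grace_seconds=DEFAULT_GC_GRACE_SECONDS):
    """True once a tombstone is older than the grace period.

    Simplified: only the age check is modeled here.
    """
    return now - deleted_at > gc_grace_seconds


# Timestamps in seconds. With the default, a tombstone written at t=0 is
# kept for 10 days; with a one hour grace period it can go much sooner.
print(tombstone_purgeable(0, 3_601))
print(tombstone_purgeable(0, 3_601, gc_grace_seconds=3_600))
```

A shorter grace period frees space earlier, but a node that was down must be repaired within that window or deleted data can reappear.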

Page 16: Cassandra Summit 2014: Apache Cassandra Best Practices at Ebay

Capacity Review

• Routine capacity review and adjustment
• When to scale up and when to scale out
  • In general, scale out by adding nodes to increase capacity with NoSQL
  • Sometimes it is cost efficient to scale up at the component level by identifying the scaling bottleneck, then resolving it accordingly
    • Network bandwidth: upgrade to 10 Gbps network
    • I/O latency: upgrade to (better) SSD
    • Storage: add/expand data volume

Page 17: Cassandra Summit 2014: Apache Cassandra Best Practices at Ebay

Typical Use Cases

• Write intensive: metrics collection, logging
  • Collecting metrics from tens of thousands of devices periodically
• Read intensive: home page feeds
  • Recommendation backend to generate dynamic taste graph
• Mixed workload: personalization, classification
  • Data is loaded from the data warehouse periodically in bulk and from user events continuously
  • Data is retrieved in real time when a user visits the eBay site

 

Page 18: Cassandra Summit 2014: Apache Cassandra Best Practices at Ebay

Metrics Collection Application

Page 19: Cassandra Summit 2014: Apache Cassandra Best Practices at Ebay

The End

• We are hiring for NoSQL talent.
• Contact:
  • [email protected]
  • www.linkedin.com/in/fengqu/
• Q&A