How to choose a NoSQL solution · 2017-11-07


How to choose a NoSQL solution

Vikas Sangwan, Intuit

We will be talking about…

• Why use NoSQL
• Some Concepts and Terminology (Consistency, Availability, CAP Theorem)
• Classifications and Survey of NoSQL Databases
• How to choose a NoSQL solution

Characteristics of Typical RDBMS

• Generally not designed to scale horizontally
  o Scale well for reads
  o Hard to scale for writes
• Structured and normalized data
• Well-defined schemas
• Increased performance by denormalizing
• Application-managed sharding
• Distributed memory systems

Emergence of NoSQL

• Dynamic growth of data generated by users, applications and sensors.
• Structured, semi-structured and unstructured data with the rise of Web 2.0, social networking and mobile.
• Availability of distributed computing through virtualization and cloud computing.

Scaling massive data set

Characteristics of NoSQL Data Stores

• Designed to run on clusters
• Mostly open source
• Support unstructured, semi-structured and structured data
• Support schema-less data modeling
• Transparent auto-sharding
• Distributed query mechanisms (MapReduce)

Which NoSQL DB?

Riak, Redis, HBase, Neo4J, Cassandra, MongoDB, CouchDB, InfiniteGraph, Project Voldemort, Amazon Dynamo, Amazon SimpleDB, Google BigTable, Berkeley DB, Hypertable, Memcached, OrientDB, FlockDB, Tokyo Cabinet, RavenDB, KAI

Master-Slave Replication

• Useful for scaling reads
• Read resilient
• Lack of consistency

Peer to Peer Replication

• Resilient to node failure
• Can easily add or remove nodes
• Inconsistency

Data Sharding

• Improved performance for write and read
• Data distribution
  o Users or location based
  o Even load on all nodes
• Not resilient to node failures
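A minimal sketch of key-based data distribution, to make the "even load on all nodes" point concrete. It assumes a simple hash-modulo placement; the function names `shard_for` and `distribute` and the `user-N` key format are made up for illustration, and real stores typically use more elaborate schemes.

```python
import hashlib

def shard_for(key: str, num_shards: int) -> int:
    """Map a key to a shard index using a stable hash of the key."""
    digest = hashlib.md5(key.encode("utf-8")).hexdigest()
    return int(digest, 16) % num_shards

def distribute(keys, num_shards):
    """Group keys by the shard that would own them."""
    shards = {index: [] for index in range(num_shards)}
    for key in keys:
        shards[shard_for(key, num_shards)].append(key)
    return shards

# Hypothetical user keys spread across four shards.
user_ids = [f"user-{n}" for n in range(1000)]
placement = distribute(user_ids, num_shards=4)
sizes = {shard: len(keys) for shard, keys in placement.items()}
print(sizes)
```

Each key lives on exactly one shard in this sketch, so losing a shard loses its keys, which is why sharding alone is not resilient to node failures.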

Combining Replication and Sharding

• Sharding and Master-Slave Replication
• Sharding and Peer to Peer Replication

Consistency

• Strong Consistency
  o Highest form of consistency; all data changes are atomic and take effect immediately.
• Eventual Consistency
  o Eventually all updates propagate through the distributed system and all nodes become consistent.
• Weak Consistency
  o No guarantee that all updates are propagated to all nodes; the same user may see data out of sync.
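The difference between a stale read and a converged read can be shown with a toy model. This is only a sketch of eventual consistency under a simplifying assumption (one primary, updates queued and applied to replicas later); the `Cluster` and `Replica` classes are hypothetical and do not mirror any particular database.

```python
class Replica:
    def __init__(self):
        self.data = {}

class Cluster:
    """One primary plus replicas that receive updates asynchronously."""

    def __init__(self, num_replicas):
        self.primary = Replica()
        self.replicas = [Replica() for _ in range(num_replicas)]
        self.pending = []  # updates accepted by the primary, not yet on replicas

    def write(self, key, value):
        self.primary.data[key] = value
        self.pending.append((key, value))

    def read_from_replica(self, index, key):
        return self.replicas[index].data.get(key)

    def propagate(self):
        """Apply all queued updates to every replica, in write order."""
        for key, value in self.pending:
            for replica in self.replicas:
                replica.data[key] = value
        self.pending.clear()

cluster = Cluster(num_replicas=2)
cluster.write("balance", 100)
stale = cluster.read_from_replica(0, "balance")  # update has not reached the replica
cluster.propagate()
fresh = cluster.read_from_replica(0, "balance")  # replicas have now converged
```

Under strong consistency the first read would already return the new value; under weak consistency there would be no guarantee that `propagate` ever reaches every replica.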

CAP Theorem

NoSQL Databases

• Key-Value Store
  o Redis, Riak
• Column Family
  o HBase, Cassandra
• Document
  o MongoDB, CouchDB
• Graph
  o Neo4J, OrientDB

Key-Value Data Store

• Examples: Redis, Riak, Memcached, Berkeley DB, Amazon Dynamo, LinkedIn Voldemort
• Suitable for storing user session details, preferences, profiles.
• Not suitable for highly correlated data, transactions over multiple operations, or queries based on the data (values) rather than the key.

Key             Value
Name            Tom
Street Name     Silver Crest
Street Number   15609
City            San Diego
State           California
Phone           858 111 1111
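The access pattern of a key-value store can be sketched with an in-memory stand-in. The class `KeyValueStore`, the `session:42` key and the helper `find_by_field` are hypothetical names used only for illustration; the point is that lookups by key are direct, while a query on the stored data has to scan every value.

```python
class KeyValueStore:
    """In-memory stand-in for a key-value store; values are opaque to the store."""

    def __init__(self):
        self._items = {}

    def put(self, key, value):
        self._items[key] = value

    def get(self, key, default=None):
        return self._items.get(key, default)

    def delete(self, key):
        self._items.pop(key, None)

    def find_by_field(self, field, expected):
        """Query on the data itself: requires a full scan of all values."""
        return [key for key, value in self._items.items()
                if isinstance(value, dict) and value.get(field) == expected]

store = KeyValueStore()
store.put("session:42", {"Name": "Tom", "City": "San Diego", "State": "California"})
session = store.get("session:42")                       # direct lookup by key
matches = store.find_by_field("City", "San Diego")      # scan, cost grows with data size
```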

Redis

• Key-value store in namespaces, primarily in memory
• Data structure server
• Extensive commands
• Very fast; trades durability for speed
• Configurable durability and replication
• Pub/Sub
• Who is using: GitHub, Craigslist, Engine Yard, Flickr, Yahoo, Stack Overflow

Figure: Key → String, Integer, List, Set, Hash, Blobs (Data Structures)

Riak

• Key-value pairs stored in buckets; primarily a distributed store.
• Speaks the language of the web (HTTP/REST interface)
• MapReduce and stored functions
• Ad hoc relation support through links
• Fault tolerant
• Supports quorum for consistency
• Who is using: Comcast, Yammer, Voxer, Boeing, BestBuy, Joyent, Kiip, DotCloud, Formspring

Figure: Key → Value (blob, text, XML, etc.)
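The quorum idea can be illustrated with the usual rule of thumb for replicated stores: with N replicas, a read quorum of R and a write quorum of W, every read overlaps the latest acknowledged write when R + W > N. The sketch below only checks that arithmetic; the function names are hypothetical and nothing here calls a Riak API.

```python
def quorums_overlap(n: int, r: int, w: int) -> bool:
    """True when every read quorum shares at least one replica with every write quorum."""
    if not (1 <= r <= n and 1 <= w <= n):
        raise ValueError("r and w must be between 1 and n")
    return r + w > n

def tolerated_failures(n: int, r: int, w: int) -> int:
    """Replicas that can be down while both reads and writes still reach quorum."""
    return n - max(r, w)

print(quorums_overlap(n=3, r=2, w=2))     # True: 2 + 2 > 3
print(quorums_overlap(n=3, r=1, w=1))     # False: a read may miss the latest write
print(tolerated_failures(n=3, r=2, w=2))  # 1
```

Lower R and W values favor latency and availability; higher values favor consistency, which is the trade-off a quorum setting exposes.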

Column Family

• Examples: HBase, Cassandra, Amazon SimpleDB, Hypertable
• Suitable for blogging systems, log management, visit counters
• Not suitable for data that needs ACID transactions, queries that combine data, or cases where the queries on the data may change.

Key    Value
Key1   {Column1Key1:ValueA, Column2Key1:ValueB}
Key2   {Column1Key2:ValueD, Column3Key2:ValueE}
Key3   {Column1Key3:ValueF, Column3Key3:ValueG}
Key4   {Column1Key4:ValueH, Column2Key4:ValueI}
Key5   {Column1Key5:ValueJ, Column3Key5:ValueK}
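The table above can be read as a map of row keys to sparse maps of columns. A rough sketch of that shape, assuming a plain in-memory structure (the `ColumnFamily` class is hypothetical, and the column names are shortened to `Column1`, `Column2`, `Column3` for readability):

```python
from collections import defaultdict

class ColumnFamily:
    """Rows keyed by row key; each row holds only the columns it actually has."""

    def __init__(self):
        self._rows = defaultdict(dict)

    def put(self, row_key, column, value):
        self._rows[row_key][column] = value

    def get(self, row_key, column, default=None):
        return self._rows.get(row_key, {}).get(column, default)

    def get_row(self, row_key):
        return dict(self._rows.get(row_key, {}))

cf = ColumnFamily()
cf.put("Key1", "Column1", "ValueA")
cf.put("Key1", "Column2", "ValueB")
cf.put("Key2", "Column1", "ValueD")
cf.put("Key2", "Column3", "ValueE")

row1 = cf.get_row("Key1")
missing = cf.get("Key2", "Column2")  # Key2 simply has no Column2; nothing is stored for it
```

Rows do not share a fixed set of columns, which is what distinguishes this layout from a relational table with a fixed schema.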

HBase

• Fault-tolerant, consistent and scalable column family database.
• Built on Hadoop
• Built-in support for versioning, expiration and compression.
• REST, Thrift, Avro and other interfaces
• Uses Regions for scalability.
• Used by Facebook, StumbleUpon, eBay, Yahoo, Ning, Meetup etc.

Figure: Row Key → Column Families
  Column1:ValueX, Column2:ValueY
  Column2:ValueX, Column3:ValueY
  Column4:ValueX, Column5:ValueY, Column6:ValueZ

Document Oriented

• Examples: MongoDB, CouchDB, RavenDB
• Suitable for blogging systems, log management, content management systems
• Not suitable for data that needs ACID transactions across systems, or queries that combine aggregate data

Key    Value
Key1   JSON/XML Document1
Key2   JSON/XML Document2
Key3   JSON/XML Document3
Key4   JSON/XML Document4
Key5   JSON/XML Document5
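Unlike a plain key-value store, a document store can look inside the value. The sketch below imitates that with JSON strings and a small dotted-path lookup; the `find` helper and the sample documents (including the second person and city) are made up for illustration and do not reflect any specific product's query language.

```python
import json

# Hypothetical documents stored as JSON text under their keys.
documents = {
    "Key1": json.dumps({"Name": "Tom",
                        "Address": {"City": "San Diego", "State": "California"}}),
    "Key2": json.dumps({"Name": "Ann",
                        "Address": {"City": "Austin", "State": "Texas"},
                        "Company": "Foo Inc"}),
}

def find(docs, path, expected):
    """Return keys of documents whose field at the dotted path equals expected."""
    matches = []
    for key, raw in docs.items():
        node = json.loads(raw)
        for part in path.split("."):
            node = node.get(part) if isinstance(node, dict) else None
            if node is None:
                break
        if node == expected:
            matches.append(key)
    return matches

in_san_diego = find(documents, "Address.City", "San Diego")
at_foo = find(documents, "Company", "Foo Inc")  # only some documents have this field
```

The two documents do not share the same fields, which is the schema-less modeling the document category allows.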

MongoDB

• Stores documents as BSON
• JSON, JavaScript and Wire Protocol interfaces
• Master-Slave replication, Replica Sets
• Supports auto-sharding for scalability
• Supports indexing and Map-Reduce
• Data center aware
• Ad hoc query support
• Used by Bit.ly, FourSquare etc.

Key → {Name: Tom,
       Address: {Street: Silver Crest, Street Number: 13568, City: San Diego},
       Phone: 858 111 1111,
       Company: Foo Inc}

Graph DB

• Examples: Neo4J, OrientDB, InfiniteGraph
• Good for socially connected data, recommendation and location-based services.
• Not suitable for analytical solutions, disjoint data sets

Figure: five nodes, each shown as a Key/Value table (Key1, Value1)
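A recommendation over connected data is essentially a short graph traversal. The following is a sketch using an in-memory adjacency map rather than a graph database; the people, the `friends` map and the `recommend` function are invented for the example.

```python
from collections import Counter

# Hypothetical undirected friendship graph as an adjacency map.
friends = {
    "alice": {"bob", "carol"},
    "bob": {"alice", "dave"},
    "carol": {"alice", "dave", "erin"},
    "dave": {"bob", "carol"},
    "erin": {"carol"},
}

def recommend(graph, person):
    """Rank friends-of-friends by how many mutual friends they share with person."""
    direct = graph[person]
    counts = Counter()
    for friend in direct:
        for candidate in graph[friend]:
            if candidate != person and candidate not in direct:
                counts[candidate] += 1
    # Most mutual friends first; ties broken alphabetically for a stable order.
    return sorted(counts.items(), key=lambda item: (-item[1], item[0]))

suggestions = recommend(friends, "alice")
print(suggestions)  # [('dave', 2), ('erin', 1)]
```

A graph database keeps these relationships as first-class data, so traversals like this do not need join-style lookups across tables.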

Neo4J

• Nodes and relations, with properties
• Consistent through transactions
• Supports ACID transactions
• REST interface
• Highly available through replicated slaves
• Used by Adobe, Cisco, Fuseworks, Open Tree of Life.

How do I choose a NoSQL DB

• Feature-based comparison
• Performance benchmark
• Proof of concept

Feature-based comparison

• Access and programming interface
• Query support
• Consistency
• Versioning
• Replication models
• Scaling
• Multi data center awareness
• Tools and utilities
• Community support
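One possible way to turn a feature list like this into a comparison is a weighted scoring matrix. This is an illustrative sketch, not a method prescribed by the talk: the weights, the 0 to 5 ratings and the candidate names "Store A" and "Store B" are placeholders that a team would replace with its own assessment.

```python
def score(weights, ratings):
    """Weighted sum of 0-5 ratings; criteria missing from ratings count as 0."""
    return sum(weight * ratings.get(criterion, 0)
               for criterion, weight in weights.items())

# Placeholder weights: how much each criterion matters to this hypothetical project.
weights = {"Query support": 3, "Consistency": 2,
           "Replication models": 2, "Community support": 1}

# Placeholder ratings for two unnamed candidates.
candidates = {
    "Store A": {"Query support": 4, "Consistency": 3,
                "Replication models": 5, "Community support": 4},
    "Store B": {"Query support": 2, "Consistency": 5,
                "Replication models": 3, "Community support": 5},
}

totals = {name: score(weights, ratings) for name, ratings in candidates.items()}
ranking = sorted(totals, key=totals.get, reverse=True)
print(totals)   # {'Store A': 32, 'Store B': 27}
print(ranking)  # ['Store A', 'Store B']
```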

 

Performance Benchmark

• Yahoo! Cloud Serving Benchmark (YCSB)
  o 50/50 read and update
  o 95/5 read and update
  o Scalability test
• Basho Bench
• Individual data stores' own benchmarks
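The 50/50 and 95/5 mixes describe what fraction of operations are reads versus updates. The toy driver below shows the idea against a plain Python dict; it is not YCSB, and the `run_workload` function, the uniform key selection and the `userN` key names are simplifying assumptions for illustration.

```python
import random
import time

def run_workload(store, keys, read_ratio, operations, seed=0):
    """Issue a mix of reads and updates against a dict-like store and time it."""
    rng = random.Random(seed)  # seeded so runs are repeatable
    reads = updates = 0
    started = time.perf_counter()
    for _ in range(operations):
        key = rng.choice(keys)
        if rng.random() < read_ratio:
            store.get(key)
            reads += 1
        else:
            store[key] = rng.random()
            updates += 1
    elapsed = time.perf_counter() - started
    return {"reads": reads, "updates": updates, "seconds": elapsed}

keys = [f"user{n}" for n in range(100)]
store = {key: 0 for key in keys}

balanced = run_workload(store, keys, read_ratio=0.50, operations=10_000)
read_heavy = run_workload(store, keys, read_ratio=0.95, operations=10_000)
print(balanced, read_heavy)
```

To benchmark a real data store, `store` would be replaced by a client for that store, and the run would be repeated with increasing data volume and node counts for the scalability test.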

Proof of Concept

• Test the scenarios that are most important to you.
• Choose the most common use cases and some unique, bizarre ones.
• Baseline the assumptions and volume definitions.
• Build a representative test, with data as close as possible to realistic data.

Polyglot Persistence

• Different databases for different data storage needs
• Access data through a data access layer that hides the storage details from business services
• Adds complexity for programming, deployment and operations support.
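A data access layer of this kind can be sketched as an interface that business services depend on, with the storage choice hidden behind it. All names here (`SessionStore`, `InMemorySessionStore`, `LoginService`) are hypothetical; the in-memory class stands in for whichever key-value backend would be used in practice.

```python
from typing import Optional, Protocol

class SessionStore(Protocol):
    """What the business layer needs from session storage, and nothing more."""
    def save(self, session_id: str, data: dict) -> None: ...
    def load(self, session_id: str) -> Optional[dict]: ...

class InMemorySessionStore:
    """Stand-in backend; a real one might wrap a key-value database client."""

    def __init__(self):
        self._sessions = {}

    def save(self, session_id: str, data: dict) -> None:
        self._sessions[session_id] = dict(data)

    def load(self, session_id: str) -> Optional[dict]:
        return self._sessions.get(session_id)

class LoginService:
    """Business service that only knows the SessionStore interface."""

    def __init__(self, sessions: SessionStore):
        self._sessions = sessions

    def login(self, user: str) -> str:
        session_id = f"session-{user}"
        self._sessions.save(session_id, {"user": user})
        return session_id

    def current_user(self, session_id: str) -> Optional[str]:
        session = self._sessions.load(session_id)
        return session["user"] if session else None

service = LoginService(InMemorySessionStore())
sid = service.login("tom")
print(service.current_user(sid))  # tom
```

Swapping the backend means writing another class with the same two methods; the cost is the extra layer to build, deploy and operate, which is the complexity the last bullet refers to.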

Questions