Couchbase 105: XDCR and Elasticsearch


Don Stacy
Solutions Architect
email: [email protected]

Cross Data Center Replication (XDCR)

• Deliver highly performant, asynchronous data replication
• Provide for disaster recovery and high availability
• Allow for varying topologies and replication schemes
• Support data locality
• Support load separation
• Reduce operations effort and cost

Intracluster Replication

Intercluster Replication

Key Features

• Replicates existing and modified data from the source cluster
• Compares document revisions before transfer
  - De-duplicates writes to disk
  - Only the last version is written to disk and thus passed to XDCR
• Simplified administration via console, REST, and CLI
• Automatically handles node addition and removal
• Automatically resumes after network disruption
• Clusters may be different sizes and configurations
  - Clusters must use the same platform (supported/tested)

Storage to XDCR Replication

[Diagram; label: Remote Cluster]

XDCR: Eventually Consistent

[Diagram; labels: Time 1, Time 2, Time 3, Remote Cluster]

Cluster Mechanics: Replication between clusters

[Diagram; labels: Bucket A, Bucket B, Bucket C in Cluster 1; Bucket A, Bucket B, Bucket C in Cluster 2]

Topology and Use Cases

Uni-Directional

• Hot spare / Disaster recovery
• Development/Testing copies
• Heavy reporting
• Integrate to Elasticsearch
• Integrate to custom consumer

Chain

Data aggregation

Propagation

Bi-Directional (aka Active/Active)

• Multiple Active Masters
• Disaster Recovery
• Datacenter Locality

Bi-Directional Conflicts: What happens when you write the same key in multiple clusters?

[Diagram: each cluster issues replace("key", data) for the same key (conflict)]

Bi-Directional Conflicts: Resolution Strategy

• XDCR is eventually consistent; it checks document metadata to resolve conflicts:
  1. Numerical sequence (incremented on each mutation)
  2. CAS value
  3. Expiration (TTL) value
• All clusters will pick the same "winner"
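The resolution order above can be sketched in a few lines of Python. This is only an illustration, not the server's implementation: the DocMeta class and pick_winner function are hypothetical names, and the rule that the larger value wins at each step is an assumption the slide does not spell out.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DocMeta:
    """Hypothetical stand-in for the per-document metadata XDCR compares."""
    rev_seq: int  # numerical sequence, incremented on each mutation
    cas: int      # CAS value
    expiry: int   # expiration (TTL) value


def pick_winner(a: DocMeta, b: DocMeta) -> DocMeta:
    """Compare the fields in the order listed above (assumed: larger wins)."""
    key_a = (a.rev_seq, a.cas, a.expiry)
    key_b = (b.rev_seq, b.cas, b.expiry)
    # Tuples compare field by field, so a later field only matters on a tie.
    return a if key_a >= key_b else b


# Same sequence number on both sides, so the CAS value decides.
dc1 = DocMeta(rev_seq=3, cas=1001, expiry=0)
dc2 = DocMeta(rev_seq=3, cas=1007, expiry=0)
winner = pick_winner(dc1, dc2)
```

Because the comparison depends only on the metadata and not on which side runs it, pick_winner(dc1, dc2) and pick_winner(dc2, dc1) return the same document, which is the property the last bullet describes.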

[Diagram; labels: Doc 1 on DC1 { … }, Doc 1 on DC2 { … }, 3, 3, Winner]

Bi-Directional Caution

• Avoid updating the same document in multiple clusters with bi-directional XDCR
  - Be sure to understand the conflict resolution rules
• Best Practices
  - Data center stickiness
    • Keep users/transactions isolated to a DC
    • Only redirect to another DC in case of a major outage
  - Use separate key spaces (e.g., a DC prefix) to avoid conflicts on individual documents. Example:
    • DC1::user:a9838-s92-s00
    • DC2::user:293ba-293-922
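A minimal sketch of the DC-prefix idea in Python, assuming keys follow the DC::type:id shape of the examples above. The helper names make_key and owning_dc are hypothetical, and generating IDs with uuid is an illustrative choice rather than anything the slides prescribe.

```python
import uuid
from typing import Optional


def make_key(dc: str, doc_type: str, doc_id: Optional[str] = None) -> str:
    """Build a key that carries the writing data center as a prefix."""
    if doc_id is None:
        doc_id = uuid.uuid4().hex  # illustrative ID scheme, not from the slides
    return f"{dc}::{doc_type}:{doc_id}"


def owning_dc(key: str) -> str:
    """Return the data center prefix, i.e. the only DC that should update this key."""
    return key.split("::", 1)[0]


key = make_key("DC1", "user", "a9838-s92-s00")
```

With keys built this way, an application running in DC2 can check owning_dc(key) before a write and avoid updating a document that DC1 owns.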

Operational Details

XDCR Operations: Configurable Replication

• Documents are pushed to XDCR after disk write **
  - Multiple document changes are combined to reduce network bandwidth
  - 32 parallel data streams work across vBuckets
• Checkpoints are used to maintain progress per vBucket
  - Allows for stop/restart without transferring all data
  - Optimizes data exchange
  - Transient failures cause a pause and restart
• Optimistic XDCR
  - Bandwidth-saving feature
  - <256 bytes (default, configurable): sent without checking first
  - >=256 bytes: destination checked first before sending
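The optimistic rule can be pictured as a small decision function. The sketch below is a simplification under stated assumptions: plan_transfer is a hypothetical name, target_revs is an in-memory stand-in for asking the destination cluster about a document, and skipping when the destination already holds an equal or newer revision is an assumed reading of "destination checked first".

```python
OPTIMISTIC_THRESHOLD_BYTES = 256  # default from the slide; configurable


def plan_transfer(key, body, source_rev, target_revs,
                  threshold=OPTIMISTIC_THRESHOLD_BYTES):
    """Return (action, checked_target) for one document.

    target_revs maps key -> revision already held by the destination
    (a stand-in for a metadata lookup on the remote cluster).
    """
    if len(body) < threshold:
        # Small document: send without checking the destination first.
        return ("send", False)
    # Larger document: check the destination before spending bandwidth.
    target_rev = target_revs.get(key)
    if target_rev is not None and target_rev >= source_rev:
        return ("skip", True)
    return ("send", True)


small = plan_transfer("k1", b"x" * 100, 5, {"k1": 9})
large_stale = plan_transfer("k2", b"x" * 1000, 5, {"k2": 9})
large_new = plan_transfer("k3", b"x" * 1000, 5, {})
```

The trade-off is one extra round trip for larger documents in exchange for not shipping bodies the destination would discard anyway.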

XDCR Operations: Advanced Settings

Use Version 2 (XMEM) for all except Elasticsearch integration

8-256; increase on high-end hardware

60 to 14,400 seconds; interval between checkpoints

500-1,000; document batching count; increase 2-3x for unidirectional

10-100,000 KB; document batching size; increase 2-3x for unidirectional

1-300 seconds; lower when expecting network failures

0 to 20,971,520 bytes (20 MB) compressed; optimistic below this size

XDCR Ports-Connections

[Diagram; labels: Retrieve Target Cluster Map & Configuration; Data Replication]

Mode                    Required Open Ports
Non-SSL                 8091, 8092
SSL w/ Version2/XMEM    11214, 11215, 18092
SSL w/ Version1/CAPI    11214, 11215, 18091
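As a pre-flight check, the port table above could be turned into a small reachability test. The sketch below is only an illustration: the mode labels and function names are hypothetical, the port numbers are copied from the table as given, and a plain TCP connect only shows that a port accepts connections, not that XDCR itself is healthy.

```python
import socket

# Port sets copied from the table above; the mode keys are hypothetical labels.
REQUIRED_PORTS = {
    "non-ssl": (8091, 8092),
    "ssl-v2-xmem": (11214, 11215, 18092),
    "ssl-v1-capi": (11214, 11215, 18091),
}


def required_ports(mode: str) -> tuple:
    """Look up the ports that must be open for a replication mode."""
    return REQUIRED_PORTS[mode]


def unreachable_ports(host: str, mode: str, timeout: float = 2.0) -> list:
    """Return the required ports on host that refuse or time out a TCP connect."""
    closed = []
    for port in required_ports(mode):
        try:
            with socket.create_connection((host, port), timeout=timeout):
                pass  # connection succeeded; the port is reachable
        except OSError:
            closed.append(port)
    return closed
```

Running unreachable_ports against each node of the destination cluster, from each node of the source cluster, would list the firewall rules that still need opening.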

Replication Follows Cluster Map

Different Configuration OK

XDCR Planning: Impact on Cluster Sizing

Your clusters need to be sized for XDCR

• XDCR is CPU intensive
  - Configure the number of parallel streams (xdcrMaxConcurrentReps) based on your CPU capacity
• You are doubling your I/O usage
  - I/O capacity needs to be sized correctly
• Network bandwidth will likely increase
• You will need more memory, particularly if bi-directional
  - Memory capacity needs to be sized correctly

XDCR Planning: Cloud/Internet Considerations (EC2, internet WAN, etc.)

• Make sure ports are open and IPs are reachable on both sides
• Use the public IP address in the destination cluster
• Use DNS names so that individual IP address changes do not break replication
• Traffic is not encrypted prior to Couchbase 2.5
  - Leverage optional SSL in 2.5+

Document Lifecycle Review: Bringing the webinars together

[Diagram; labels: App Server; Couchbase Server Node; Managed Cache; Disk Queue; Disk; Replication Queue (to other node); XDCR Queue (to other cluster or Elasticsearch); View engine; Doc 1]

Interface Demo

Elasticsearch

Integrated via XDCR

Architecture

Using Elasticsearch

Application Workflow

[Diagram; labels: Application; Search Query / result set; Retrieve Docs / docs]
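One way to read the workflow: the application sends a search query, receives a result set, and then retrieves the matching documents. The Python sketch below mimics those two steps with in-memory dictionaries. Everything in it is an assumption made for illustration: search_index stands in for a query against the Elasticsearch index fed by XDCR, retrieve_docs stands in for key lookups against Couchbase, and treating the result set as a list of document keys is one plausible interpretation of the diagram.

```python
def search_index(index: dict, term: str) -> list:
    """Stand-in for the search query: return keys whose indexed text matches."""
    term = term.lower()
    return [doc_id for doc_id, text in index.items() if term in text.lower()]


def retrieve_docs(bucket: dict, doc_ids: list) -> list:
    """Stand-in for retrieving the full documents by key."""
    return [bucket[doc_id] for doc_id in doc_ids if doc_id in bucket]


# Source of truth (stand-in for a Couchbase bucket).
bucket = {
    "user:1": {"name": "Ada Lovelace", "city": "London"},
    "user:2": {"name": "Alan Turing", "city": "London"},
    "user:3": {"name": "Grace Hopper", "city": "Arlington"},
}
# Searchable copy (stand-in for the index that replication keeps up to date).
index = {key: f"{doc['name']} {doc['city']}" for key, doc in bucket.items()}

result_set = search_index(index, "london")
docs = retrieve_docs(bucket, result_set)
```

Keeping only keys in the result set and reading the documents from the bucket means the application always returns the current version of a document, even if the index lags slightly behind.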

Additional Resources

• Elasticsearch Install & Configure Gist
• Elasticsearch Transport Plugin GitHub Repo
• Elasticsearch Head Plugin GitHub Repo
• Couchbase / Elasticsearch Docs

Interface Demo

Q&A

email: [email protected]