CrowdStrike: Real World DTCS For Operators

Page 1: CrowdStrike: Real World DTCS For Operators

Real World DTCS For Operators

Page 2: CrowdStrike: Real World DTCS For Operators

An Introduction to CrowdStrike

We Are a Cybersecurity Technology Company

We Detect, Prevent And Respond To All Attack Types In Real Time, Protecting Organizations From Catastrophic Breaches

We Provide Next-Generation Endpoint Protection, Threat Intelligence & Pre- & Post-IR Services

NEXT-GEN ENDPOINT

INCIDENT RESPONSE

THREAT INTEL

Page 3: CrowdStrike: Real World DTCS For Operators

What Is Compaction?

• Cassandra write path:
– First the Commitlog
– Then the Memtable
– Eventually flushed to an SSTable

• Each SSTable is written exactly once
• Over time, Cassandra combines files
– Duplicate cells are merged
– Obsolete data is purged

• The algorithm Cassandra uses to determine when and how to combine files is pluggable, and choosing the right strategy may be important at scale
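A minimal sketch of what "combining files" means, assuming a toy model in which each SSTable is a dict from a cell key to a (write timestamp, value, expiry) tuple. This is not Cassandra's on-disk format, and the names `merge_sstables` and `now` are hypothetical; real purging also has to respect tombstones and gc_grace_seconds, which this ignores.

```python
from typing import Dict, List, Optional, Tuple

# Toy model: cell key -> (write_timestamp, value, expires_at or None).
Cell = Tuple[int, str, Optional[int]]
SSTable = Dict[str, Cell]


def merge_sstables(sstables: List[SSTable], now: int) -> SSTable:
    """Combine several immutable tables into one new table."""
    merged: SSTable = {}
    for table in sstables:
        for key, cell in table.items():
            # Duplicate cells: the newest write timestamp wins.
            if key not in merged or cell[0] > merged[key][0]:
                merged[key] = cell
    # Obsolete data: drop cells whose expiry time has passed.
    return {
        key: cell
        for key, cell in merged.items()
        if cell[2] is None or cell[2] > now
    }


old = {"a": (1, "v1", None), "b": (1, "x", 50)}
new = {"a": (2, "v2", None)}
result = merge_sstables([old, new], now=100)
```

Here `result` keeps only the newer write for "a", and "b" is gone because its expiry (50) is before `now` (100). The inputs are never modified, which mirrors the write-once rule above.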

© 2015. All Rights Reserved.

Page 4: CrowdStrike: Real World DTCS For Operators

What Is Compaction?

• SizeTieredCompactionStrategy
– Each time min_threshold (4) files of the same size appear, combine them into a new file
– Over time, you’ll naturally end up with a distribution of old data in large files, new data in small files
– Deleted data in large files stays on disk longer than desired because those files are very rarely compacted
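A rough simulation of the size-tiered rule, assuming every flush produces a file of size 1 and that "same size" means exactly equal (real STCS buckets files of similar size, so this is a simplification). The function names are hypothetical.

```python
from typing import Dict, List


def stcs_candidates(sizes: List[int], min_threshold: int = 4) -> List[List[int]]:
    """Group files by size and return the groups that are ready to compact."""
    buckets: Dict[int, List[int]] = {}
    for size in sizes:
        buckets.setdefault(size, []).append(size)
    return [files for files in buckets.values() if len(files) >= min_threshold]


def simulate_flushes(flushes: int, min_threshold: int = 4) -> List[int]:
    """Flush unit-size files one at a time, compacting whenever a bucket fills."""
    files: List[int] = []
    for _ in range(flushes):
        files.append(1)
        ready = stcs_candidates(files, min_threshold)
        while ready:
            group = ready[0][:min_threshold]
            for size in group:
                files.remove(size)
            files.append(sum(group))  # The merged file replaces its inputs.
            ready = stcs_candidates(files, min_threshold)
    return sorted(files, reverse=True)


layout = simulate_flushes(21)
```

After 21 flushes the layout is one file of size 16, one of size 4, and one of size 1: old data ends up in the big file, and that file is only touched again once three more files of size 16 exist.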

Page 5: CrowdStrike: Real World DTCS For Operators

SizeTieredCompactionStrategy

Page 6: CrowdStrike: Real World DTCS For Operators

SizeTieredCompactionStrategy

If each of the smallest blocks represents 1 day of data, and each write has a 90 day TTL, when do you actually delete files and reclaim disk space?
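One way to put numbers on that question, assuming a whole file can only be deleted once every cell in it has expired, and ignoring gc_grace_seconds and tombstone compactions. The 1/4/16/64-day file spans are an assumption that follows from min_threshold = 4; the function name is hypothetical.

```python
def days_until_droppable(span_days: int, ttl_days: int) -> int:
    """Days after the oldest write before the whole file can be deleted.

    A file covering `span_days` consecutive days of writes holds its newest
    data until that data's TTL passes, so the oldest data waits with it.
    """
    return (span_days - 1) + ttl_days


for span in (1, 4, 16, 64):
    print(span, days_until_droppable(span, ttl_days=90))
```

Under these assumptions a file spanning 64 days keeps its oldest data on disk for 153 days, 63 days longer than the TTL asked for.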

Page 7: CrowdStrike: Real World DTCS For Operators

Why Compaction Strategy Matters

• We keep some data from sensors for a fixed time period
– Processes
– DNS queries
– Files created

• It’s a LOT of data
– Talk tomorrow morning: One million writes per second with 60 nodes

• We’re WELL past 60 nodes
• If we can’t delete it efficiently, costs go way, way up

Page 8: CrowdStrike: Real World DTCS For Operators

DateTieredCompactionStrategy

• Early tickets suggested creating a way to stop compacting cold data
– CASSANDRA-5515 – track sstable coldness, stop compacting cold sstables (measured by READ counts)

• CASSANDRA-6602 – optimize for time series specifically
– Solution provided by Björn Hegerfors from Spotify
– Use sstable’s min timestamp to find a target window
– Compact sstables within the same target
– Stop compacting sstables if max timestamp is older than a specified cutoff
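A simplified sketch of the three steps above: find a target window from the min timestamp, group sstables that share a target, and skip anything whose max timestamp is past the cutoff. It is loosely modeled on the tiered-window idea, not copied from the Cassandra source. Timestamps are in arbitrary integer units, and the dict keys (`name`, `min_ts`, `max_ts`) and function names are hypothetical.

```python
from typing import Dict, Iterator, List, Tuple

Window = Tuple[int, int]


def dtcs_windows(now: int, base_time: int, min_threshold: int) -> Iterator[Window]:
    """Yield tiered [start, end) windows, newest first, growing with age."""
    size, pos = base_time, now // base_time
    while True:
        yield (pos * size, (pos + 1) * size)
        if pos % min_threshold > 0:
            pos -= 1  # Stay in the same tier, step one window back.
        else:
            # Move up a tier: the next window is min_threshold times larger.
            size, pos = size * min_threshold, pos // min_threshold - 1


def dtcs_buckets(sstables: List[dict], now: int, base_time: int,
                 min_threshold: int, max_age: int) -> Dict[Window, List[str]]:
    """Bucket sstables by min timestamp, skipping those past the cutoff."""
    cutoff = now - max_age
    buckets: Dict[Window, List[str]] = {}
    for table in sstables:
        if table["max_ts"] < cutoff:
            continue  # Too old: left alone from now on.
        min_ts = min(table["min_ts"], now)  # Clamp future timestamps.
        for start, end in dtcs_windows(now, base_time, min_threshold):
            if start <= min_ts < end:
                buckets.setdefault((start, end), []).append(table["name"])
                break
    return buckets


tables = [
    {"name": "a", "min_ts": 21, "max_ts": 21},
    {"name": "b", "min_ts": 17, "max_ts": 19},
    {"name": "c", "min_ts": 18, "max_ts": 21},
    {"name": "d", "min_ts": 2, "max_ts": 3},
]
buckets = dtcs_buckets(tables, now=21, base_time=1, min_threshold=4, max_age=10)
```

With these inputs "b" and "c" land in the same window (16 to 20) and become compaction candidates together, "a" sits alone in the newest window, and "d" is ignored because its max timestamp is older than the cutoff.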

Page 9: CrowdStrike: Real World DTCS For Operators

DTCS In Pictures

Page 10: CrowdStrike: Real World DTCS For Operators

DTCS Parameters

• max_sstable_age_days
• base_time_seconds
• timestamp_resolution
• min_threshold
– Common to all compaction strategies
• max_threshold
– Common to all compaction strategies
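For illustration, a small helper that renders these options into a CQL ALTER TABLE statement. The keyspace and table names and the option values are placeholders, not recommendations, and the helper itself (`dtcs_alter_statement`) is hypothetical; only the option names come from the slide.

```python
def dtcs_alter_statement(keyspace: str, table: str, options: dict) -> str:
    """Render a CQL ALTER TABLE statement that selects DTCS with the given options."""
    settings = {"class": "DateTieredCompactionStrategy", **options}
    body = ", ".join(f"'{key}': '{value}'" for key, value in settings.items())
    return f"ALTER TABLE {keyspace}.{table} WITH compaction = {{{body}}};"


statement = dtcs_alter_statement(
    "ks",       # Placeholder keyspace name.
    "events",   # Placeholder table name.
    {
        "max_sstable_age_days": 90,    # Illustrative value only.
        "base_time_seconds": 3600,     # Illustrative value only.
        "timestamp_resolution": "MICROSECONDS",
        "min_threshold": 4,
        "max_threshold": 32,
    },
)
print(statement)
```

Printing the statement first, rather than executing it, makes it easy to review the change before it goes anywhere near a cluster.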

Page 11: CrowdStrike: Real World DTCS For Operators

DTCS In Pictures

Page 12: CrowdStrike: Real World DTCS For Operators

DTCS Benefits
In Theory…

• You can stop data compacting at a point you choose!
– max_sstable_age_days

• You can adjust the window size so that you can quickly expire data when it’s approximately the size you want
– It’s not immediately intuitive, but you CAN calculate it (min_threshold and base_time_seconds)

• We know cold data won’t be recompacted, so we can potentially enable cold storage directories with cheaper disk
– CASSANDRA-8460 – patch available, I need to rebase
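A sketch of the window-size calculation, assuming each tier is min_threshold times larger than the one before it, starting from base_time_seconds. The values 3600 and 4 are used as assumptions for illustration (min_threshold of 4 appears earlier in the talk; treating 3600 as the base is my assumption). The function name is hypothetical, and min_threshold must be greater than 1.

```python
from typing import List


def dtcs_window_sizes(base_time_seconds: int, min_threshold: int,
                      horizon_days: int) -> List[int]:
    """Window sizes in seconds, smallest first, up to the data horizon."""
    sizes = []
    size = base_time_seconds
    while size <= horizon_days * 86400:
        sizes.append(size)
        size *= min_threshold  # Each tier is min_threshold times larger.
    return sizes


sizes = dtcs_window_sizes(3600, 4, horizon_days=365)
for seconds in sizes:
    print(f"{seconds // 3600} hours (~{seconds / 86400:.1f} days)")
```

Under these assumptions the tiers run 1, 4, 16, 64, 256, 1024 and 4096 hours, so the largest window within a year of data is about 170 days wide. Changing either parameter shifts every tier at once, which is why the tuning is not intuitive.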

Page 15: CrowdStrike: Real World DTCS For Operators

Do people consider DTCS Production Ready?

• It was added to 2.0 after 2.1 was out. Usually this means:
– Trivial and low risk, or
– Experimental and meant for advanced users only
– I challenge you to find documentation on which is true for DTCS

• Spotify’s intro blog notes that they use it in production
• I’ve been told by a project committer that they feel DTCS is for advanced users only, but I’ve never seen any public-facing messaging that normal users should avoid it

• It seems so easy, what could possibly go wrong…

Page 16: CrowdStrike: Real World DTCS For Operators

DTCS Caveats

• The initial blogs give us some insight about what type of things may not behave as intended
– “But something that works against the efforts of the strategy is writes with highly out-of-order timestamps”
    • How much is “highly out of order”?
– “Consider turning off read repairs. Anti-entropy repairs and hinted handoff don’t incur as much additional work for DTCS and may be used like usual.”

Page 17: CrowdStrike: Real World DTCS For Operators

Out of order timestamps

• When an sstable gets flushed with an old timestamp in a new table:
– The max timestamp is used to determine when to stop compacting, but
– The min timestamp is used to determine which other files will be compacted with this sstable
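A tiny worked example of that mismatch, using day numbers instead of real timestamps. The scenario (a flush on day 100 that includes one read-repaired cell from day 5, with a 30 day cutoff) and the function names are made up for illustration.

```python
from typing import List, Tuple


def sstable_bounds(cell_days: List[int]) -> Tuple[int, int]:
    """Return the (min, max) write timestamp of the cells in one sstable."""
    return min(cell_days), max(cell_days)


def past_cutoff(max_ts: int, now: int, max_sstable_age_days: int) -> bool:
    """True once the newest cell is older than the cutoff."""
    return max_ts < now - max_sstable_age_days


# Flushed on day 100: mostly fresh writes, plus one read-repaired cell from day 5.
min_ts, max_ts = sstable_bounds([100, 100, 100, 5])
still_compacting = not past_cutoff(max_ts, now=100, max_sstable_age_days=30)
```

The max timestamp (day 100) says this file is fresh and still eligible for compaction, while the min timestamp (day 5) places it in the window that holds day 5 data. One stray old cell is enough to pull a new file into an old window.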

Page 18: CrowdStrike: Real World DTCS For Operators

Out of order timestamps

Page 19: CrowdStrike: Real World DTCS For Operators

Out of order timestamps

Page 21: CrowdStrike: Real World DTCS For Operators

Out of order timestamps

• Windows are tiered, and they get bigger and bigger
• With default settings and 1 year of data, the largest window covers 180 days
– This means even if most of the file is past max_sstable_age_days, you can still end up compacting with a brand new sstable with read repaired data

• “DTCS never stops compacting”
– Read repairs pull old data into new windows, triggering recompaction
– Does that mean we’d better run repair?

Page 22: CrowdStrike: Real World DTCS For Operators

Small SSTables from Repairs (and other streaming operations)

• “If an SSTable contains timestamps that don’t match the time when it was actually written to disk, it violates the size-to-age correspondence that DTCS tries to maintain.”

• The suggestions on the Spotify and DataStax blogs say to run repair more often than max_sstable_age_days, but repair isn’t the only cause of small sstables
– Bootstrap
– Decommission
– Bulk Loader

Page 23: CrowdStrike: Real World DTCS For Operators

Real Pain: If you can’t expand your cluster, what’s the point?

[Figure: SSTable Count Per Node]

Page 24: CrowdStrike: Real World DTCS For Operators

Real Pain: If you can’t expand your cluster, what’s the point?

Damn you, vnodes!

Page 25: CrowdStrike: Real World DTCS For Operators

Well…

Page 26: CrowdStrike: Real World DTCS For Operators

Small SSTables Shouldn’t Be Ignored

• If the small sstables are beyond max_sstable_age_days, they won’t be compacted
– After all, that’s the point of max_sstable_age_days, right?

• If you raise max_sstable_age_days, the ever-growing DTCS tiered windows will cause existing sstables to merge and get much larger, negating one of the benefits of DTCS

• If you don’t raise max_sstable_age_days, you have to deal with the performance implications of ten thousand sstables
– Reduced somewhat by CASSANDRA-9882
– Before #9882, too many sstables could block flushing for a long time

Page 27: CrowdStrike: Real World DTCS For Operators

Embarrassing Admission

• Our early bulk loading plan and bootstrapping procedure acknowledged that sstables will be abandoned beyond max_sstable_age_days

• We have python scripts that check the timestamps, and manually submit compactions through JMX forceUserDefinedCompaction()

Page 29: CrowdStrike: Real World DTCS For Operators

Really Embarrassing Admission

• Our early bulk loading plan and bootstrapping procedure acknowledged and accepted that sstables will be abandoned beyond max_sstable_age_days

• We have python scripts that check the timestamps, and manually submit compactions through JMX forceUserDefinedCompaction()

• Yes, really.
• Does it actually scale?
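A guess at the general shape of such a script, not the actual CrowdStrike code. It assumes the sstable max timestamps have already been collected somehow (for example from sstable metadata), that timestamps are in seconds, and that grouping by calendar day is the goal. The jmxterm `run` line is only built as a string; the exact arguments forceUserDefinedCompaction() expects differ between Cassandra versions, so treat that format as an assumption too.

```python
from collections import defaultdict
from typing import Dict, List

DAY = 86400


def abandoned_groups(sstables: List[dict], now: int,
                     max_age_days: int) -> Dict[int, List[str]]:
    """Group sstables older than the cutoff by the day of their max timestamp."""
    cutoff = now - max_age_days * DAY
    groups: Dict[int, List[str]] = defaultdict(list)
    for table in sstables:
        if table["max_ts"] < cutoff:
            groups[table["max_ts"] // DAY].append(table["file"])
    # Only days with more than one file are worth compacting.
    return {day: files for day, files in groups.items() if len(files) > 1}


def jmx_commands(groups: Dict[int, List[str]]) -> List[str]:
    """Build one jmxterm 'run' line per group (argument format is an assumption)."""
    bean = "org.apache.cassandra.db:type=CompactionManager"
    return [
        f"run -b {bean} forceUserDefinedCompaction {','.join(files)}"
        for _, files in sorted(groups.items())
    ]


tables = [
    {"file": "a-Data.db", "max_ts": 5 * DAY + 10},
    {"file": "b-Data.db", "max_ts": 5 * DAY + 20},
    {"file": "c-Data.db", "max_ts": 7 * DAY},
    {"file": "d-Data.db", "max_ts": 99 * DAY},
]
groups = abandoned_groups(tables, now=100 * DAY, max_age_days=10)
commands = jmx_commands(groups)
```

In this example only the two day 5 files are submitted together; the lone day 7 file and the fresh day 99 file are left alone. Someone still has to run this on every node after every streaming operation, which is where the scaling question comes from.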

Page 30: CrowdStrike: Real World DTCS For Operators

When should you use DTCS?

• You TTL ALL of your data and writes come in order
• Fixed-size cluster and no plans for bulk loading, or rarely changing cluster size and not using vnodes
– If you plan on growing, you’d better have a plan for small sstables
– If you do need to add/remove nodes, vnodes will cause far more small sstables than single-token-per-node

• Extra space available for compaction
– You can’t rely on theoretical table sizes calculated with max_sstable_age_days, because read repair, hints, etc., can force those files to span much larger time ranges than you expect

Page 31: CrowdStrike: Real World DTCS For Operators

Being Honest

Page 32: CrowdStrike: Real World DTCS For Operators

What if?

• Do we really need max_sstable_age_days?
– The conventional logic is to use it to denote cold data, but we use it to force window sizes
– If we give up tiering, and stick with fixed-size windows, do we need max_sstable_age_days?

• Without tiering, can we swap base_time_seconds for a more intuitive configuration option?

Page 33: CrowdStrike: Real World DTCS For Operators

TimeWindowCompactionStrategy

• Designed to be simple and efficient
– Group sstables into logical buckets
– STCS within each time window
– No more rolling re-compaction
– No more streaming leftovers
– No more confusing options, just Window Size + Window Unit
• “12 Hours”, “3 Days”, “6 Minutes”
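A minimal sketch of fixed-window bucketing, assuming sstables are assigned to a window by their max timestamp (in seconds) and that windows are aligned to multiples of the window width. The unit names, dict keys and function names are illustrative, not the strategy's actual option names or code.

```python
from collections import defaultdict
from typing import Dict, List

UNIT_SECONDS = {"MINUTES": 60, "HOURS": 3600, "DAYS": 86400}


def window_start(timestamp: int, size: int, unit: str) -> int:
    """Round a timestamp (seconds) down to the start of its fixed window."""
    width = size * UNIT_SECONDS[unit]
    return timestamp - timestamp % width


def twcs_buckets(sstables: List[dict], size: int, unit: str) -> Dict[int, List[str]]:
    """Group sstables by the fixed window holding their max timestamp."""
    buckets: Dict[int, List[str]] = defaultdict(list)
    for table in sstables:
        buckets[window_start(table["max_ts"], size, unit)].append(table["name"])
    return dict(buckets)


tables = [
    {"name": "a", "max_ts": 3 * 86400 + 10},
    {"name": "b", "max_ts": 3 * 86400 + 7200},
    {"name": "c", "max_ts": 4 * 86400 + 1},
]
buckets = twcs_buckets(tables, size=1, unit="DAYS")
```

With 1 day windows, "a" and "b" share the day 3 bucket and "c" sits alone in day 4. Because the windows never grow, a file that arrives late by streaming still has a bucket of the same width waiting for it.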

Page 34: CrowdStrike: Real World DTCS For Operators

TimeWindowCompactionStrategy

• Submitted to Apache Cassandra as CASSANDRA-9666
• For now, we use it at CrowdStrike to clean up after streaming:
– echo "set -b org.apache.cassandra.db:columnfamily=table,keyspace=keyspace,type=ColumnFamilies CompactionStrategyClass org.apache.cassandra.db.compaction.TimeWindowCompactionStrategy" | java -jar jmxterm.jar -l $IP:$PORT
– It’s not an accident that the TWCS defaults use 1-day windows with microsecond timestamp resolution: that matches our sstable needs, but we also think it’s a good default

• Patches (and Tests) Available for 2.1, 2.2, 3.0

Page 35: CrowdStrike: Real World DTCS For Operators

TimeWindowCompactionStrategy

• No more continuous compaction
• No more tiny streaming leftovers
• No more confusing options
– Just Window Size, Window Unit
– “12 Hours”, “3 Days”, “6 Minutes”

• Work is ongoing for both DTCS and TWCS
– CASSANDRA-9645 to make DTCS easier to use
– CASSANDRA-10276 to make DTCS do STCS within each window (patch available)
– CASSANDRA-10280 to make DTCS work well with old data

Page 36: CrowdStrike: Real World DTCS For Operators

TimeWindowCompactionStrategy

• There’s no guarantee that TWCS will make it into the project
– TWCS is certainly easier to reason about, but DTCS was there first and is already deployed by real users
– Anecdotal evidence and preliminary benchmarks suggest TWCS comes out ahead based on the current state of both strategies (at the time of these slides)
– Formal benchmarking is needed
– DTCS probably wins for reads/SELECTs in SOME data models

• Even if TWCS doesn’t make it in, the source is available now on (see: CASSANDRA-9666)
– It’s likely we’ll continue to maintain it, even if it’s not accepted upstream, so pull requests are welcome

Page 37: CrowdStrike: Real World DTCS For Operators

Q&A

• Talk to me about Cassandra or DTCS on Twitter: @jjirsa
• Try to stop me from talking about DTCS on IRC: #cassandra
• CrowdStrike is awesome and hiring
– www.crowdstrike.com/careers/
• Jim Plush and Dennis Opacki, tomorrow morning
– “1 Million Writes Per Second on 60 Nodes with Cassandra and EBS”

Page 38: CrowdStrike: Real World DTCS For Operators

Thank you