Streamy, Pipy, Analyticy

Post on 18-Dec-2014

Node.js Streams & Pipes revised for analytics

Copyright  Push  Technology  2012  

LNUG  London  

January  2013  

Darach@PushTechnology.com

About me?

• Distributed Systems / HPC guy.
• Chief Scientist :- at Push Technology
• Responds to: Guinness, Whisky
• Twitter: @darachennis

Streamy Pipy Analyticy

EEP + ‘Streams & Pipes’ = CEP

• An experiment in Embedded Event Processing
• Sliding, Tumbling, Monotonic and Periodic windows
• Separate ‘window’ definition from operation
• Aggregate functions. A window of data produces a scalar result
• But? No filtering, branching or combinators, no flows …
• That’s a job for Streams & Pipes. Let’s add that.

eep.js: Functional Operations on Streaming Data Windows

[Diagram: nodes labelled S, C and Q connected through windows (w)]

Windows  

Windows + Aggregate Functions

• A window of data is a slice of data over time, number of events or some other dimension
• An aggregate function is something you do in the context of a window.

What is this?
• Average – Aggregate Function
• CPU – Data (events)
• On a second by second basis - Periodic time window

Example  

Tumbling  Windows  

• Every N events, give me an average of the last N events
• Does not overlap windows
• ‘Closing’ a window ‘emits’ a result (the average)
• Closing a window opens a new window

What  is  a  tumbling  window?  

[Diagram: tumbling windows along a timeline t0 … t11. Each window calls init(), applies x() to four events, then emit()s its result.]
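The init() / x() / emit() cycle of a tumbling window can be sketched in a few lines of plain JavaScript. This is a minimal illustration only: `tumbling`, `mean` and the `init`/`accumulate`/`emit` shape are hypothetical names chosen for this sketch, not the eep.js API.

```javascript
// Hypothetical sketch of a tumbling window; not the eep.js API.
// An aggregate is assumed to be { init, accumulate, emit }.
function tumbling(size, aggregate, onEmit) {
  let state = aggregate.init();
  let count = 0;
  return function push(value) {
    state = aggregate.accumulate(state, value);
    count += 1;
    if (count === size) {
      onEmit(aggregate.emit(state)); // closing a window emits a result
      state = aggregate.init();      // ...and opens a new window
      count = 0;
    }
  };
}

// Average as an aggregate function.
const mean = {
  init: () => ({ sum: 0, n: 0 }),
  accumulate: (s, v) => ({ sum: s.sum + v, n: s.n + 1 }),
  emit: (s) => s.sum / s.n,
};

const tumblingResults = [];
const pushTumbling = tumbling(4, mean, (r) => tumblingResults.push(r));
[1, 2, 3, 4, 5, 6, 7, 8, 9].forEach(pushTumbling);
// tumblingResults is [2.5, 6.5]; the ninth event sits in a still-open window.
```

Keeping the aggregate separate from the window mirrors the "separate window definition from operation" idea: the same `mean` could be handed to a different kind of window.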

Sliding  Windows  

• Like tumbling, except windows can overlap.
• But typically O(N²), so keep N small. Except EEP.js: O(N) perf.
• Every event opens a new window.
• After N events, every subsequent event emits a result.
• Like all windows, cost of calculation amortized over events

What  is  a  sliding  window?  

[Diagram: overlapping sliding windows along a timeline t0 … t11, showing init() followed by x() for each event.]

Periodic  Windows  

• Driven by ‘wall clock time’ in milliseconds
• Not monotonic, natch. Beware of NTP

What  is  a  periodic  window?  

[Diagram: periodic windows along a timeline t0 … t3, each with init(), x() per event and emit().]

Monotonic  Windows  

• Driven mad by ‘wall clock time’? Need a logical clock?
• No worries. Provide your own clock! E.g. a vector clock

What  is  a  monotonic  window?  

[Diagram: monotonic windows along a timeline t0 … t3 with ticks labelled ‘my’, each with init(), x() per event and emit().]
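One way to picture the difference between periodic and monotonic windows is a window that closes when a supplied clock has advanced far enough. The sketch below assumes a hypothetical `clockedWindow` helper (not the eep.js API) that sums values: pass `Date.now` as the clock for a wall-clock (periodic) window, or your own counter for a logical (monotonic) one.

```javascript
// Hypothetical sketch; names are illustrative, not the eep.js API.
// The window closes once the supplied clock has advanced by `interval`.
function clockedWindow(interval, clock, onEmit) {
  let sum = 0;
  let openedAt = clock();
  return {
    push(value) {
      sum += value;
    },
    tick() {
      const now = clock();
      if (now - openedAt >= interval) {
        onEmit(sum);    // close the window and emit its aggregate
        sum = 0;        // open the next window
        openedAt = now;
      }
    },
  };
}

// A logical clock that we advance by hand (the "provide your own clock" case).
let logicalTime = 0;
const emitted = [];
const win = clockedWindow(2, () => logicalTime, (r) => emitted.push(r));

win.push(1);
win.push(2);
logicalTime = 1; win.tick(); // only 1 tick elapsed: window stays open
win.push(3);
logicalTime = 2; win.tick(); // 2 ticks elapsed: emits 6
win.push(10);
logicalTime = 4; win.tick(); // emits 10
// emitted is [6, 10]
```

Because the logical clock only moves when the caller says so, the result is reproducible, which is not true of a wall clock that NTP may adjust.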

Slide better with Compensating Aggregates

[Diagram: sliding windows along a timeline t0 … t11 with init() and x() per event, annotated with a do { … } while (…) loop and compensate().]

Bad Sliding - O(N²)

Good  Sliding  

• Takes us from O(N²) to O(N) for Sliding windows
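The idea behind a compensating aggregate can be sketched by comparing two sliding averages: one that re-scans the whole window on every event, and one that subtracts the value leaving the window before adding the new one. Both functions below are hypothetical illustrations, not the eep.js implementation; the reading that the slide's O(N²) versus O(N) refers to total work over N events is an assumption.

```javascript
// Naive sliding mean: re-scans the window for every event (O(N) per event).
function slidingNaive(size, onEmit) {
  const buffer = [];
  return (value) => {
    buffer.push(value);
    if (buffer.length > size) buffer.shift();
    if (buffer.length === size) {
      let sum = 0;
      for (const x of buffer) sum += x;
      onEmit(sum / size);
    }
  };
}

// Compensated sliding mean: O(1) per event using a ring buffer.
function slidingCompensated(size, onEmit) {
  const ring = new Array(size);
  let sum = 0;
  let seen = 0;
  return (value) => {
    const slot = seen % size;
    if (seen >= size) sum -= ring[slot]; // compensate for the expiring value
    ring[slot] = value;
    sum += value;
    seen += 1;
    if (seen >= size) onEmit(sum / size);
  };
}

const naiveOut = [];
const compensatedOut = [];
const pushNaive = slidingNaive(4, (r) => naiveOut.push(r));
const pushCompensated = slidingCompensated(4, (r) => compensatedOut.push(r));
[1, 2, 3, 4, 5, 6].forEach((v) => {
  pushNaive(v);
  pushCompensated(v);
});
// Both arrays are [2.5, 3.5, 4.5].
```

Compensation only works for aggregates that can be "undone" (sum, count, mean); something like max needs a different structure.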

EEP.js  is  fast  

Using  Sliding,  Tumbling  Windows  

Using  Periodic,  Monotonic  Windows  

Custom clocks (notion of time)

EEP.js  v0.1,  v0.2  were  ugly  babies.  

Sorry! Swear, the next version will be just as functional but pretty…

Streams  &  Pipes  

What  about  Streams  &  Pipes?  

[Diagram: the eep picture (S, C, Q and windows) + ????]

Streams  &  Pipes:  Origins  

• Do one thing. Do it well
• Compose sophisticated behaviors from simple parts
• Maximize reuse
• Unix, ‘Chain of Responsibility’ (GoF), Interceptor (POSA2), XPipe, Builder, …
• The ‘Assembly Line Principle’ is nothing new

Streams & Pipes: Node.js

• var events = require('events')
  • Publish/Subscribe to event (streams)
• var stream = require('stream')
  • Readable – Consume a (finite) set of events
  • Writable – Produce a (finite) set of events
  • readable.pipe(writeable)
  • writeable.pipe(readable)
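A minimal sketch of `readable.pipe(writable)` using Node's built-in `stream` module follows. It assumes a reasonably recent Node.js (for `Readable.from`); the variable names and sample data are made up for illustration. Object mode is switched off deliberately so the sink sees what an IO-oriented stream delivers: Buffers rather than JavaScript values.

```javascript
const { Readable, Writable } = require('stream');

const chunks = [];

// Byte-oriented source: the strings are delivered downstream as Buffers.
const source = Readable.from(['cpu=', '42', '\n'], { objectMode: false });

// Sink that simply collects whatever chunks arrive.
const sink = new Writable({
  write(chunk, _encoding, done) {
    chunks.push(chunk);
    done();
  },
});

sink.on('finish', () => {
  // Reassemble and decode before the data is usable as a JS string.
  console.log(JSON.stringify(Buffer.concat(chunks).toString('utf8')));
});

source.pipe(sink);
```

The decode step at the end is the kind of copying and parsing overhead that matters little for IO but adds up when every event is a small value to be crunched.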

Streams  &  Pipes:  streams2  

• Transform – Compress, Encrypt, Encode, …
• Duplex – Readable and Writable
• Passthrough – The canonical ‘noop’ transform
• Node.js Streams history (so far) http://bit.ly/XupqkO - by @izs
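As a small illustration of a streams2-style Transform, the sketch below upper-cases text as it flows through a pipe. It uses only Node's built-in `stream` module; the names `shout`, `collected` and `collector` are made up for this example.

```javascript
const { Readable, Transform, Writable } = require('stream');

// A Transform sits between a readable and a writable side:
// each incoming chunk is rewritten and passed on.
const shout = new Transform({
  transform(chunk, _encoding, done) {
    done(null, chunk.toString('utf8').toUpperCase());
  },
});

const collected = [];
const collector = new Writable({
  write(chunk, _encoding, done) {
    collected.push(chunk.toString('utf8'));
    done();
  },
});

collector.on('finish', () => {
  console.log(collected.join(''));
});

Readable.from(['streamy ', 'pipy'], { objectMode: false })
  .pipe(shout)
  .pipe(collector);
```

Replacing `shout` with `new PassThrough()` (also exported by `stream`) would leave the data untouched, which is why it serves as the canonical no-op.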

Streams  &  Pipes:  but  …  

• Oriented for IO, not compute/analytics
• Array-like buffers, not individual datums
• @dominictarr event-streams? Array based
• ASCII, UTF-8, Binary - not JS types
• Often require copying, parsing, … (slow)

• So, streams & pipes for JS types? Yes!
• Do one thing. Do it well
• Compose sophisticated behaviors from simple parts
• Maximize reuse

Introducing  Beam.js  

Beams,  Pipes  

• Streams & Pipes for analytics
• Not designed for IO. Use Streams for that
• Not concerned with CEP.
• … Use EEP for that? :)
• Not concerned with arrays of things
• … Use Dominic Tarr’s event-stream for that
• Beam
• Crunch events
• Pipeline, Branch & Combine

Beams  &  Pipes.  

• Streams & Pipes, reconsidered for JS types
• var Beam = require('beam');
• Beam.Source -- Push data in
• Beam.Sink -- Suck analysis out
• Beam.Operator -- OODA / PDCA
• Really Simple: ~150 LOC

Beams  &  Pipes:  Operators  

• Three types of operator
• Transform
  • 1 in, 1 out. Output data/type may differ
• Filter
  • 1 in, 1 or none out. Output data/type same as input
• Custom
  • May transform, filter
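The three operator kinds can be sketched as plain functions of the form `(value, emit)`. This is an assumption made for illustration; it is not the beam.js API, and `transform`, `filter`, `onChange` and `pipe` are hypothetical names.

```javascript
// Transform: 1 in, 1 out; the output type may differ from the input.
const transform = (fn) => (value, emit) => emit(fn(value));

// Filter: 1 in, 1 or none out; the value passes through unchanged.
const filter = (predicate) => (value, emit) => {
  if (predicate(value)) emit(value);
};

// Custom: free to do both. This one is stateful and only reports changes.
const onChange = () => {
  let previous;
  return (value, emit) => {
    if (value !== previous) emit(`changed to ${value}`);
    previous = value;
  };
};

// Chain operators so each one emits into the next; the last emits into sink.
function pipe(...operators) {
  return (value, sink) => {
    const step = (i, v) => {
      if (i === operators.length) sink(v);
      else operators[i](v, (next) => step(i + 1, next));
    };
    step(0, value);
  };
}

const evenSquaresOut = [];
const evenSquares = pipe(transform((n) => n * n), filter((n) => n % 2 === 0));
[1, 2, 3, 4, 5].forEach((n) => evenSquares(n, (v) => evenSquaresOut.push(v)));
// evenSquaresOut is [4, 16]

const changesOut = [];
const changes = pipe(onChange());
[1, 1, 2, 2, 3].forEach((n) => changes(n, (v) => changesOut.push(v)));
// changesOut is ['changed to 1', 'changed to 2', 'changed to 3']
```

Values stay ordinary JavaScript values end to end, so nothing is serialized or parsed between operators.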

Example: Definitions

Example:  Usage  

Example:  Easy  to  debug  …  

Example:  Streams  &  Beams  

Branch  

• You can define 1 or many
• They can overlap or not as you see fit
• It’s just an application of predicate (boolean) filters
• Simple
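Branching as "just predicate filters" might look like the sketch below, where each named branch receives the values its predicate accepts. The `branch` helper and its shape are assumptions for illustration, not the beam.js API.

```javascript
// Hypothetical sketch: each branch is a named predicate over the same input.
function branch(predicates) {
  const outputs = {};
  for (const name of Object.keys(predicates)) outputs[name] = [];
  return {
    outputs,
    push(value) {
      for (const [name, accepts] of Object.entries(predicates)) {
        if (accepts(value)) outputs[name].push(value);
      }
    },
  };
}

const branches = branch({
  even: (n) => n % 2 === 0,
  big: (n) => n > 3,
});
[1, 2, 3, 4, 5, 6].forEach((n) => branches.push(n));
// branches.outputs.even is [2, 4, 6]
// branches.outputs.big  is [4, 5, 6]  (4 and 6 land in both: branches may overlap)
```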

Combine?  

• You can combine many sources or branches into one
• Works like a union. First in, first out.
• You can write your own. It’s just an Operator
• You can branch from, combine to … any beam
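A union-style combine can be sketched as several sources pushing into one shared operator, with results appearing in arrival order. `makeSource` and `union` are hypothetical helpers for this illustration, not the beam.js API.

```javascript
// Hypothetical minimal source: push() forwards each value to every piped target.
function makeSource() {
  const targets = [];
  return {
    pipe(target) {
      targets.push(target);
      return target;
    },
    push(value) {
      targets.forEach((target) => target(value));
    },
  };
}

const merged = [];
const union = (value) => merged.push(value); // the combine step is just an operator

const sourceA = makeSource();
const sourceB = makeSource();
sourceA.pipe(union);
sourceB.pipe(union);

sourceA.push('a1');
sourceB.push('b1');
sourceA.push('a2');
// merged is ['a1', 'b1', 'a2']: first in, first out
```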

Streams  &  Pipes,  ++  

• In Node.js the definition and usage of streams in a pipe are entangled.
• Typically, with Streams & Pipes for IO, you only ever want one.
• In algorithms you may want to reuse.
• Think about it …
• Event Emitter. 1 square … 2 branches?

Pipes  ++  

• Beam Pipes are different (& really really really simple)
• You can define a filter once
• You can store it in a module
• Store like operations together
• Make libraries
• Use ‘em. Share ‘em.
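The "define once, reuse anywhere" point can be illustrated with a small library of operations shared by two pipelines. Everything here (`ops`, `runPipeline`, the `{ filter }` / `{ map }` step shape) is a hypothetical sketch rather than the beam.js API; in a real project `ops` would live in its own module.

```javascript
// A tiny "library" of like operations, defined once.
const ops = {
  isEven: (n) => n % 2 === 0,
  square: (n) => n * n,
  halve: (n) => n / 2,
};

// Run each value through the steps; a failing filter drops the value.
function runPipeline(steps, values) {
  const out = [];
  for (const value of values) {
    let current = value;
    let kept = true;
    for (const step of steps) {
      if (step.filter && !step.filter(current)) {
        kept = false;
        break;
      }
      if (step.map) current = step.map(current);
    }
    if (kept) out.push(current);
  }
  return out;
}

// One filter definition, used by two independent pipelines.
const evenOnly = { filter: ops.isEven };
const squaresOfEvens = runPipeline([evenOnly, { map: ops.square }], [1, 2, 3, 4]);
const halvesOfEvens = runPipeline([evenOnly, { map: ops.halve }], [1, 2, 3, 4]);
// squaresOfEvens is [4, 16]; halvesOfEvens is [1, 2]
```

Because `evenOnly` is a definition rather than a live stream instance, sharing it between pipelines has no side effects.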

EEP  based  on  Beam  soon!  

Until then?

•  npm  install  beam  

• Filter data events
• Transform data events
• Analyze, crunch all the things
• Branch all the things
• Combine all the things

Beam  futures?  

• Taps – Convert events into beams
• Drain – Convert beams into events
• Beams
• Write Beam operators in ‘beam’
• Beams ‘inside’ beams
• Source.pipe(op).compile(); // Maybe?

Questions

Darach@PushTechnology.com

Questions?

• Thank you for listening to, having me
• Le twitter: @darachennis
• https://github.com/darach/beam-js
• https://github.com/darach/eep-js
• npm install eep
• npm install beam

•  EEP  built  on  beam?  EEP  in  other  langs?  Soon  

•  Fork  it,  Port  it,  Enjoy  it!