High Performance Stream Processing

Unless otherwise indicated, these slides are © 2013-2015 Pivotal Software, Inc. and licensed under a Creative Commons Attribution-NonCommercial license: http://creativecommons.org/licenses/by-nc/3.0/ SPRINGONE2GX WASHINGTON, DC High Performance Stream Processing By Stephane Maldini, Glenn Renfro, David Turanski @smaldini, @cppwfs, @dturanski

Transcript of High Performance Stream Processing


SPRINGONE2GX WASHINGTON, DC

High Performance Stream Processing
By Stephane Maldini, Glenn Renfro, David Turanski

@smaldini, @cppwfs, @dturanski


Performance as it pertains to:

Message flow

Serialization

Processing


Message Flow: The Myth?


“1 million events a second?”


Message Flow: The Myth?


“It Depends”

https://spring.io/blog/2015/06/17/spring-xd-benchmarks-part-1


Message Flow: The Myth?


Hardware

• Check network speed
  • iperf
• Check disk read/write speed
  • dd
• Processor speed (specs)


Hardware: Network


1 Gb Ethernet

Msg Size (bytes)    Msgs/Sec
100                 1,250,000
1,000               125,000
10,000              12,500
100,000             1,250

10 Gb Ethernet

Msg Size (bytes)    Msgs/Sec
100                 12,500,000
1,000               1,250,000
10,000              125,000
100,000             12,500
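
The figures in these tables look like raw link capacity divided by message size. The sketch below reproduces them under two assumptions that are not stated on the slide: message sizes are in bytes, and protocol framing overhead is ignored. The class and method names are illustrative only.

```java
final class LinkThroughput {

    /** Upper bound on messages per second for a link, ignoring protocol overhead. */
    static long maxMessagesPerSecond(long linkBitsPerSecond, long messageSizeBytes) {
        long bytesPerSecond = linkBitsPerSecond / 8; // 8 bits per byte
        return bytesPerSecond / messageSizeBytes;
    }

    public static void main(String[] args) {
        long[] links = {1_000_000_000L, 10_000_000_000L}; // 1 Gb and 10 Gb Ethernet
        long[] sizes = {100, 1_000, 10_000, 100_000};     // message sizes in bytes (assumed unit)
        for (long link : links) {
            for (long size : sizes) {
                System.out.printf("%,d bit/s, %,d-byte messages: at most %,d msgs/sec%n",
                        link, size, maxMessagesPerSecond(link, size));
            }
        }
    }
}
```

Real throughput will sit below these ceilings once headers, acknowledgements and broker work are included.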


Hardware: Disk

• dd bs=1M count=256 if=/dev/zero of=test
  • Will just commit your 256 MB of data into a RAM buffer
  • Initially fast, but the server is still writing to disk after the test
• dd bs=1M count=256 if=/dev/zero of=/tmp/testfile conv=fdatasync
  • Tells dd to require a complete "sync" once, right before it exits
  • Ensures all data is on the disk before the result is calculated
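
The same buffered-versus-synced distinction can be observed from the JVM. The following is a rough sketch, not a replacement for dd: it writes zero-filled 1 MiB blocks to a temporary file and optionally calls FileChannel.force(false), which asks the OS to flush file data before the timer stops. The class name, method name and temp-file prefix are made up for illustration.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

final class SyncedWriteBenchmark {

    /** Writes totalMiB of zeros in 1 MiB blocks to a temp file and returns the observed MiB/s. */
    static double measureMiBPerSecond(int totalMiB, boolean syncBeforeStop) {
        ByteBuffer block = ByteBuffer.allocate(1024 * 1024); // zero-filled, like /dev/zero
        try {
            Path file = Files.createTempFile("write-bench", ".bin");
            try (FileChannel channel = FileChannel.open(file,
                    StandardOpenOption.WRITE, StandardOpenOption.TRUNCATE_EXISTING)) {
                long start = System.nanoTime();
                for (int i = 0; i < totalMiB; i++) {
                    block.clear(); // reset position so the same block is written again
                    while (block.hasRemaining()) {
                        channel.write(block);
                    }
                }
                if (syncBeforeStop) {
                    channel.force(false); // flush file data to the device before timing stops
                }
                double seconds = (System.nanoTime() - start) / 1e9;
                return totalMiB / seconds;
            } finally {
                Files.deleteIfExists(file);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.printf("buffered: %.1f MiB/s%n", measureMiBPerSecond(256, false));
        System.out.printf("synced:   %.1f MiB/s%n", measureMiBPerSecond(256, true));
    }
}
```

On most systems the buffered figure is expected to be higher, because the page cache absorbs the writes; the synced figure is closer to what the disk can sustain.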


https://romanrm.net/dd-benchmark


Message Size


Transports

• Rabbit
  • perftest
  • https://www.rabbitmq.com/java-tools.html
• Kafka
  • ProducerPerformance
  • kafka-consumer-perf-test.sh
  • https://engineering.linkedin.com/kafka/benchmarking-apache-kafka-2-million-writes-second-three-cheap-machines


Configuration

• Kafka
  • BatchSize
    o Default is 16384
    o 0.8.2.1 vs 0.8.1.1
• Rabbit
  • prefetch
    o Default for XD is 1


Testing Tools

• Spring XD
  • load-generator-source
  • throughput
• https://github.com/spring-projects/spring-xd-modules


Batching

• https://github.com/cppwfs/rabbitmqsink

• https://github.com/cppwfs/rabbitmqsource


Spring AMQP

Batch Size    Msgs Per Sec
1             18,465
10            158,564
100           453,926


Network Hops

• Adds a cost per hop
• Options:
  • Direct binding
  • Composed modules
  • Custom module


One last little thing

• JMX is disabled by default
• When enabled, it caused a performance hit because of how SI (Spring Integration) was capturing stats via its exporters


Object Serialization

• POJO ⇔ byte[] required for transporting data between remote processes
• XD uses Kryo except when the payload type is byte[] or String
• XD supports optimizing Kryo for known payload types


Serialization Benchmarks


An excellent comparative JVM serializers benchmark: https://github.com/eishay/jvm-serializers/wiki

Best case: ~1500 ns


Serialization Benchmarks


Domain Object (as JSON):

{
  "media": {
    "uri": "http://javaone.com/keynote.ogg",
    "title": "Javaone Keynote",
    "width": 640,
    "height": 480,
    "format": "video/mpg4",
    "duration": 18000000,
    "size": 58982400,
    "bitrate": 262144,
    "persons": ["Bill Gates", "Steven Jobs"],
    "player": "JAVA",
    "copyright": ""
  },
  "images": [
    {
      "uri": "http://javaone.com/keynote_large.jpg",
      "title": "Javaone Keynote",
      "width": 1024,
      "height": 768,
      "size": "LARGE"
    },
    {
      "uri": "http://javaone.com/keynote_small.jpg",
      "title": "Javaone Keynote",
      "width": 320,
      "height": 240,
      "size": "SMALL"
    }
  ]
}


Object Serialization

• Size matters: YMMV

• Manually optimized Kryo ser/deser ~ 1500 ns = 1.5 µs = 0.0015 ms

• Kafka XD; 1000 B messages ~ 500,000 msg/sec
  • 2000 ns per message

• Serialization overhead
  • ~ 285,714 msg/sec (source|sink)

• At 50,000 msg/sec, the overhead may still be significant (source|p1|p2|p3|sink)
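
The arithmetic behind these numbers can be checked with a few lines of code. The sketch below assumes a simple additive model that the slide does not spell out: each transport hop adds one fixed ser/deser cost (1500 ns) to the per-message time budget. The class and method names are illustrative.

```java
final class SerializationOverhead {

    static final double NANOS_PER_SECOND = 1_000_000_000.0;

    /** Time budget per message, in nanoseconds, at a given message rate. */
    static double nanosPerMessage(double messagesPerSecond) {
        return NANOS_PER_SECOND / messagesPerSecond;
    }

    /** Rate after adding a fixed per-message cost at each transport hop (additive model, an assumption). */
    static double rateWithOverhead(double baseMessagesPerSecond, double overheadNanos, int hops) {
        double perMessage = nanosPerMessage(baseMessagesPerSecond) + overheadNanos * hops;
        return NANOS_PER_SECOND / perMessage;
    }

    public static void main(String[] args) {
        // source|sink: one hop on top of 500,000 msg/sec (2000 ns per message)
        System.out.printf("source|sink: %.0f msg/sec%n", rateWithOverhead(500_000, 1500, 1));
        // source|p1|p2|p3|sink: four hops on top of 50,000 msg/sec (20,000 ns per message)
        System.out.printf("source|p1|p2|p3|sink: %.0f msg/sec%n", rateWithOverhead(50_000, 1500, 4));
    }
}
```

Under this model, one hop turns 2000 ns per message into 3500 ns, which is roughly 285,714 msg/sec. With four hops at 50,000 msg/sec, 6000 ns of serialization is added to a 20,000 ns budget, so the overhead is still a sizeable fraction of the total.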


Optimizing Kryo in XD

• Disable references, if you know payload types do not contain cyclic references
    xd.codec.kryo.references=false (in servers.yml)
  • This is a global setting for all streams
• Register a custom serializer for a known payload type
  • Install a jar containing the required beans in xd/lib. XD will auto-configure these


Custom Serializers in XD


public PojoCodec(java.util.List<KryoRegistrar> kryoRegistrars, boolean useReferences)

package spring.xd.bus.ext;
...
@Configuration
public class CustomKryoRegistrarConfig {

    @Bean
    public KryoRegistrar myCustomRegistration() {
        List<Registration> registrations = new ArrayList<>();
        // Associate MyObject with its serializer under the unique registration ID 62
        registrations.add(new Registration(MyObject.class, new MySerializer(), 62));
        return new KryoRegistrationRegistrar(registrations);
    }
}

XD scans this package for beans of type KryoRegistrar

Each Registration associates a type to a serializer and a unique ID


Custom Serializers in XD


public class AddressSerializer extends Serializer<Address> {

    @Override
    public void write(Kryo kryo, Output output, Address address) {
        // Fields are written in a fixed order; read() must consume them in the same order
        output.writeString(address.getStreet());
        output.writeString(address.getCity());
        output.writeString(address.getCountry());
    }

    @Override
    public Address read(Kryo kryo, Input input, Class<Address> type) {
        return new Address(input.readString(), input.readString(), input.readString());
    }
}


Serializable Domain Object


public class Address implements KryoSerializable {
    ...
    @Override
    public void write(Kryo kryo, Output output) {
        output.writeString(this.street);
        output.writeString(this.city);
        output.writeString(this.country);
    }

    @Override
    public void read(Kryo kryo, Input input) {
        // Read the fields back in the order they were written
        this.street = input.readString();
        this.city = input.readString();
        this.country = input.readString();
    }
}

• (+) Simple: This works out of the box with no additional configuration

• (-) Requires access to source or wrapping

• (-) Internal benchmarks indicate


Benchmarking Your Custom Serializers


Sample Results                Ser (ns)    Deser (ns)
Baseline                      2710.5      4590.9
Serializable Domain Object    1663.3      2579.1
Custom Serializers            1126.1      2873.8

The spring-xd-samples repo includes a serialization-benchmarks project:
• https://github.com/spring-projects/spring-xd-samples/tree/master/serialization-benchmarks

Let’s look at some code…


What about Processing ?

• Source Msg/s > Sink Msg/s?
  • Rate limited by sink
• Blocking transformation (http, file…)?
  • Rate limited by blocking processor
• Polling sources pausing?
  • Rate limited by small prefetch properties


Mitigating Cost of IO

• Negative impact?
• Scale out?
  o Works up to a point
  o Network cost
• Scale up?
  o Message passing overhead
  o More in-flight data


Blocking IO


[Diagram: a Processor sends blocking requests A, B and C to a Remote service before passing on to the Sink; labels: Network Latency, Request Latency]

Rate Degradation = -(A + B + C) ms


[Diagram: Remote, Processor and Sink separated by an Asynchronous Boundary; labels: Network Latency, Request Latency]

Rate degradation = -(Async Hand Off) ms


Asynchronous IO

• Mitigates temporarily slow processors/sinks

• Falls back to degraded mode when the queue is full

• Async hand-off generates garbage


Reactor Core: Efficient Asynchronous

• Trade-off: memory vs garbage generation
  • Pre-allocated ring buffer
• Concurrent consuming without duplicating buffer content
  • Ring buffer consumer sequences


Ring What?


[Diagram: ring buffer. Producer side: schedule Message<?> execution, get and publish next available slot. Event Loop Thread: read published slot, execute Message<?>]
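
To make the slot and sequence vocabulary concrete, here is a deliberately small ring buffer sketch. It is not Reactor's implementation: it assumes exactly one producer thread and one consumer thread, and every name in it is hypothetical. It only illustrates the two ideas named on the slides, slots that are allocated once and reused, and sequences that tell each side how far the other has progressed.

```java
import java.util.concurrent.atomic.AtomicLong;

final class RingBufferSketch {

    /** Mutable slot, allocated once up front and reused for every message. */
    static final class Slot {
        Object payload;
    }

    private final Slot[] slots;
    private final int mask;
    private final AtomicLong published = new AtomicLong(-1); // last sequence visible to the consumer
    private final AtomicLong consumed = new AtomicLong(-1);  // last sequence the consumer has released

    RingBufferSketch(int sizePowerOfTwo) {
        if (Integer.bitCount(sizePowerOfTwo) != 1) {
            throw new IllegalArgumentException("size must be a power of two");
        }
        slots = new Slot[sizePowerOfTwo];
        for (int i = 0; i < slots.length; i++) {
            slots[i] = new Slot(); // pre-allocation: no per-message garbage for the buffer itself
        }
        mask = sizePowerOfTwo - 1;
    }

    /** Producer side: claim the next slot, fill it, then publish it. Returns false when the buffer is full. */
    boolean tryPublish(Object payload) {
        long next = published.get() + 1;
        if (next - consumed.get() > slots.length) {
            return false; // publishing would overwrite a slot the consumer has not read yet
        }
        slots[(int) (next & mask)].payload = payload;
        published.set(next); // volatile write makes the slot contents visible to the consumer
        return true;
    }

    /** Consumer side: read the next published slot, or return null when nothing new is available. */
    Object poll() {
        long next = consumed.get() + 1;
        if (next > published.get()) {
            return null;
        }
        Slot slot = slots[(int) (next & mask)];
        Object payload = slot.payload;
        slot.payload = null;  // drop the reference so the slot can be reused
        consumed.set(next);   // release the slot back to the producer
        return payload;
    }

    public static void main(String[] args) {
        RingBufferSketch buffer = new RingBufferSketch(4);
        buffer.tryPublish("hello");
        System.out.println(buffer.poll());
    }
}
```

A full buffer is what brings back the degraded mode: the producer has to wait or drop until the consumer advances its sequence.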


Reactor Core: Efficient Asynchronous

• The Spring XD Module
  • https://github.com/spring-projects/spring-xd/blob/master/spring-xd-reactor

public interface Processor<I, O> {
    Publisher<O> process(Stream<I> inputStream);
}


public class PongMessageProcessor implements Processor<Message, Message> {

    @Override
    public Stream<Message> process(Stream<Message> inputStream) {
        // Append a suffix to each payload and wrap it in a new message
        return inputStream.map(message ->
            new GenericMessage<String>(message.getPayload() + "-pojopong")
        );
    }
}


[Diagram: Remote, Processor and Sink; labels: Network Latency, Request Latency]

Parallel Scatter Gather!


Reactor Stream and RxJava

• Compose asynchronous results
  • Without blocking (unlike future.get())
• Reduce the processor/sink backlog!
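
A minimal sketch of composing asynchronous results without blocking, using only the JDK's CompletableFuture rather than Reactor Stream or RxJava. The fetch method is a hypothetical stand-in for a remote call, with a sleep simulating latency; the endpoint names and latencies are invented for illustration.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.TimeUnit;

final class ScatterGatherSketch {

    /** Hypothetical stand-in for an asynchronous remote call with simulated latency. */
    static CompletableFuture<String> fetch(String response, long latencyMillis) {
        return CompletableFuture.supplyAsync(() -> {
            try {
                TimeUnit.MILLISECONDS.sleep(latencyMillis);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
            return response;
        });
    }

    /** Starts both calls at once and combines their results when both have completed. */
    static CompletableFuture<String> profileAndLocation(String userId) {
        CompletableFuture<String> profile = fetch("profile:" + userId, 100);
        CompletableFuture<String> location = fetch("location:" + userId, 100);
        // thenCombine registers a callback; no thread waits on either future here
        return profile.thenCombine(location, (a, b) -> a + "," + b);
    }

    public static void main(String[] args) {
        // join() is used only at the edge of the demo so the JVM waits for the result
        System.out.println(profileAndLocation("42").join());
    }
}
```

Because both calls are in flight at the same time, the combined result is available after roughly the slower of the two latencies rather than their sum, which is what keeps the processor backlog down.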


Scatter Gather

public class AsyncNetworkProcessor implements Processor<Message, String> {

    @Override
    public Observable<String> process(Observable<Message> inputStream) {
        // For each message, issue both HTTP calls and zip the two responses into one CSV string
        return inputStream.flatMap(message ->
            Observable.zip(
                postHttp("/userProfile/" + message.getHeaders().get("user_id")),
                postHttp("/userLocation/" + message.getHeaders().get("user_id")),
                (respA, respB) -> respA + "," + respB
            )
        );
    }

    public Observable<String> postHttp(String endpoint) {
        // An asynchronous HTTP call to forward response as CSV
    }
}


Reactor Stream and RxJava

• Some operators help tune the packet size sent over the network

MicroBatching !


MicroBatching

public class AsyncNetworkProcessor implements Processor<String, String> {

    @Override
    public Stream<String> process(Stream<String> inputStream) {
        // Group messages into windows of up to 1000 items or 1 second, then join each window into one string
        return inputStream.window(1000, 1, TimeUnit.SECONDS)
            .flatMap(messages ->
                messages.reduce("", (prev, next) -> prev + "," + next)
            );
    }
}


Questions


Microservices to Fast Data

John T. Davies

Learn More. Stay Connected.

@springcentral Spring.io/video