Similarities and Differences Between the Java8 Stream API, Apache Spark, and Asakusa Framework

2015/11/28, Hishidama (ひしだま), "Similarities and Differences Between the Java8 Stream API, Apache Spark, and Asakusa Framework", JJUG CCC 2015 Fall

Transcript of "Similarities and Differences Between the Java8 Stream API, Apache Spark, and Asakusa Framework"

  • 2015/11/28

    Similarities and differences between the Java8 Stream API, Apache Spark, and Asakusa Framework

    JJUG CCC 2015 Fall

  • 2

    - JJUG, Java8 Stream API

    Java8 Stream API, Apache Spark, Asakusa Framework, DAG, Asakusa Framework

  • 3

    1.
    2. Stream API, Spark, AsakusaFW
    3. Stream API, Spark, AsakusaFW
    4. Stream API, Spark, AsakusaFW
    5.

  • 4

    Twitter ID: @hishidama

    http://hiroba.dqx.jp/sc/character/1091135261820/

    AsakusaFW, DQ10

  • 5

    2006: Apache Hadoop, Nutch, Java6
    2010/02: Hadoop
    2010: Spark, OSS
    2011/03: Asakusa Framework
    2011/07: Spark
    2012/02: SIer
    2012/08: DQ10
    2014/02: Apache Spark
    2014/03: Java8

  • 6

    Stream API / Spark / AsakusaFW

  • 7

    Java8 Stream API

    API

    Java SE 7/8

  • 8

    Apache Hadoop (1/3)
    - HDFS
    - MapReduce
    - YARN

    1.
    2. MapReduce, jar

  • 9

    Apache Hadoop (2/3)

    [Figure: DB, app, and Hadoop]

  • 10

    Apache Hadoop (3/3)

    Hadoop
    - Hadoop

    Hadoop
    - MapReduce

  • 11

    Apache Spark
    - RDD, Scala
    - HDFS

    AMPLab, Databricks
    - https://databricks.com/spark/about
    - Apache Spark

  • 12

    Asakusa Framework
    - Hadoop, Spark
    - http://www.asakusafw.com/

  • 13

    Stream API / Spark / AsakusaFW

  • 14

    Java8 Stream API

    MyOperator operator = new MyOperator();
    Stream<Data> s0 = /* input source not shown on the slide */;
    Stream<Data> s1 = s0.filter(operator::f);
    Stream<Data> s2 = s1.map(operator::m);
    List<Data> out1 = s2.collect(Collectors.toList());

  • 15

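    Java8 Stream API: MyOperator

    public class MyOperator {

        // filter predicate: keep records whose value is even
        public boolean f(Data data) {
            return data.getValue() % 2 == 0;
        }

        // map function: create a new record with the value incremented by 1
        public Data m(Data data) {
            return new Data(data.getValue() + 1);
        }
    }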
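The two slides above can be combined into one runnable sketch. The `Data` class is not shown in the transcript, so the version below is a hypothetical stand-in with only the `getValue()` accessor and one-argument constructor that the slides use; the input values and the `PipelineSketch` wrapper are also assumptions for illustration.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical data holder (assumption): the slides only use getValue() and new Data(int).
class Data {
    private final int value;
    Data(int value) { this.value = value; }
    int getValue() { return value; }
}

// Same logic as MyOperator on the slide.
class MyOperator {
    boolean f(Data data) { return data.getValue() % 2 == 0; }   // keep even values
    Data m(Data data) { return new Data(data.getValue() + 1); } // increment by 1
}

class PipelineSketch {
    // s0 -> filter(f) -> map(m) -> out1, returned as plain integers for easy inspection
    static List<Integer> run(List<Integer> input) {
        MyOperator operator = new MyOperator();
        Stream<Data> s0 = input.stream().map(v -> new Data(v));
        Stream<Data> s1 = s0.filter(operator::f);
        Stream<Data> s2 = s1.map(operator::m);
        List<Data> out1 = s2.collect(Collectors.toList());
        return out1.stream().map(Data::getValue).collect(Collectors.toList());
    }

    public static void main(String[] args) {
        System.out.println(run(Arrays.asList(1, 2, 3, 4))); // prints [3, 5]
    }
}
```

Only the even inputs 2 and 4 pass `f`, and `m` turns them into 3 and 5.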

  • 16

    Java8 Stream API

    MyOperator operator = new MyOperator();
    Stream<Data> s0 = /* input source not shown on the slide */;
    Stream<Data> s1 = s0.filter(operator::f);
    Stream<Data> s2 = s1.map(operator::m);
    List<Data> out1 = s2.collect(Collectors.toList());

    DAG

  • 17

    DAG (1/2)
    - ER

  • 18

    DAG (2/2)

    Directed Acyclic Graph

  • 19

    Java8 Stream API

    MyOperator operator = new MyOperator();
    Stream<Data> s0 = /* input source not shown on the slide */;
    Stream<Data> s1 = s0.filter(operator::f);
    Stream<Data> s2 = s1.map(operator::m);
    List<Data> out1 = s2.collect(Collectors.toList());

    [Figure: s0 -> filter(f) -> map(m) -> out1]

  • 20

    Scala

    val operator = new MyOperator
    val s0: Stream[Data] = // input source not shown on the slide
    val s1 = s0.filter(operator.f)
    val s2 = s1.map(operator.m)
    val out1 = s2.toSeq

    [Figure: s0 -> filter(f) -> map(m) -> out1]

  • 21

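    Scala: MyOperator

    class MyOperator {

      // filter predicate: keep records whose value is even
      def f(data: Data): Boolean =
        data.getValue() % 2 == 0

      // map function: create a new record with the value incremented by 1
      def m(data: Data): Data =
        Data(data.getValue() + 1)
    }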

  • 22

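    Apache Spark

    val sc = new SparkContext()
    val operator = new MyOperator
    val s0: RDD[Data] = sc. // rest of the input expression not shown on the slide
    val s1 = s0.filter(operator.f)
    val s2 = s1.map(operator.m)
    s2.saveAsTextFile(/* output path not shown on the slide */)

    MyOperator: same as the Scala version

    [Figure: s0 -> filter(f) -> map(m) -> out1]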

  • 23

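    Asakusa Framework

    In<Data> s0 = /* not shown on the slide */;   // input
    Out<Data> out1 = /* not shown on the slide */; // output
    MyOperatorFactory operator = new MyOperatorFactory();
    Source<Data> s1 = operator.f(s0).out;
    Source<Data> s2 = operator.m(s1).out;
    out1.add(s2);

    [Figure: s0 -> @Branch f -> @Update m -> out1]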

  • 24

    Asakusa Framework: MyOperatorFactory

    public abstract class MyOperator {

        // branch operator: even values go to OUT, all others to MISSED
        @Branch
        public Filter f(Data data) {
            return (data.getValue() % 2 == 0) ? Filter.OUT : Filter.MISSED;
        }

        // update operator: increments the value of the record in place
        @Update
        public void m(Data data) {
            data.setValue(data.getValue() + 1);
        }
    }

    MyOperatorFactory

  • 25

    DAG

    [Figure: s0 -> filter(f) -> map(m) -> out1]

    [Figure: s0 -> @Branch f -> @Update m -> out1]

  • 26

    Stream API / Spark / AsakusaFW

  • 27

    (1) union, join, zip

    [Figure: s0, s1 -> out1]

  • 28

    (1) union

    Stream API
    Stream<Data> out = Stream.concat(Stream.concat(s0, s1), s2);

    Spark
    val out = s0 ++ s1 ++ s2

    AsakusaFW
    Source<Data> out = core.confluent(s0, s1, s2);

    Input 1: 1,abc / 2,def
    Input 2: 1,foo / 3,bar
    Output:  1,abc / 2,def / 1,foo / 3,bar
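The union example on this slide can be reproduced with plain strings. This is a minimal sketch, assuming the records are modeled as `String` values and using a hypothetical `UnionSketch` wrapper; the slide itself works on `Stream` values directly.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

class UnionSketch {
    // Concatenates both inputs in order; duplicate keys are kept, as in the slide's example.
    static List<String> union(List<String> a, List<String> b) {
        return Stream.concat(a.stream(), b.stream()).collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<String> s0 = Arrays.asList("1,abc", "2,def");
        List<String> s1 = Arrays.asList("1,foo", "3,bar");
        System.out.println(union(s0, s1)); // prints [1,abc, 2,def, 1,foo, 3,bar]
    }
}
```

`Stream.concat` takes exactly two streams, which is why the slide nests two calls to merge three inputs.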

  • 29

    (1) join

    Stream API

    Spark
    val out = s0.join(s1)

    AsakusaFW
    Source<Data> out = operator.join(s0, s1).joined; // @MasterJoin

    Input 1: 1,abc / 2,def
    Input 2: 1,foo / 3,bar
    Output:  1,abc,foo
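No Stream API code survives for this slide, and `java.util.stream` has no built-in join operation. The following is one possible hand-written inner join for in-memory data, not something taken from the talk; the `JoinSketch` class, the `kv` helper, and the use of `Map.Entry` as a key/value record are all assumptions for illustration.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.Map.Entry;
import java.util.stream.Collectors;

class JoinSketch {
    // Hypothetical helper to build a (key, value) record.
    static Entry<Integer, String> kv(int key, String value) {
        return new SimpleEntry<>(key, value);
    }

    // Inner join on the key: each left record is paired with every right record of the same key.
    static List<String> join(List<Entry<Integer, String>> left, List<Entry<Integer, String>> right) {
        // Index the right side by key so each left record needs only a map lookup.
        Map<Integer, List<String>> rightByKey = right.stream()
                .collect(Collectors.groupingBy(Entry::getKey,
                        Collectors.mapping(Entry::getValue, Collectors.toList())));
        return left.stream()
                .flatMap(l -> rightByKey.getOrDefault(l.getKey(), Collections.emptyList()).stream()
                        .map(r -> l.getKey() + "," + l.getValue() + "," + r))
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<Entry<Integer, String>> s0 = Arrays.asList(kv(1, "abc"), kv(2, "def"));
        List<Entry<Integer, String>> s1 = Arrays.asList(kv(1, "foo"), kv(3, "bar"));
        System.out.println(join(s0, s1)); // prints [1,abc,foo]
    }
}
```

Keys 2 and 3 appear on only one side, so they are dropped, which matches the single output record on the slide.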

  • 30

    (1) cogroup

    Stream API

    Spark
    val out = s0.cogroup(s1)

    AsakusaFW
    Source<Data> out = operator.group(s0, s1).out; // @CoGroup

    Input 1: 1,abc / 2,def
    Input 2: 1,foo / 3,bar
    Output:  1,abc,foo / 2,def,null / 3,null,bar
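As with join, the Stream API offers no cogroup operation. A rough in-memory equivalent can be written with ordinary collections; the sketch below is an illustration under assumptions (hypothetical `CoGroupSketch` class and `kv` helper, `Map.Entry` as the record type) and represents a missing side as an empty list where the slide shows `null`.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.Map.Entry;
import java.util.TreeMap;

class CoGroupSketch {
    // Hypothetical helper to build a (key, value) record.
    static Entry<Integer, String> kv(int key, String value) {
        return new SimpleEntry<>(key, value);
    }

    // One pair of groups per key: index 0 holds left values, index 1 holds right values.
    private static List<List<String>> emptyGroups() {
        List<List<String>> groups = new ArrayList<>();
        groups.add(new ArrayList<>());
        groups.add(new ArrayList<>());
        return groups;
    }

    // Groups both inputs by key; a key present on only one side gets an empty list for the other.
    static Map<Integer, List<List<String>>> cogroup(List<Entry<Integer, String>> left,
                                                    List<Entry<Integer, String>> right) {
        Map<Integer, List<List<String>>> result = new TreeMap<>();
        for (Entry<Integer, String> e : left) {
            result.computeIfAbsent(e.getKey(), k -> emptyGroups()).get(0).add(e.getValue());
        }
        for (Entry<Integer, String> e : right) {
            result.computeIfAbsent(e.getKey(), k -> emptyGroups()).get(1).add(e.getValue());
        }
        return result;
    }

    public static void main(String[] args) {
        List<Entry<Integer, String>> s0 = Arrays.asList(kv(1, "abc"), kv(2, "def"));
        List<Entry<Integer, String>> s1 = Arrays.asList(kv(1, "foo"), kv(3, "bar"));
        System.out.println(cogroup(s0, s1)); // prints {1=[[abc], [foo]], 2=[[def], []], 3=[[], [bar]]}
    }
}
```

Unlike the inner join, every key from either input appears in the result.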

  • 31

    (1) zip

    Stream API

    Spark
    val out = s0.zip(s1)

    AsakusaFW

    Input 1: 1,abc / 2,def
    Input 2: 1,foo / 3,bar
    Output (zip): 1,abc,1,foo / 2,def,3,bar
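`java.util.stream.Stream` has no `zip` method. For indexed in-memory lists, a positional pairing can be sketched with `IntStream.range`. The `ZipSketch` class is hypothetical, and truncating to the shorter input is a design choice made here for the sketch, not behavior taken from the slide.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.IntStream;

class ZipSketch {
    // Pairs elements by position; extra elements of the longer list are ignored (assumption).
    static List<String> zip(List<String> a, List<String> b) {
        int n = Math.min(a.size(), b.size());
        return IntStream.range(0, n)
                .mapToObj(i -> a.get(i) + "," + b.get(i))
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<String> s0 = Arrays.asList("1,abc", "2,def");
        List<String> s1 = Arrays.asList("1,foo", "3,bar");
        System.out.println(zip(s0, s1)); // prints [1,abc,1,foo, 2,def,3,bar]
    }
}
```

The pairing ignores keys entirely: `2,def` is matched with `3,bar` only because both are second in their inputs.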

  • 32

    (2) duplicate

    [Figure: s0 -> out1, out2]

  • 33

    (2) duplicate

    Stream API

    Spark
    val out1 = s0.map(operator.m1)
    val out2 = s0.map(operator.m2)

    AsakusaFW
    Source<Data> out1 = operator.m1(s0).out;
    Source<Data> out2 = operator.m2(s0).out;
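The Stream API entry on this slide is empty in the transcript. One relevant fact is that a `java.util.stream.Stream` can be operated on only once, so the Spark pattern of calling `map` twice on the same `s0` does not carry over directly. The sketch below demonstrates the failure and one common workaround (keeping the data in a `List` and opening a new stream per output); the class name and the bodies of `m1` and `m2` are assumptions for illustration.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

class DuplicateSketch {
    // Returns true if attaching a second pipeline to an already used stream is rejected.
    static boolean reuseFails() {
        Stream<Integer> s0 = Stream.of(1, 2, 3);
        s0.map(v -> v + 1).collect(Collectors.toList());
        try {
            s0.map(v -> v * 2).collect(Collectors.toList());
            return false;
        } catch (IllegalStateException e) {
            return true; // the stream has already been operated upon
        }
    }

    // Workaround: hold the source as a List and create a fresh stream for each branch.
    static List<Integer> m1(List<Integer> s0) { // hypothetical m1: add 1
        return s0.stream().map(v -> v + 1).collect(Collectors.toList());
    }

    static List<Integer> m2(List<Integer> s0) { // hypothetical m2: multiply by 2
        return s0.stream().map(v -> v * 2).collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<Integer> s0 = Arrays.asList(1, 2, 3);
        System.out.println(reuseFails()); // prints true
        System.out.println(m1(s0));       // prints [2, 3, 4]
        System.out.println(m2(s0));       // prints [2, 4, 6]
    }
}
```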

  • 34

    (3) branch

    [Figure: s0 -> out1, out2]

  • 35

    (3) branch

    Stream API

    Spark

    AsakusaFW
    // @Branch
    Branch result = operator.branch(s0);
    Source<Data> out1 = result.out1;
    Source<Data> out2 = result.out2;
    Source<Data> out3 = result.out3;
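The Stream API and Spark entries on this slide are empty in the transcript. For in-memory data, a loosely comparable effect can be obtained by collecting with `Collectors.groupingBy`, which routes each element to exactly one group. This is an illustration only, not the approach shown in the talk; the `Route` enum, the thresholds in `route`, and the class name are hypothetical.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

class BranchSketch {
    // Hypothetical output ports, mirroring out1/out2/out3 on the slide.
    enum Route { OUT1, OUT2, OUT3 }

    // Hypothetical routing rule: small, medium, and large values.
    static Route route(int value) {
        if (value < 10) return Route.OUT1;
        if (value < 100) return Route.OUT2;
        return Route.OUT3;
    }

    // Sends every element to exactly one output, like a branch operator.
    static Map<Route, List<Integer>> branch(List<Integer> s0) {
        return s0.stream().collect(Collectors.groupingBy(v -> route(v)));
    }

    public static void main(String[] args) {
        Map<Route, List<Integer>> result = branch(Arrays.asList(5, 50, 500, 7));
        System.out.println(result.get(Route.OUT1)); // prints [5, 7]
        System.out.println(result.get(Route.OUT2)); // prints [50]
        System.out.println(result.get(Route.OUT3)); // prints [500]
    }
}
```

One difference from a dataflow branch: `collect` is a terminal operation, so each group is a materialized `List` rather than a stream that continues through the pipeline. An output that receives no elements is simply absent from the map.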

  • 36

    Stream API ListStream Stream

    Scala

    Stream

    Spark Scala

    AsakusaFW Hadoop, Spark

    Hadoop, Spark

  • 37

    Asakusa Framework (1/4)

    Hadoop

    Hadoop

    Spark

  • 38

    Asakusa Framework (2/4)

    Asakusa Framework, jar, Hadoop, MapReduce

    Spark

  • 39

    Asakusa Framework (3/4)

    1. Hadoop
       - Hadoop 180
       - 4550
    2. Hadoop
       - 1015
    3. Spark
       - 34

  • 40

    Asakusa Framework (3/4): DAG

    [Figure: DAG of @Convert, @CoGroup, @Summarize, @CoGroup, @MJoinUpdate, @MJoinUpdate; label "1 255"]

  • 41

    Asakusa Framework (4/4)

    Spark

    AsakusaFW

  • 42

    Hadoop, HDD

    HDD

  • 43

    HDD, SSD, HDD

    CPU100

    MRAM

  • 44

    Java8 Stream API

    parallel()

    Apache Spark

    executor, Spark

    Asakusa Framework
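For the Stream API entry on this slide, `parallel()` switches a pipeline to parallel execution inside a single JVM. A minimal sketch, assuming a simple numeric workload (the `ParallelSketch` class and method name are hypothetical), shows that the sequential and parallel forms of the same filter and map pipeline produce the same result:

```java
import java.util.stream.IntStream;

class ParallelSketch {
    // Sums (v + 1) over the even v in 1..n, optionally as a parallel stream.
    static long sumOfEvenPlusOne(int n, boolean parallel) {
        IntStream s = IntStream.rangeClosed(1, n);
        if (parallel) {
            s = s.parallel(); // same pipeline, executed by multiple threads
        }
        return s.filter(v -> v % 2 == 0).mapToLong(v -> v + 1L).sum();
    }

    public static void main(String[] args) {
        System.out.println(sumOfEvenPlusOne(10, false)); // prints 35
        System.out.println(sumOfEvenPlusOne(10, true));  // prints 35
    }
}
```

The operations here are stateless and the reduction is associative, which is what makes the parallel result independent of how the work is split.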

  • 45

  • 46

    DAG

    AsakusaFW: @Update, @MasterJoin; map, filter; Stream API; Stream API, AsakusaFW

    AsakusaFW, Spark, AsakusaFW

  • 47

    http://www.adventar.org/calendars/1166

    AsakusaFW

    DQ10