Integrating Stream Processing (Spark Streaming + Kinesis) with Offline Processing (Hive)

  • Date posted

    18-Jan-2017
  • Category

    Technology


  • Integrating stream processing (Spark Streaming + Kinesis) with offline processing (Hive)

    JAWS-UG Meguro #2

  • SmartNews

  • Spark Streaming

  • (...)

    ( ...)

    ...

  • etc

    CloudSearch

    SmartNews Webmining: http://www.slideshare.net/smartnews/smart-news-webmining

    Web crawler

  • Metastore: Presto / Hive / Spark, EMR

    Hive UDF, Spark MLlib

    Hive Metastore in RDS, data in S3

    Presto, Hive, Spark

    https://speakerdeck.com/takus/sumatoniyusufalseshi-jie-jin-chu-wozhi-erurogujie-xi-ji-pan-number-jawsdays-number-tech

  • Spark SQL or ...

    Diagram labels: read model, API server, DynamoDB(1), Kinesis, S3, Spark, EMR

  • Spark Streaming

    DStream (Discretized Stream): a sequence of RDDs

    Input: Kinesis

    window

    Spark performance tuning: https://spark.apache.org/docs/latest/streaming-programming-guide.html#performance-tuning
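
The DStream and window ideas on this slide can be illustrated without Spark. The following is a minimal plain-Python sketch, not Spark API code: each inner list stands in for the RDD of one batch interval, and `windowed_counts`, `window_length`, and `slide_interval` are hypothetical names chosen for this illustration. Both parameters are measured in numbers of batches.

```python
from collections import Counter, deque


def windowed_counts(batches, window_length, slide_interval):
    """Yield (batch_index, counts) each time the window slides.

    batches: iterable of lists; each list holds the keys seen in one batch interval.
    window_length: how many of the most recent batches a window covers.
    slide_interval: emit a result every `slide_interval` batches.
    """
    window = deque(maxlen=window_length)  # oldest batch is dropped automatically
    for index, batch in enumerate(batches, start=1):
        window.append(Counter(batch))
        if index % slide_interval == 0:
            total = Counter()
            for per_batch in window:
                total.update(per_batch)
            yield index, total


# Four micro-batches of page keys (made-up data).
batches = [["a", "b"], ["a"], ["c", "a"], ["b"]]
results = list(windowed_counts(batches, window_length=3, slide_interval=2))
```

With a window of three batches sliding every two batches, the first result covers batches 1 to 2 and the second covers batches 2 to 4. In Spark Streaming itself, windowing is expressed through the window operations on a DStream described in the programming guide linked above.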

  • : PV

  • : 1. Hive 2. Spark MLlib KMeans 3. Hive

  • join : (,)PV
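
The join of streaming data with an offline result could look roughly like the sketch below. The key of the tuple on this slide did not survive in the transcript, so the choice of `(article_id, cluster_id)` is an assumption, as are the names `user_cluster` and `pv_by_article_and_cluster` and all sample data. The dictionary stands in for a cluster assignment computed offline (for example by the Hive and KMeans steps on the previous slide), and the list stands in for one micro-batch of page-view events.

```python
from collections import Counter

# Hypothetical offline result: user_id -> cluster_id (made-up values).
user_cluster = {"u1": 0, "u2": 1, "u3": 0}


def pv_by_article_and_cluster(batch, user_cluster, default_cluster=-1):
    """Count page views per (article_id, cluster_id) for one micro-batch.

    batch: iterable of (user_id, article_id) page-view events.
    Users missing from the offline table fall into `default_cluster`.
    """
    counts = Counter()
    for user_id, article_id in batch:
        cluster_id = user_cluster.get(user_id, default_cluster)
        counts[(article_id, cluster_id)] += 1
    return counts


# One micro-batch of events (made-up data); "u9" has no offline cluster.
batch = [("u1", "a1"), ("u2", "a1"), ("u3", "a1"), ("u9", "a2")]
pv = pv_by_article_and_cluster(batch, user_cluster)
```

Keeping an explicit default cluster means events from users the offline job has not seen yet are still counted rather than silently dropped. In Spark Streaming, the same shape is commonly written by joining each batch's RDD against a static RDD inside `DStream.transform`.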

  • Summary

    Kinesis + Spark Streaming, SmartNews

    A/B

  • :

    Kinesis + PipelineDB + Chartio

    PipelineDB in production: How SmartNews Utilizes PipelineDB http://developer.smartnews.com/blog/2015/09/09/20150907pipelinedb/

  • 2:

    Web, ML, NLP