E-commerce: from Hadoop to Spark Scala

  • date post

    11-Jul-2015
  • - big data

    retailrocket Apache Spark

    - 12 2014,

  • - big data

    Retail Rocket

  • retailrocket

    1. .

    2. .

    3. API

    , email-, CRM -

    big data

  • retailrocket

    Retail Rocket?

    + 10 ,

    Ozon.ru Wikimart.ru

    10% 50% ( / )

    + email

    >10% (!)

    (CPO Revenue Share)

  • - big data * InSales 30.10.2014

  • retailrocket

    CDH 5.1.2

    Spark 1.1

    High Availability: 2 NameNodes, 3 JournalNodes

    18 DataNodes

    100 TB

    100

  • - big data

    Apache Spark

  • retailrocket

    - Scala ( 3-5 ) Spark SQL Hive Spark Streaming Parquet

  • retailrocket

    Yarn

    Old cluster: CDH 4.5 (Name nodes, Journal nodes)

    New cluster: CDH 5.1, Yarn (Name nodes, Journal nodes)

    Puppet

    https://github.com/RetailRocket/puppet-cdh5

  • retailrocket

    Spark Scala

    (reduce) notebook Kryo

  • retailrocket

    - Hadoop: 1 org.apache.hadoop.hive.ql.io.CombineHiveInputFormat

    Pig:

    pig.splitCombination = true

    pig.maxCombinedSplitSize

    Hive:

    hive.input.format = org.apache.hadoop.hive.ql.io.CombineHiveInputFormat

    hive.hadoop.supports.splittable.combineinputformat = true

    Spark/Scala: https://github.com/RetailRocket/SparkMultiTool

    Retail Rocket: 100000 - 100000(), 3000()
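
The Pig and Hive settings on this slide share one idea: several small input files are packed into a single combined split, so that one task reads many files instead of one. A minimal sketch of that packing step in plain Scala might look like the following. It is an illustration only: `InputFile`, `SplitCombiner` and `maxSplitBytes` are hypothetical names, and this is not the code of SparkMultiTool or of Hadoop's combine input formats, which also take data locality into account.

```scala
// Hypothetical description of one input file: its path and size in bytes.
final case class InputFile(path: String, sizeBytes: Long)

object SplitCombiner {

  /** Greedily packs files, in their given order, into groups whose total
    * size does not exceed maxSplitBytes. A file that is larger than the
    * limit on its own still gets a group to itself.
    */
  def combine(files: Seq[InputFile], maxSplitBytes: Long): Vector[Vector[InputFile]] = {
    require(maxSplitBytes > 0, "maxSplitBytes must be positive")

    // Accumulator: (finished groups, group being filled, size of that group).
    val start = (Vector.empty[Vector[InputFile]], Vector.empty[InputFile], 0L)

    val (finished, current, _) = files.foldLeft(start) {
      case ((groups, cur, curSize), file) =>
        if (cur.nonEmpty && curSize + file.sizeBytes > maxSplitBytes)
          // Adding this file would overflow the group: close it, start a new one.
          (groups :+ cur, Vector(file), file.sizeBytes)
        else
          (groups, cur :+ file, curSize + file.sizeBytes)
    }

    // Do not lose the last, partially filled group.
    if (current.nonEmpty) finished :+ current else finished
  }
}
```

For example, `SplitCombiner.combine(files, 40L)` applied to ten files of 10 bytes each yields three groups instead of ten single-file inputs. The greedy, order-preserving strategy is chosen here only for brevity.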

  • retailrocket

    Learning Spark (http://shop.oreilly.com/product/0636920028512.do)

    Coursera Scala (https://www.coursera.org/course/progfun)

    Spark Summit 2014 (http://spark-summit.org/2014)

    Spark should be better than MapReduce (if only it worked)

    Retail Rocket Public GitHub (https://github.com/RetailRocket)

  • retailrocket

    ?

    rzykov@retailrocket.ru