OLAP for Big Data (Druid vs Apache Kylin vs Apache Lens)

freepsw, 2017.03


Transcript of OLAP for Big Data (Druid vs Apache Kylin vs Apache Lens)

  • OLAP for Big Data

    freepsw, 2017.03

  • Hadoop

    (Table scan)

    Table join (data shuffling)

    granularity

    MapReduce job / Spark job

    OLAP Hadoop query

  • OLAP

    Event OLAP Hadoop OLAP

    OLAP?

    Druid, Kylin, Lens

    (HDFS)

    (YARN)

    (MapReduce)

    (SQL on Hadoop)

    NoSQL

    NewSQL

    OLAP

  • Druid vs Kylin vs Lens - 1

    HDFS

    Druid

    OLAP

    stream

    HDFS (Hive query)

    Kylin Lens

    OLAP

    Cube

    HBase

    Cube

    Hive Driver, JDBC Driver, ES Driver

    segments

  • Druid vs Kylin vs Lens - 2

    SQL

    Fault Tolerance

    BI

    Druid Kylin Lens

    X (JSON) X (SQL-like)

    Node

    X Tableau JDBC

    Airbnb eBay -

  • Druid


  • Druid


  • Druid ?

    (rollup)

    (trillions of events, petabytes of data)

    (HDFS?)

  • Druid

    3 column types: Timestamp, Dimension, Measure

    Timestamp: query time

    Dimension: event string

    Metric: float

  • Druid (1) Rollup

    Druid

    (raw data)
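Rollup, as named on this slide, means pre-aggregating raw events at ingestion time so that only summarized rows are stored. The following is a minimal sketch of the idea in plain Python, not Druid's implementation. The hourly bucket, the field names (`page`, `country`, `bytes`), and the sample events are all assumptions made for illustration.

```python
from collections import defaultdict
from datetime import datetime


def rollup(events, dimensions, metric):
    """Aggregate raw events per (hour bucket, dimension values)."""
    totals = defaultdict(lambda: {"count": 0, "sum": 0.0})
    for event in events:
        ts = datetime.fromisoformat(event["timestamp"])
        # Truncate the timestamp to the hour (hypothetical query granularity).
        bucket = ts.replace(minute=0, second=0, microsecond=0)
        key = (bucket.isoformat(),) + tuple(event[d] for d in dimensions)
        totals[key]["count"] += 1
        totals[key]["sum"] += event[metric]
    return dict(totals)


# Hypothetical raw events; four input rows collapse into three stored rows.
raw_events = [
    {"timestamp": "2017-03-01T10:05:00", "page": "A", "country": "KR", "bytes": 100.0},
    {"timestamp": "2017-03-01T10:40:00", "page": "A", "country": "KR", "bytes": 50.0},
    {"timestamp": "2017-03-01T11:02:00", "page": "A", "country": "KR", "bytes": 25.0},
    {"timestamp": "2017-03-01T10:59:00", "page": "B", "country": "US", "bytes": 10.0},
]
rolled = rollup(raw_events, ["page", "country"], "bytes")
```

The trade-off is that individual raw events can no longer be recovered from the rolled-up rows; only the aggregates per bucket and dimension combination remain.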

  • Druid (2) Sharding the Data

    Data is sharded by time; each shard is a segment.

    A Druid query scans segments.
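The point of time-based sharding is that a query with a time interval only needs to touch the segments whose time range overlaps that interval. The sketch below illustrates this with one segment per day; the daily segment size, the helper names, and the sample rows are assumptions for illustration, not Druid internals.

```python
from collections import defaultdict


def shard_by_day(rows):
    """Group rows into per-day segments keyed by 'YYYY-MM-DD'."""
    segments = defaultdict(list)
    for row in rows:
        day = row["timestamp"][:10]  # ISO-8601 date prefix
        segments[day].append(row)
    return dict(segments)


def segments_to_scan(segments, start_day, end_day):
    """Return the segment keys inside the half-open interval [start_day, end_day)."""
    # ISO dates sort lexicographically, so string comparison is sufficient.
    return sorted(day for day in segments if start_day <= day < end_day)


rows = [
    {"timestamp": "2017-03-01T10:05:00", "page": "A"},
    {"timestamp": "2017-03-01T23:10:00", "page": "B"},
    {"timestamp": "2017-03-02T08:00:00", "page": "A"},
    {"timestamp": "2017-03-05T12:30:00", "page": "C"},
]
segments = shard_by_day(rows)
to_scan = segments_to_scan(segments, "2017-03-01", "2017-03-03")
```

Here the segment for 2017-03-05 is never read, which is the pruning effect the slide refers to.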

  • Druid (3) Indexing the Data

    query .

    Column

    Query column

    Column

    Segment index
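Column-oriented storage with a per-value index lets a filter be answered without reading whole rows. The sketch below shows a toy bitmap index over a dimension column, using Python integers as bitsets. It is an illustration of the general technique only; the column contents and function names are hypothetical, and real systems use compressed bitmaps.

```python
def build_bitmap_index(column):
    """Map each distinct value to a bitset; bit i is set if row i has that value."""
    index = {}
    for row_id, value in enumerate(column):
        index[value] = index.get(value, 0) | (1 << row_id)
    return index


def matching_rows(bitmap):
    """Expand a bitset back into a sorted list of row ids."""
    return [i for i in range(bitmap.bit_length()) if (bitmap >> i) & 1]


# Two hypothetical dimension columns of the same segment (6 rows).
page = ["A", "B", "A", "C", "B", "A"]
country = ["KR", "US", "US", "KR", "KR", "KR"]

page_index = build_bitmap_index(page)
country_index = build_bitmap_index(country)

# Filter "page = A AND country = KR" is a bitwise AND of two bitmaps.
and_hits = matching_rows(page_index["A"] & country_index["KR"])
# Filter "page = A OR page = B" is a bitwise OR.
or_hits = matching_rows(page_index["A"] | page_index["B"])
```

Only the columns named in the filter are consulted, and the metric columns are read just for the matching row ids.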

  • Druid (4) Loading the Data

    query

    Real-time
    - Exactly once ( )

    Batch
    - Exactly once

    Real-time, batch

  • Druid (5) Querying the Data

    SQL , join .

    JSON over HTTP query

    Query libraries (http://druid.io/libraries.html)

    Table query

    - Druid join
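A "JSON over HTTP" query is a JSON document POSTed to the broker. The sketch below builds a native topN query body and wraps it in an HTTP request without sending it. The data source name `wikiticker`, the dimension `page`, the metric `edits`, and the interval are assumptions borrowed from the Druid quickstart sample data; the broker URL follows the one used later in this deck.

```python
import json
import urllib.request


def build_topn_query(data_source, dimension, metric, interval, threshold=5):
    """Build a Druid native topN query as a plain dict."""
    return {
        "queryType": "topN",
        "dataSource": data_source,
        "intervals": [interval],
        "granularity": "all",
        "dimension": dimension,
        "metric": metric,
        "threshold": threshold,
        # Sum the rolled-up "count" column under the metric's name (assumed schema).
        "aggregations": [
            {"type": "longSum", "name": metric, "fieldName": "count"}
        ],
    }


def build_request(query, broker_url="http://localhost:8082/druid/v2/?pretty"):
    """Wrap the query in a POST request; nothing is sent here."""
    body = json.dumps(query).encode("utf-8")
    return urllib.request.Request(
        broker_url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


query = build_topn_query("wikiticker", "page", "edits", "2015-09-12/2015-09-13")
request = build_request(query)
# With a running broker, the request could be sent with:
#   urllib.request.urlopen(request).read()
```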

  • Druid cluster

    Cluster

    Indexing service

    Indexing service

  • Druid cluster

    Historical Node
    - local segment query
    - Node segment (shared nothing)

    Broker Node
    - Client query
    - segment
    - node

    Coordinator Node
    - Historical node segment
    - Segment, historical node

    Indexing service
    - Query column format
    - Bitmap index segment

    Real-time Processing
    - indexing, query
    - segment, historical node
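The broker and historical roles listed above follow a scatter/gather pattern: each historical node answers from its own local segments, and the broker merges the partial results for the client. The sketch below models that flow with in-memory lists. The function names and sample data are hypothetical, and real brokers route by segment metadata rather than querying every node.

```python
from collections import Counter


def historical_query(local_segments, dimension, metric):
    """Aggregate one node's local segments (shared nothing: no other node is consulted)."""
    partial = Counter()
    for segment in local_segments:
        for row in segment:
            partial[row[dimension]] += row[metric]
    return partial


def broker_query(historical_nodes, dimension, metric, limit):
    """Fan the query out to every node, then merge the partial results."""
    merged = Counter()
    for local_segments in historical_nodes:
        merged.update(historical_query(local_segments, dimension, metric))
    return merged.most_common(limit)


# Two hypothetical historical nodes; the second one holds two segments.
nodes = [
    [[{"page": "A", "edits": 3}, {"page": "B", "edits": 1}]],
    [[{"page": "A", "edits": 2}], [{"page": "C", "edits": 4}]],
]
top_pages = broker_query(nodes, "page", "edits", limit=2)
```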

  • Druid Indexing Service

    The indexing service creates segments.

    1) Peon: runs a task

    2) MiddleManager: manages peons

    3) Overlord: assigns tasks to MiddleManagers

    Peons and MiddleManagers run on the same node.

  • Druid Segment and Storage

    1) Indexing creates a segment

    2) The segment is stored in deep storage

    3) A historical node downloads the segment to local disk

    4) The local segment is loaded into memory

    5) Query service

  • Druid Granularity

    https://blog.codecentric.de/en/2016/08/realtime-fast-data-analytics-druid/

    Granularity, segment size

    Segment
    - Segment file
    - Day: 1 segment
    - Minute: 2 segments

    Query
    - Rollup
    - Minute: 3 rows
    - Disk/memory

  • Druid Metadata Storage

    System (MySQL, PostgreSQL)

    Derby storage, MySQL or PostgreSQL

    Segments Table
    - Segment
    - Coordinator, query, segment

    Rule Table
    - Coordinator, segment rule

    Task-related Table
    - Indexing service

  • Druid Fault Tolerance

    Historical Node
    - historical node, deep storage

    Coordinator Node
    - Hot fail-over

    Broker Node
    - Hot fail-over

    Deep storage

    Real-time Processing
    - Checkpoint, deep storage
    - checkpoint

    Node

  • Druid ?

    Cache at historical nodes

    Cache at broker nodes

    Page cache (OS)

    3-level memory cache
    - trick
    - dummy SQL

  • Druid


  • Druid ?


    https://imply.io/

  • Imply

    Technical architecture

  • Imply Pivot

  • Imply Pivot

  • Druid .

    Docker single node

    git clone https://github.com/cimatech/druid-container.git
    docker-compose up
    docker ps

  • Druid


  • Data loading (File)

    Loading Schema

    curl -O http://static.druid.io/artifacts/releases/druid-0.9.2-bin.tar.gz
    tar xzf druid-0.9.2-bin.tar.gz
    cd druid-0.9.2
    curl -X 'POST' -H 'Content-Type:application/json' -d @quickstart/wikiticker-index.json localhost:4000/druid/indexer/v1/task

  • Data query (1) JSON

    curl -L -H 'Content-Type:application/json' -XPOST --data-binary @quickstart/wikiticker-top-pages.json http://localhost:8082/druid/v2/?pretty

  • Data query (2) PlyQL SQL Query

    https://github.com/implydata/plyql

    http://plywood.imply.io/plyql

  • Data loading (Stream)

    Stream sw

    curl -XPOST -H 'Content-Type:application/json' [email protected] http://localhost:8200/v1/post/pageviews

    1.

    2. POST

  • Apache Kylin

  • Apache Kylin

    eBay, 2014.10, Hadoop SQL OLAP

    Druid

  • Kylin

    OLAP Hadoop query

  • Kylin

    API, job, Hive, cube, HBase

    - Tomcat
    - Cube, memory (JVM GC)

  • Kylin Cube

    Web UI, cube

    Project: Project, Model, Hive table

    Model: Cube

    Cube: dimension, measure

  • Cube ?

    Hive

    Kylin

    1) Cube table

    MapReduce

    2) Cube dimension/measure

    HBase

    3) Dimension, measure

    HDFS

    4) Cube

  • Cube job


  • . ?

    10,000 2.5 ...

    Kylin (1.76 sec)

    Hive (4.56 sec)

  • . ?

    Kylin (0.12 sec)

    Hive (103 sec)

    10 850 . (Cardinality 3)

  • . ?

    Kylin (0.18 sec)

    Hive (125 sec)

    10 690 . (Cardinality 2,400)

  • Kylin


    dimension

    cardinality cube

    Cube .

    Hive cube query

  • Kylin (Kyligence Analytics Platform)

    http://kyligence.io/en/

  • Kyligence

    Kafka / Spark Streaming cube

  • Apache Lens

    Unified OLAP on Realtime and Batch Data (2014 Apache Incubator)

  • Apache Lens

    Data silo

  • Apache Lens

    Query cost

  • Hadoop Eco: Lens position

    Query engine, OLAP layer

  • Apache Lens

    Business

    GitHub commit

    2015.09 Apache Top-Level Project

  • [backup] OLAP

  • Kylin Cube


    dimension cuboid
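A cuboid is the set of aggregates for one combination of dimensions, and a full cube over N dimensions has 2^N cuboids. The sketch below enumerates the cuboids and materializes one of them over a tiny fact table. It illustrates the concept only and is not Kylin's build algorithm; the dimension names, the `sales` measure, and the rows are hypothetical.

```python
from itertools import combinations


def enumerate_cuboids(dimensions):
    """List every dimension combination, from the base cuboid down to the empty one."""
    cuboids = []
    for size in range(len(dimensions), -1, -1):
        cuboids.extend(combinations(dimensions, size))
    return cuboids


def build_cuboid(rows, cuboid, measure):
    """Pre-aggregate SUM(measure) grouped by the cuboid's dimensions."""
    totals = {}
    for row in rows:
        key = tuple(row[d] for d in cuboid)
        totals[key] = totals.get(key, 0) + row[measure]
    return totals


dimensions = ["year", "country", "category"]
rows = [
    {"year": 2016, "country": "KR", "category": "book", "sales": 10},
    {"year": 2016, "country": "US", "category": "book", "sales": 20},
    {"year": 2017, "country": "KR", "category": "toy", "sales": 5},
    {"year": 2017, "country": "KR", "category": "book", "sales": 7},
]

cuboids = enumerate_cuboids(dimensions)              # 2**3 == 8 combinations
by_year_country = build_cuboid(rows, ("year", "country"), "sales")
grand_total = build_cuboid(rows, (), "sales")
```

Because the cuboid count doubles with each added dimension, the number of dimensions and their cardinality drive cube size and build time.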

  • RDBMS cube

    Cube RDBMS aggregation

  • Apache Kylin

  • Apache Kylin - Cube build job flow

  • Apache Kylin