OLAP for Big Data (Druid vs Apache Kylin vs Apache Lens)
Author: sang-won-park
Category: Data & Analytics
-
OLAP for Big Data
freepsw, 2017.03
-
Hadoop
(Table scan)
Table join (data shuffling)
Granularity
MapReduce job / Spark job
OLAP Hadoop query
-
OLAP
Event OLAP, Hadoop OLAP
OLAP?
Druid, Kylin, Lens
(HDFS)
(YARN)
(MapReduce)
(SQL on Hadoop)
NoSQL
NewSQL
OLAP
-
Druid vs Kylin vs Lens - 1
HDFS
Druid
OLAP
stream
HDFS (Hive query)
Kylin Lens
OLAP
Cube
HBASE
Cube
Hive Driver, JDBC Driver, ES Driver
segments
-
Druid vs Kylin vs Lens - 2
SQL
Fault tolerance
BI
Druid  Kylin  Lens
X (JSON)  X (SQL-like)
Node
X  Tableau  JDBC
Airbnb  eBay  -
-
Druid
-
Druid
-
Druid ?
(rollup)
(trillions of events, petabytes of data)
(HDFS?)
-
Druid
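Three column types: Timestamp, Dimension, Measure (metric)
Timestamp: every query is run against the time axis
Dimension: string attributes of an event
Metric: numeric values (e.g. float) to be aggregated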
-
Druid (1) Rollup
Druid
(raw data)
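As an illustration of what rollup means, here is a minimal Python sketch, not Druid's actual implementation. It truncates each event timestamp to a time bucket and sums the metrics of all events that share the same bucket and dimension values. The field names (`publisher`, `country`, `clicks`) and the one-hour bucket are assumptions made up for this example.

```python
from collections import defaultdict

def rollup(events, granularity_seconds=3600):
    """Pre-aggregate raw events: one output row per (time bucket, dimensions)."""
    totals = defaultdict(lambda: {"count": 0, "clicks": 0})
    for ev in events:
        ts = int(ev["timestamp"])
        bucket = ts - ts % granularity_seconds  # truncate to the bucket start
        key = (bucket, ev["publisher"], ev["country"])
        totals[key]["count"] += 1
        totals[key]["clicks"] += ev["clicks"]
    return [
        {"timestamp": b, "publisher": p, "country": c, **metrics}
        for (b, p, c), metrics in sorted(totals.items())
    ]

# Hypothetical raw events (timestamps are epoch seconds)
raw_events = [
    {"timestamp": 0, "publisher": "a.com", "country": "KR", "clicks": 1},
    {"timestamp": 10, "publisher": "a.com", "country": "KR", "clicks": 0},
    {"timestamp": 20, "publisher": "b.com", "country": "US", "clicks": 1},
    {"timestamp": 3700, "publisher": "a.com", "country": "KR", "clicks": 1},
]
rolled = rollup(raw_events)
for row in rolled:
    print(row)
```

Four raw events collapse into three stored rows here. The individual events can no longer be recovered from the rolled-up rows, which is the trade-off rollup makes for smaller storage.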
-
Druid (2) Sharding the Data
Data is sharded by time; each shard is a segment.
A Druid query scans only the segments that match the query.
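The idea of time-based sharding and segment pruning can be sketched in a few lines of Python. This is a toy model under assumed names (`build_segments`, `segments_for_query`) and an assumed one-day segment size, not Druid code: rows are grouped by the day they fall in, and a query interval selects only the overlapping segments.

```python
from collections import defaultdict

DAY = 86400  # assumed segment size: one segment per day, in seconds

def build_segments(rows, segment_seconds=DAY):
    """Partition rows into segments keyed by the start of their time chunk."""
    segments = defaultdict(list)
    for row in rows:
        start = row["timestamp"] - row["timestamp"] % segment_seconds
        segments[start].append(row)
    return dict(segments)

def segments_for_query(segments, query_start, query_end, segment_seconds=DAY):
    """Start times of segments whose interval overlaps [query_start, query_end)."""
    return sorted(
        start for start in segments
        if start < query_end and start + segment_seconds > query_start
    )

# Hypothetical rows spread over three days
rows = [{"timestamp": t, "value": 1} for t in (100, 50_000, 90_000, 200_000)]
segments = build_segments(rows)
hit = segments_for_query(segments, 86_400, 172_800)  # query covers day 2 only
print(sorted(segments), hit)
```

Only one of the three segments is touched by the query, so the scan cost depends on the queried time range rather than on the total data size.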
-
Druid (3) Indexing the Data
Column-oriented storage
A query reads only the columns it needs
Indexes are built per segment
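A bitmap index over a dimension column can be sketched as follows. This is a simplified illustration that uses Python integers as bitmaps; Druid's real segments use compressed bitmap formats, and the column names and values here are invented for the example.

```python
def build_bitmap_index(column):
    """Map each distinct value to an int bitmap (bit i set means row i matches)."""
    index = {}
    for row_id, value in enumerate(column):
        index[value] = index.get(value, 0) | (1 << row_id)
    return index

def matching_rows(bitmap):
    """Decode a bitmap back into a list of row ids."""
    return [i for i in range(bitmap.bit_length()) if (bitmap >> i) & 1]

# Hypothetical dimension columns of one segment
country = ["KR", "US", "KR", "JP", "US"]
gender = ["F", "F", "M", "M", "F"]
country_idx = build_bitmap_index(country)
gender_idx = build_bitmap_index(gender)

# country = 'US' AND gender = 'F' becomes a bitwise AND
us_and_f = matching_rows(country_idx["US"] & gender_idx["F"])
# country = 'KR' OR country = 'JP' becomes a bitwise OR
kr_or_jp = matching_rows(country_idx["KR"] | country_idx["JP"])
print(us_and_f, kr_or_jp)
```

Filters on dimensions reduce to cheap bitwise operations, and only the matching row ids then need to be read from the metric columns.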
-
Druid (4) Loading the Data
Real-time: exactly-once semantics are not guaranteed
Batch: exactly-once semantics are guaranteed
Real-time and batch ingestion can be used together.
-
Druid (5) Querying the Data
SQL is not supported natively, and neither are joins.
Queries are sent as JSON over HTTP.
Query libraries: http://druid.io/libraries.html
Table query
- Druid join
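To make "JSON over HTTP" concrete, the sketch below builds a native topN query as a Python dict and wraps it in an HTTP POST request. The field names follow Druid's native topN query format. The datasource (`wikiticker`), dimension, interval, and the `edits`/`count` metric names are assumptions taken from the quickstart data set, and the broker URL is the one used in the query example of this deck. `run_query` needs a running broker and is not called here.

```python
import json
import urllib.request

BROKER_URL = "http://localhost:8082/druid/v2/?pretty"

def build_topn_query(datasource, dimension, interval, threshold=25):
    """Build a native topN query: top `threshold` dimension values by edit count."""
    return {
        "queryType": "topN",
        "dataSource": datasource,
        "intervals": [interval],
        "granularity": "all",
        "dimension": dimension,
        "metric": "edits",
        "threshold": threshold,
        "aggregations": [
            {"type": "longSum", "name": "edits", "fieldName": "count"}
        ],
    }

def make_request(query, url=BROKER_URL):
    """Wrap the query in an HTTP POST request with a JSON body."""
    body = json.dumps(query).encode("utf-8")
    return urllib.request.Request(
        url, data=body, headers={"Content-Type": "application/json"}, method="POST"
    )

def run_query(query, url=BROKER_URL):
    """Send the query to the broker (requires a running Druid cluster)."""
    with urllib.request.urlopen(make_request(query, url)) as resp:
        return json.loads(resp.read().decode("utf-8"))

query = build_topn_query("wikiticker", "page", "2015-09-12/2015-09-13")
request = make_request(query)
print(request.get_method(), request.full_url)
```

Because the query is plain JSON, any HTTP client works; the query libraries listed above are thin wrappers around this same request.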
-
Druid cluster
Cluster
Indexing service
-
Druid cluster
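Historical Node - serves queries from the segments on its local disk - nodes do not share segments with each other (shared nothing)
Broker Node - receives client queries - routes each query to the nodes holding the relevant segments and merges the results
Coordinator Node - manages the segments on historical nodes - tells historical nodes to load, drop, or replicate segments
Indexing service - converts data into a column format optimized for queries - builds segments with bitmap indexes
Real-time Processing - ingests and indexes stream data so that it is immediately queryable - hands segments off to historical nodes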
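The division of labor between broker and historical nodes can be sketched as a scatter-gather. This is a toy in-process model, not Druid's networking code: each historical node aggregates its own local segments, and the broker merges the partial results. The node names and rows are hypothetical.

```python
from collections import Counter

def historical_scan(segment_rows, dimension, metric):
    """What one historical node does: aggregate one of its local segments."""
    partial = Counter()
    for row in segment_rows:
        partial[row[dimension]] += row[metric]
    return partial

def broker_query(segments_by_node, dimension, metric):
    """Fan the query out to every node, then merge the partial results."""
    merged = Counter()
    for node, node_segments in segments_by_node.items():
        for segment_rows in node_segments:
            merged.update(historical_scan(segment_rows, dimension, metric))
    return dict(merged)

# Hypothetical cluster: two historical nodes, one segment each
segments_by_node = {
    "historical-1": [[{"page": "A", "edits": 2}, {"page": "B", "edits": 1}]],
    "historical-2": [[{"page": "A", "edits": 3}]],
}
result = broker_query(segments_by_node, "page", "edits")
print(result)
```

Since the nodes share nothing, each partial aggregation is independent, which is what lets the cluster scale out by adding historical nodes.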
-
Druid Indexing Service
The indexing service creates segments and consists of:
1) Peon: runs a task
2) MiddleManager: manages peons
3) Overlord: assigns tasks to MiddleManagers
Peons run on the same node as their MiddleManager.
-
Druid Segment and Storage
1) Indexing creates a segment
2) The segment is stored in deep storage
3) A historical node downloads the segment to its local disk
4) The local segment is loaded into memory
5) Query service
-
Druid Granularity
https://blog.codecentric.de/en/2016/08/realtime-fast-data-analytics-druid/
Granularity determines segment size
Segment granularity
- Unit in which segment files are created
- Day: 1 segment
- Minute: 2 segments
Query granularity
- Unit of rollup
- Minute: 3 rows
- Disk/memory
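The effect of the granularity setting can be sketched by counting how many distinct time buckets a set of events falls into. This is an illustrative sketch only; the five timestamps are invented and do not reproduce the figure from the referenced blog post. The same counting applies to segment granularity (buckets become segment files) and to query granularity (buckets become rolled-up rows per dimension combination).

```python
from datetime import datetime, timezone

# strftime patterns that truncate a timestamp to the given granularity
FORMATS = {"minute": "%Y-%m-%dT%H:%M", "hour": "%Y-%m-%dT%H", "day": "%Y-%m-%d"}

def bucket(ts, granularity):
    """Truncate an epoch timestamp (seconds, UTC) to its granularity bucket."""
    return datetime.fromtimestamp(ts, tz=timezone.utc).strftime(FORMATS[granularity])

def count_buckets(timestamps, granularity):
    """Number of distinct buckets the events fall into."""
    return len({bucket(ts, granularity) for ts in timestamps})

# Five hypothetical events on 2017-03-01 (UTC), as epoch seconds
base = 1488326400  # 2017-03-01T00:00:00Z
timestamps = [base + 5, base + 40, base + 65, base + 70, base + 3600]

counts = {g: count_buckets(timestamps, g) for g in ("minute", "hour", "day")}
print(counts)
```

A finer granularity keeps more detail but produces more buckets, and therefore more segments or rows to store and scan.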
-
Druid MetadataStorage
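System metadata store (MySQL, PostgreSQL)
Derby is the default storage; use MySQL or PostgreSQL in production
Segments Table - segment metadata - polled by the Coordinator to determine which segments are available for querying
Rule Table - rules the Coordinator uses to decide where segments are placed
Task-related Tables - used by the Indexing service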
-
Druid FaultTolerance
Historical Node - the segments of a failed node are reloaded from deep storage by other historical nodes
Coordinator Node - hot fail-over
Broker Node - hot fail-over
Deep storage
Real-time Processing - checkpoint, deep storage
Node
-
Druid ?
Cache at historical nodes
Cache at broker nodes
Page cache (OS)
3-level memory cache
- trick - dummy SQL
-
Druid
-
Druid ?
https://imply.io/
-
Imply
Technical architecture
-
Imply Pivot
-
Imply Pivot
-
Druid .
Docker, single node
git clone https://github.com/cimatech/druid-container.git
docker-compose up
docker ps
-
Druid
-
Data loading (File)
Loading schema
curl -O http://static.druid.io/artifacts/releases/druid-0.9.2-bin.tar.gz
tar xzf druid-0.9.2-bin.tar.gz
cd druid-0.9.2
curl -X 'POST' -H 'Content-Type:application/json' -d @quickstart/wikiticker-index.json localhost:4000/druid/indexer/v1/task
-
Data query (1) JSON
curl -L -H 'Content-Type:application/json' [email protected]/wikiticker-top-pages.json http://localhost:8082/druid/v2/?pretty
-
Data query (2) PlyQL SQL Query
https://github.com/implydata/plyql
http://plywood.imply.io/plyql
-
Data loading (Stream)
Stream sw
curl -XPOST -H 'Content-Type:application/json' [email protected]://localhost:8200/v1/post/pageviews
1.
2. POST
-
Apache Kylin
-
Apache Kylin
Open-sourced by eBay in 2014.10: SQL OLAP on Hadoop
Druid
-
Kylin
OLAP Hadoop query
-
Kylin
A job requested through the API reads data from Hive, builds the cube, and stores it in HBase
- Tomcat - Cube, memory (JVM GC)
-
Kylin Cube
Cubes are created from the Web UI.
Project: Hive tables are registered in the project.
Model: defined on the Hive tables.
Cube: defines the dimensions and measures.
-
Cube ?
HIVE
Kylin
1) Cube table
MapReduce
2) Cube dimension/measure
HBase
3) Dimension, measure
HDFS
4) Cube
-
Cube job
-
. ?
10,000 2.5 ...
Kylin (1.76 sec)
Hive (4.56 sec)
-
. ?
Kylin (0.12 sec)
Hive (103 sec)
10 850 (Cardinality 3)
-
. ?
Kylin (0.18 sec)
Hive (125 sec)
10 690 (Cardinality 2,400)
-
Kylin
dimension
cardinality cube
Cube .
Hive cube query
-
Kylin (Kyligence Analytics Platform)
http://kyligence.io/en/
-
Kyligence
Kafka / Spark Streaming cube
-
Apache Lens
Unified OLAP on Realtime and Batch Data (2014 Apache Incubator)
-
Apache Lens
Data silo
-
Apache Lens
Query cost
-
Lens position in the Hadoop ecosystem
Query engine, OLAP layer
-
Apache Lens
business
GitHub commit
2015.09: Apache Top-Level Project
-
[Backup] OLAP
-
Kylin Cube
dimension cuboid
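The relationship between dimensions and cuboids can be sketched as follows. A cube over N dimensions contains one cuboid per subset of those dimensions, so 2^N cuboids in total, each holding pre-aggregated measures. This is a naive in-memory illustration with invented rows and column names; Kylin's actual build runs as distributed jobs and can prune cuboids.

```python
from collections import defaultdict
from itertools import combinations

def enumerate_cuboids(dimensions):
    """All subsets of the dimension list: 2**N cuboids for N dimensions."""
    return [
        combo
        for size in range(len(dimensions), -1, -1)
        for combo in combinations(dimensions, size)
    ]

def build_cube(rows, dimensions, measure):
    """Pre-aggregate SUM(measure) for every cuboid."""
    cube = {}
    for cuboid in enumerate_cuboids(dimensions):
        totals = defaultdict(int)
        for row in rows:
            totals[tuple(row[d] for d in cuboid)] += row[measure]
        cube[cuboid] = dict(totals)
    return cube

# Hypothetical fact table
rows = [
    {"year": 2016, "country": "KR", "category": "book", "sales": 10},
    {"year": 2016, "country": "US", "category": "book", "sales": 20},
    {"year": 2017, "country": "KR", "category": "toy", "sales": 5},
]
dims = ["year", "country", "category"]
cube = build_cube(rows, dims, "sales")

print(len(cube))            # number of cuboids
print(cube[("country",)])   # answers GROUP BY country without scanning rows
```

A GROUP BY query becomes a lookup in the matching cuboid. The cost is that the number of cuboids doubles with every added dimension, which is why cube size grows quickly with many or high-cardinality dimensions.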
-
RDBMS cube
Cube, RDBMS aggregation
-
Apache Kylin
-
Apache Kylin - Cube Build job flow
-
Apache Kylin