The Evolution of Apache Kylin by Luke Han

The Evolution of Apache Kylin Luke Han | 韩卿 [email protected] 2016-05-09 Vancouver, Canada

Transcript of The Evolution of Apache Kylin by Luke Han

Page 1: The Evolution of Apache Kylin by Luke Han

The Evolution of Apache Kylin

Luke Han | 韩卿 [email protected]

2016-05-09, Vancouver, Canada

Page 2

About me…

§ Luke Han | 韩卿

§ Co-creator & VP of Apache Kylin

§ ASF Member

§ Co-founder & CEO at Kyligence Inc.

§ [email protected]

§ Twitter:@lukehq

Page 3

Apache Kylin

Page 4

Why?

[Figure: Happiness vs. Latency, with 10s marked on the latency axis]

Page 5

What have we tried?

Kylin

Page 6

About Apache Kylin

http://kylin.apache.org

Extreme OLAP Engine for Big Data

Apache Kylin is an open source Distributed Analytics Engine designed to provide a SQL interface and multidimensional analysis (OLAP) on Hadoop, supporting extremely large datasets with sub-second query response times.

kylin /ˈkiːˈlɪn/ 麒麟 -- n. (in Chinese art) a mythical animal of composite form

Page 7

About Apache Kylin

OLAP / Data Mart

• Born for Big Data Analytics

• Sub-second Latency

• ANSI SQL

• Seamless Integration with BI Tools

• Pluggable Architecture

Page 8

OLAP Cube

Cube: Balance Between Space and Time

[Figure: lattice of cuboids over the dimensions time, item, location, supplier]

0-D (apex) cuboid: no dimensions (the grand total)

1-D cuboids: time; item; location; supplier

2-D cuboids: (time, item); (time, location); (time, supplier); (item, location); (item, supplier); (location, supplier)

3-D cuboids: (time, item, location); (time, item, supplier); (time, location, supplier); (item, location, supplier)

4-D (base) cuboid: (time, item, location, supplier)

• Base vs. aggregate cells; ancestor vs. descendant cells; parent vs. child cells:

1. (9/15, milk, Urbana, Dairy_land) - <time, item, location, supplier>
2. (9/15, milk, Urbana, *) - <time, item, location>
3. (*, milk, Urbana, *) - <item, location>
4. (*, milk, Chicago, *) - <item, location>
5. (*, milk, *, *) - <item>

• Cuboid = one combination of dimensions

• Cube = all combinations of dimensions (all cuboids)
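
The lattice on this slide has 2^4 = 16 cuboids; in general a cube over n dimensions has 2^n cuboids, which is why precomputing a full cube trades storage space for query time. A minimal Python sketch (function names are hypothetical, not Kylin APIs) that enumerates the lattice and checks the ancestor and parent relations among the slide's example cells, using "*" for an aggregated dimension as the slide does:

```python
from itertools import combinations

# The four dimensions from the slide's example.
DIMENSIONS = ("time", "item", "location", "supplier")


def cuboid_lattice(dims):
    """Map each level k to the list of k-dimensional cuboids.

    Level 0 is the apex cuboid (no dimensions, the grand total);
    level len(dims) is the base cuboid (all dimensions).
    """
    return {k: list(combinations(dims, k)) for k in range(len(dims) + 1)}


def is_ancestor(ancestor, descendant):
    """True if cell `ancestor` is a strict roll-up of cell `descendant`.

    Cells are tuples aligned with DIMENSIONS, with "*" for an aggregated
    dimension. The ancestor must match every value it keeps and be a
    different (more aggregated) cell.
    """
    keeps_same_values = all(a in ("*", d) for a, d in zip(ancestor, descendant))
    return keeps_same_values and ancestor != descendant


def is_parent(parent, child):
    """A parent is an ancestor that aggregates exactly one more dimension."""
    return is_ancestor(parent, child) and parent.count("*") == child.count("*") + 1


lattice = cuboid_lattice(DIMENSIONS)
print({k: len(v) for k, v in lattice.items()})   # {0: 1, 1: 4, 2: 6, 3: 4, 4: 1}

base = ("9/15", "milk", "Urbana", "Dairy_land")   # cell 1 on the slide
print(is_parent(("9/15", "milk", "Urbana", "*"), base))   # True  (cell 2 -> cell 1)
print(is_ancestor(("*", "milk", "Urbana", "*"), base))    # True  (cell 3 -> cell 1)
print(is_ancestor(("*", "milk", "Chicago", "*"), base))   # False (cell 4: other location)
```

Cells 2, 3 and 5 form a chain of parents above the base cell, while cell 4 sits in the same cuboid as cell 3 but rolls up a different location, so it is not an ancestor of cell 1.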

Page 9

Architecture

[Figure: architecture diagram showing BI Tools, Web App…, ANSI SQL, Kylin, and MapReduce/Spark]
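
Because Kylin speaks ANSI SQL, a web app can send a query over HTTP. A minimal sketch that only builds the request, assuming Kylin's REST query endpoint POST /kylin/api/query on the default port 7070 and the default ADMIN/KYLIN account; the host, project name and table in the usage lines are assumptions, and BI tools would normally connect through JDBC/ODBC instead:

```python
import base64
import json
import urllib.request


def build_query_request(host, project, sql, user="ADMIN", password="KYLIN",
                        port=7070, limit=50000):
    """Build (but do not send) a Kylin REST query request.

    Assumes the /kylin/api/query endpoint and default credentials.
    """
    body = json.dumps({
        "sql": sql,
        "project": project,
        "offset": 0,
        "limit": limit,
        "acceptPartial": False,
    }).encode("utf-8")
    # HTTP Basic authentication: base64("user:password").
    token = base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")
    return urllib.request.Request(
        f"http://{host}:{port}/kylin/api/query",
        data=body,
        headers={"Content-Type": "application/json",
                 "Authorization": f"Basic {token}"},
        method="POST",
    )


# Hypothetical host, project and table names.
req = build_query_request(
    "localhost", "learn_kylin",
    "SELECT part_dt, SUM(price) FROM kylin_sales GROUP BY part_dt")
print(req.full_url)                      # http://localhost:7070/kylin/api/query
print(req.get_header("Authorization"))   # Basic QURNSU46S1lMSU4=
# Sending it needs a running Kylin server:
# with urllib.request.urlopen(req) as resp:
#     result = json.load(resp)
```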

Page 10

Apache Kylin Journey

§ Sept 2013: Project kickoff

§ Oct 2014: Go live at eBay & open source on GitHub

§ Nov 2014: Apache Incubator

§ June 2015: First Apache release, v0.7.1

§ Sept 2015: InfoWorld Bossie Award, Best Open Source Big Data Tool

§ Sept 2015: Apache release v1.0

§ Nov 2015: Apache Top Level Project

§ Mar 2016: Kyligence founded

Page 11

Apache Kylin Global Adoption

Page 12

Use Case: JD.com

Page 13

Use Case: Baidu Map

Page 14

Use Case: NetEase

Page 15

Performance and Throughput

By NetEase: http://www.bitstech.net/2016/01/04/kylin-olap/

Page 16

The Evolution

Page 17

Apache Kylin New Features

§ Pluggable architecture

§ New MR cube engine with fast cubing (1.5x faster)

§ New HBase storage with parallel scan (2x faster)

§ Near real-time analysis

§ User-defined aggregations

§ Excel/Power BI/Zeppelin integration
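
The slides do not describe the fast cubing algorithm itself. As a hedged illustration of the idea cube engines build on, here is a minimal sketch of layer-by-layer cubing: the base cuboid is aggregated once from the raw rows, and every smaller cuboid is then rolled up from an already-built parent cuboid with one more dimension, instead of rescanning the raw data. All names are hypothetical, and only a SUM measure is shown:

```python
from collections import defaultdict
from itertools import combinations


def aggregate(rows, dims, measure):
    """GROUP BY dims, SUM(measure) directly over raw rows (dicts)."""
    cells = defaultdict(int)
    for row in rows:
        cells[tuple(row[d] for d in dims)] += row[measure]
    return dict(cells)


def rollup(parent_dims, parent_cells, child_dims):
    """Aggregate a child cuboid from an already-built parent cuboid."""
    positions = [parent_dims.index(d) for d in child_dims]
    cells = defaultdict(int)
    for key, value in parent_cells.items():
        cells[tuple(key[i] for i in positions)] += value
    return dict(cells)


def build_cube_by_layer(rows, dims, measure):
    """Build all 2**n cuboids: base cuboid first, then one layer at a time."""
    dims = tuple(dims)
    cube = {dims: aggregate(rows, dims, measure)}      # layer n: base cuboid
    for k in range(len(dims) - 1, -1, -1):             # layers n-1 down to 0
        for child in combinations(dims, k):
            # Any cuboid from the layer above that contains the child will do.
            parent = next(p for p in cube if len(p) == k + 1 and set(child) <= set(p))
            cube[child] = rollup(parent, cube[parent], child)
    return cube


rows = [
    {"time": "9/15", "item": "milk", "location": "Urbana", "supplier": "Dairy_land", "sales": 3},
    {"time": "9/15", "item": "milk", "location": "Chicago", "supplier": "Dairy_land", "sales": 2},
    {"time": "9/16", "item": "bread", "location": "Urbana", "supplier": "Bakery", "sales": 5},
]
cube = build_cube_by_layer(rows, ("time", "item", "location", "supplier"), "sales")
print(len(cube))           # 16
print(cube[("item",)])     # {('milk',): 5, ('bread',): 5}
print(cube[()])            # {(): 10}
```

Rolling up from a parent works because SUM is associative; each layer reads a smaller input than the raw table, which is where the savings come from.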

Page 18

The Freedom, Extensibility, Flexibility

§ Freedom

  § Zoo break: no longer bound to Hadoop

  § Free to move to a better engine or storage

§ Extensibility

  § Accept any input, e.g. Kafka

  § Embrace next-gen distributed platforms, e.g. Spark

§ Flexibility

  § Choose a different engine for each dataset

Page 19

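New Generation Design

[Figure: architecture diagram. A 3rd Party App (Web App, Mobile…) connects through the REST API and a SQL-Based Tool (BI Tools: Tableau…) through JDBC/ODBC; both send SQL to the REST Server. The Query Engine, using Metadata, routes each query to the Data Cube (OLAP Cubes in HBase, stored as Key Value Data) for low latency (seconds). Offline, the Cube Builder (MapReduce…) reads Star Schema Data from Hadoop Hive and builds the cubes. Data Source Abstraction, Engine Abstraction and Storage Abstraction layers separate these parts. Legend: Online Analysis Data Flow; Offline Data Flow.]

§ Clients/Users interact with Kylin via SQL

§ OLAP Cube is transparent to users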
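
Because the OLAP cube is transparent to users, the query engine has to decide, for each SQL query, which cuboid can answer it. A minimal sketch of one plausible routing rule (an assumption, not Kylin's exact logic): pick the smallest built cuboid whose dimensions cover every column the query groups by or filters on, and report no match otherwise. The function name and the cuboid sizes are hypothetical:

```python
def route_query(query_dims, cuboid_sizes):
    """Return the cheapest built cuboid that can answer the query, or None.

    query_dims:   columns the SQL groups by or filters on.
    cuboid_sizes: {frozenset of dimensions: estimated row count} for built cuboids.
    """
    needed = set(query_dims)
    candidates = [dims for dims in cuboid_sizes if needed <= dims]
    if not candidates:
        return None          # no cuboid covers the query; the caller must handle it
    # Fewer rows means less to scan; break ties by fewer dimensions.
    return min(candidates, key=lambda dims: (cuboid_sizes[dims], len(dims)))


# Illustrative row counts for a partially built cube (made-up numbers).
sizes = {
    frozenset({"time", "item", "location", "supplier"}): 1_000_000,   # base cuboid
    frozenset({"time", "item"}): 20_000,
    frozenset({"item"}): 500,
    frozenset(): 1,                                                    # apex cuboid
}
print(sorted(route_query({"item"}, sizes)))              # ['item']
print(sorted(route_query({"time"}, sizes)))              # ['item', 'time']
print(sorted(route_query({"time", "location"}, sizes)))  # ['item', 'location', 'supplier', 'time']
print(route_query({"color"}, sizes))                     # None
```

The base cuboid can always answer a query over cube dimensions, so a router like this only returns None for columns the cube does not model at all.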

Page 20

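Pluggable Architecture

[Figure: Cube Metadata; SourceFactory → Hive Source, EngineFactory → MR Engine, StorageFactory → HBase Storage; data flows IN from the Hive Source to the MR Engine and OUT from the MR Engine to HBase Storage]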

Page 21

Pluggable Architecture

[Figure: Hive Source → Hive Adapter (adapt to IN, load data) → MR Engine → HBase Adapter (adapt to OUT, save cube) → HBase Storage]
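
The point of these two slides is that the engine never talks to Hive or HBase directly: it sees only an input interface and an output interface, and the cube's metadata names which implementation to plug into each role. A minimal Python sketch of that factory-plus-adapter pattern; every class name and type id here is hypothetical, and in-memory stand-ins replace the Hive source, MR engine and HBase storage:

```python
from abc import ABC, abstractmethod


class Source(ABC):
    """Adapter on the IN side: hides where the raw rows come from."""
    @abstractmethod
    def read_rows(self, table): ...


class Storage(ABC):
    """Adapter on the OUT side: hides where the built cube is saved."""
    @abstractmethod
    def save_cube(self, name, cells): ...


class Engine(ABC):
    """Cube engine: depends only on the Source and Storage interfaces."""
    @abstractmethod
    def build(self, source, storage, cube_meta): ...


class InMemorySource(Source):          # stand-in for a Hive source
    def __init__(self, tables):
        self.tables = tables

    def read_rows(self, table):
        return list(self.tables[table])


class DictStorage(Storage):            # stand-in for HBase storage
    def __init__(self):
        self.saved = {}

    def save_cube(self, name, cells):
        self.saved[name] = cells


class SumEngine(Engine):               # stand-in for the MR engine
    def build(self, source, storage, cube_meta):
        cells = {}
        for row in source.read_rows(cube_meta["table"]):     # load data (IN)
            key = tuple(row[d] for d in cube_meta["dims"])
            cells[key] = cells.get(key, 0) + row[cube_meta["measure"]]
        storage.save_cube(cube_meta["name"], cells)           # save cube (OUT)
        return cells


class Factory:
    """Maps a type id from cube metadata to a constructor."""
    def __init__(self):
        self._registry = {}

    def register(self, type_id, constructor):
        self._registry[type_id] = constructor

    def create(self, type_id, *args):
        return self._registry[type_id](*args)


source_factory, engine_factory, storage_factory = Factory(), Factory(), Factory()
source_factory.register("memory", InMemorySource)
engine_factory.register("sum", SumEngine)
storage_factory.register("dict", DictStorage)

# Cube metadata names which implementation to plug in for each role.
cube_meta = {"name": "sales_cube", "table": "sales", "dims": ("item",),
             "measure": "qty", "source_type": "memory",
             "engine_type": "sum", "storage_type": "dict"}
tables = {"sales": [{"item": "milk", "qty": 2}, {"item": "milk", "qty": 3},
                    {"item": "bread", "qty": 1}]}

source = source_factory.create(cube_meta["source_type"], tables)
engine = engine_factory.create(cube_meta["engine_type"])
storage = storage_factory.create(cube_meta["storage_type"])
engine.build(source, storage, cube_meta)
print(storage.saved)   # {'sales_cube': {('milk',): 5, ('bread',): 1}}
```

Swapping the storage for another backend then means registering one more Storage implementation and changing a type id in the metadata, with no change to the engine.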

Page 22

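Parallel Scan

§ Slow queries are 5-10x faster.

§ The new HBase storage can partition (shard) cuboids that are big enough.

§ Overall, queries are 2x faster than before, measured as the total time of 10,000+ queries.

[Figure: Before, a query scans Cuboid A, Cuboid B and Cuboid C, each held on a single server (Server1, Server2, Server3). After, large cuboids are split into shards spread across the servers: Server1 holds A1 and B1, Server2 holds A2 and B2, Server3 holds A3 and C, so one query scans the shards in parallel.]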
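
A minimal sketch of the sharded scan described on this slide, assuming a cuboid is a list of (key, value) rows and rows are assigned to shards round-robin (the slide does not show how Kylin actually assigns rows to shards). Each shard's scan filters and pre-aggregates locally, and the partial results are merged at the end. Function names are hypothetical; threads stand in for region servers, so in Python this shows the structure rather than a real speedup:

```python
from concurrent.futures import ThreadPoolExecutor


def shard_cuboid(rows, num_shards):
    """Split a cuboid's (key, value) rows into num_shards shards, round-robin."""
    return [rows[i::num_shards] for i in range(num_shards)]


def scan_shard(shard, predicate):
    """Scan one shard: keep matching rows and pre-aggregate them locally."""
    return sum(value for key, value in shard if predicate(key))


def parallel_scan(shards, predicate):
    """Scan all shards concurrently, then merge the partial sums."""
    if not shards:
        return 0
    with ThreadPoolExecutor(max_workers=len(shards)) as pool:
        partials = list(pool.map(lambda shard: scan_shard(shard, predicate), shards))
    return sum(partials)


# A toy cuboid keyed by item, with a sales value per row.
rows = [("milk", 3), ("bread", 1), ("milk", 4), ("eggs", 2), ("milk", 1), ("bread", 5)]
shards = shard_cuboid(rows, 3)          # e.g. spread over Server1..Server3
print(parallel_scan(shards, lambda item: item == "milk"))   # 8
```

Pre-aggregating inside each shard keeps the data sent back to the query server small, which matters more as cuboids grow.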

Page 23

Near Real-time Incremental Build

§ Minute-level micro-cubes

§ Kafka source

§ In-memory cubing

§ Auto merge
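
A minimal sketch of micro-cubes plus auto merge, assuming each micro-cube is a segment that aggregates the events of a few minutes, and a hypothetical merge policy that combines every full run of a fixed number of consecutive segments (Kylin's real policy is not shown on the slide). Merging is a plain sum here because only an additive SUM measure is used; all names are hypothetical:

```python
from dataclasses import dataclass, field


@dataclass
class Segment:
    start: int                                  # first minute covered (inclusive)
    end: int                                    # last minute covered (exclusive)
    cells: dict = field(default_factory=dict)   # {key: summed measure}


def build_micro_cube(events, start, end):
    """Aggregate (minute, key, value) events that fall in [start, end)."""
    seg = Segment(start, end)
    for minute, key, value in events:
        if start <= minute < end:
            seg.cells[key] = seg.cells.get(key, 0) + value
    return seg


def merge(segments):
    """Merge contiguous segments; SUM is additive, so cells simply add up."""
    for a, b in zip(segments, segments[1:]):
        if a.end != b.start:
            raise ValueError("segments must be contiguous")
    merged = Segment(segments[0].start, segments[-1].end)
    for seg in segments:
        for key, value in seg.cells.items():
            merged.cells[key] = merged.cells.get(key, 0) + value
    return merged


def auto_merge(segments, batch):
    """Hypothetical policy: merge each full run of `batch` micro-cubes, oldest
    first; a trailing partial run waits for more micro-cubes."""
    full = len(segments) // batch * batch
    merged = [merge(segments[i:i + batch]) for i in range(0, full, batch)]
    return merged + segments[full:]


events = [(0, "milk", 1), (3, "milk", 2), (7, "bread", 4), (12, "milk", 1), (16, "bread", 1)]
micro = [build_micro_cube(events, t, t + 5) for t in range(0, 20, 5)]   # 5-minute micro-cubes
for seg in auto_merge(micro, 3):
    print(seg.start, seg.end, seg.cells)
# 0 15 {'milk': 4, 'bread': 4}
# 15 20 {'bread': 1}
```

Small segments keep data fresh, while merging keeps the number of segments a query must touch from growing without bound.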

Page 24

User-Defined Aggregation Types

§ HyperLogLog Count Distinct

§ TopN

§ BitMap Precise Count Distinct

  § from Sun, Yerui (meituan.com)

§ Raw Records

  § from Wang, Xiaoyu (jd.com)
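
Count distinct is the hard case for a cube because distinct counts are not additive: rolling up two cells cannot simply sum their counts. A minimal sketch of the bitmap idea behind precise count distinct, assuming values have already been encoded as small non-negative integer ids (for example through a dictionary); the class and method names are hypothetical. HyperLogLog addresses the same roll-up problem approximately, with fixed-size registers merged by taking the element-wise maximum:

```python
class BitmapCountDistinct:
    """Precise COUNT(DISTINCT ...) over non-negative integer ids.

    A Python int is used as an arbitrary-length bitset: bit i is set
    when id i has been seen.
    """

    def __init__(self, bits=0):
        self.bits = bits

    def add(self, value_id):
        if value_id < 0:
            raise ValueError("ids must be non-negative")
        self.bits |= 1 << value_id

    def merge(self, other):
        """Roll up two cells: the union of their id sets is a bitwise OR."""
        return BitmapCountDistinct(self.bits | other.bits)

    def count(self):
        return bin(self.bits).count("1")


day1, day2 = BitmapCountDistinct(), BitmapCountDistinct()
for user_id in (1, 2, 3):
    day1.add(user_id)
for user_id in (2, 3, 4):
    day2.add(user_id)

print(day1.count() + day2.count())   # 6: summing per-day counts double-counts users 2 and 3
print(day1.merge(day2).count())      # 4: the precise distinct count over both days
```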

Page 25

Support More BI & Visualization Tools

§ Supports Tableau 9.1

§ Supports MS Excel

§ Supports MS Power BI

§ Supports Zeppelin

Page 26

Roadmap

Page 27

Apache Kylin Roadmap

Page 28

2016 Focus…

§ Streaming and real time

§ Performance, performance and performance

§ Support more BI & visualization tools

§ SQL & OLAP functions

Page 29

Q&A

§ More…

  § Website: http://kylin.apache.org

  § Twitter: @ApacheKylin

§ Contact Me:

  § [email protected]

  § @lukehq