The Evolution of Apache Kylin by Luke Han
About me…
§ Luke Han | 韩卿
§ Co-creator & VP of Apache Kylin
§ ASF Member
§ Co-founder & CEO at Kyligence Inc.
§ Twitter: @lukehq
Apache Kylin
Why
[Figure: Happiness vs. Latency (10s)]
What have we tried?
Kylin
About Apache Kylin
http://kylin.apache.org
Extreme OLAP Engine for Big Data
Apache Kylin is an open source distributed analytics engine designed to provide a SQL interface and multi-dimensional analysis (OLAP) on Hadoop, supporting extremely large datasets with sub-second response times.
kylin /ˈkiːˈlɪn/ 麒麟 -- n. (in Chinese art) a mythical animal of composite form
About Apache Kylin
OLAP / Data Mart
• Born for Big Data Analytics
• Sub-second Latency
• ANSI SQL
• Seamless Integration with BI Tools
• Pluggable Architecture
[Figure: lattice of cuboids for the dimensions time, item, location, supplier]
0-D (apex) cuboid: all
1-D cuboids: time; item; location; supplier
2-D cuboids: time, item; time, location; time, supplier; item, location; item, supplier; location, supplier
3-D cuboids: time, item, location; time, item, supplier; time, location, supplier; item, location, supplier
4-D (base) cuboid: time, item, location, supplier
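The lattice has one cuboid per subset of the dimensions, so n dimensions give 2^n cuboids. A minimal Python sketch (dimension names taken from the slide; the function name `cuboid_lattice` is ours, not Kylin's) enumerates it level by level:

```python
from itertools import combinations

DIMENSIONS = ("time", "item", "location", "supplier")

def cuboid_lattice(dims):
    """Map k -> all cuboids with k dimensions, from 0-D (apex) to n-D (base)."""
    return {k: list(combinations(dims, k)) for k in range(len(dims) + 1)}

lattice = cuboid_lattice(DIMENSIONS)
for k, cuboids in lattice.items():
    print(f"{k}-D: {len(cuboids)} cuboid(s)")
```

Running it prints 1, 4, 6, 4 and 1 cuboids for the 0-D through 4-D levels, 16 in total.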
• Base vs. aggregate cells; ancestor vs. descendant cells; parent vs. child cells:
  1. (9/15, milk, Urbana, Dairy_land) - <time, item, location, supplier>
  2. (9/15, milk, Urbana, *) - <time, item, location>
  3. (*, milk, Urbana, *) - <item, location>
  4. (*, milk, Chicago, *) - <item, location>
  5. (*, milk, *, *) - <item>
• Cuboid = one combination of dimensions
• Cube = all combinations of dimensions (all cuboids)
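The cell relationships can be checked mechanically. In this sketch (helper names are hypothetical), a cell is a tuple with `*` for an aggregated dimension; one cell is an ancestor of another when it keeps fewer concrete dimensions and agrees on every dimension it keeps, and a parent is an ancestor exactly one level up the lattice:

```python
STAR = "*"

def dimensionality(cell):
    """Count the concrete (non-'*') dimensions of a cell."""
    return sum(1 for v in cell if v != STAR)

def is_ancestor(a, b):
    """True if cell a generalizes cell b: fewer concrete dimensions,
    and every concrete value in a equals b's value at that position."""
    return dimensionality(a) < dimensionality(b) and all(
        x == STAR or x == y for x, y in zip(a, b)
    )

def is_parent(a, b):
    """A parent is an ancestor exactly one dimension higher up."""
    return is_ancestor(a, b) and dimensionality(a) == dimensionality(b) - 1

# The five example cells from the slide.
c1 = ("9/15", "milk", "Urbana", "Dairy_land")
c2 = ("9/15", "milk", "Urbana", STAR)
c3 = (STAR, "milk", "Urbana", STAR)
c4 = (STAR, "milk", "Chicago", STAR)
c5 = (STAR, "milk", STAR, STAR)
```

With these cells, c2 is the parent of c1, c3 is an ancestor (but not a parent) of c1, c4 is not an ancestor of c1 (Chicago differs from Urbana), and c5 is the parent of both c3 and c4.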
OLAP Cube
Cube: Balance Between Space and Time
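To make the space/time balance concrete, here is a toy full-cube build over a few hypothetical fact rows: every cuboid is precomputed once (space), after which a GROUP BY becomes a dictionary lookup instead of a scan (time). This only illustrates the trade-off; it is not how Kylin stores or builds cubes.

```python
from collections import defaultdict
from itertools import combinations

DIMS = ("time", "item", "location", "supplier")

def build_full_cube(rows, measure="sales"):
    """Precompute SUM(measure) for every cuboid, i.e. every subset of DIMS."""
    cube = {}
    for k in range(len(DIMS) + 1):
        for cuboid in combinations(DIMS, k):
            totals = defaultdict(int)
            for row in rows:
                totals[tuple(row[d] for d in cuboid)] += row[measure]
            cube[cuboid] = dict(totals)
    return cube

def answer(cube, group_by):
    """Answer a GROUP BY by looking up the matching cuboid (no fact scan)."""
    key = tuple(d for d in DIMS if d in group_by)
    return cube[key]

def stored_cells(cube):
    """The space side of the trade-off: cells kept across all cuboids."""
    return sum(len(cells) for cells in cube.values())

# Hypothetical fact rows, for illustration only.
ROWS = [
    {"time": "9/15", "item": "milk", "location": "Urbana", "supplier": "Dairy_land", "sales": 3},
    {"time": "9/15", "item": "milk", "location": "Chicago", "supplier": "Dairy_land", "sales": 5},
    {"time": "9/16", "item": "bread", "location": "Urbana", "supplier": "Bakery", "sales": 2},
]
cube = build_full_cube(ROWS)
```

Three fact rows already expand into dozens of stored cells across the 16 cuboids, which is why real deployments trade away some cuboids to save space.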
Architecture
[Diagram: BI Tools, Web App… query Kylin in ANSI SQL; Kylin builds cubes with MapReduce/Spark]
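As one hedged illustration of a web app talking to Kylin in SQL, the sketch below builds (but does not send) an HTTP request for Kylin's REST query endpoint, `POST /kylin/api/query` on the default port 7070, using HTTP Basic auth with the default `ADMIN`/`KYLIN` credentials of a fresh install. The host, project and SQL are placeholders, and the JSON field names follow the Kylin REST API documentation as we understand it; check them against your Kylin version.

```python
import base64
import json
import urllib.request

def build_kylin_query_request(host, project, sql,
                              user="ADMIN", password="KYLIN", limit=50000):
    """Build a POST request for Kylin's REST query API (placeholder host/project)."""
    body = json.dumps({
        "sql": sql,
        "project": project,
        "offset": 0,
        "limit": limit,
        "acceptPartial": False,
    }).encode("utf-8")
    token = base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")
    return urllib.request.Request(
        url=f"http://{host}:7070/kylin/api/query",
        data=body,
        headers={"Content-Type": "application/json",
                 "Authorization": f"Basic {token}"},
        method="POST",
    )

req = build_kylin_query_request(
    "localhost", "learn_kylin",
    "SELECT part_dt, SUM(price) FROM kylin_sales GROUP BY part_dt")
# To actually send it against a running server:
#   with urllib.request.urlopen(req) as resp:
#       result = json.load(resp)
```

A BI tool would normally go through JDBC/ODBC instead; the REST form is just the easiest to show without a driver.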
Apache Kylin Journey
§ Sept 2013: Project kickoff
§ Oct 2014: Go live at eBay & open source on GitHub
§ Nov 2014: Apache Incubator
§ June 2015: First Apache release, v0.7.1
§ Sept 2015: InfoWorld Bossie Award, Best Open Source Big Data Tool; Apache release v1.0
§ Nov 2015: Apache Top Level Project
§ Mar 2016: Kyligence founded
Apache Kylin Global Adoptions
Use Case: JD.com
Use Case: Baidu Map
Use Case: NetEase
Performance and Throughput
By NetEase: http://www.bitstech.net/2016/01/04/kylin-olap/
The Evolution
Apache Kylin New Features
§ Pluggable architecture
§ New MR cube engine with fast cubing (1.5x faster)
§ New HBase storage with parallel scan (2x faster)
§ Near real-time analysis
§ User-defined aggregations
§ Excel/Power BI/Zeppelin integration
The Freedom, Extensibility, Flexibility
§ Freedom
  § Zoo break: no longer bound to Hadoop
  § Free to move to a better engine or storage
§ Extensibility
  § Accept any input, e.g. Kafka
  § Embrace next-gen distributed platforms, e.g. Spark
§ Flexibility
  § Choose a different engine for each dataset
New generation design
[Diagram: 3rd Party App (Web App, Mobile…) and SQL-Based Tool (BI Tools: Tableau…) send SQL to the REST Server through the REST API or JDBC/ODBC; the Query Engine, Routing and Metadata answer from the Data Cube: OLAP Cubes (HBase, Key Value Data) with Low Latency - Seconds; the Cube Builder (MapReduce…) builds those cubes from Star Schema Data in Hadoop Hive.]
Ø Online Analysis Data Flow
Ø Offline Data Flow
Ø Clients/Users interact with Kylin via SQL
Ø OLAP Cube is transparent to users
Pluggable architecture
[Diagram: Data Source Abstraction (Hive Source), Engine Abstraction (MR Engine, IN/OUT) and Storage Abstraction (HBase Storage), wired together through Cube Metadata by SourceFactory, EngineFactory and StorageFactory.]
[Diagram: Hive Source → Hive Adapter (load data, adapt to IN) → MR Engine → HBase Adapter (save cube, adapt to OUT) → HBase Storage]
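The plug-in idea can be sketched in a few lines: the engine only sees an IN interface (load data) and an OUT interface (save cube), and lookup tables stand in for SourceFactory, EngineFactory and StorageFactory, choosing implementations from cube metadata. Kylin's real factories are Java classes; every class and key below is a hypothetical stand-in, not Kylin's API.

```python
from abc import ABC, abstractmethod
from collections import Counter

class Source(ABC):
    """IN side: anything the engine can load rows from (Hive, Kafka, ...)."""
    @abstractmethod
    def load_data(self): ...

class Storage(ABC):
    """OUT side: anywhere the engine can save a built cube (HBase, ...)."""
    @abstractmethod
    def save_cube(self, name, cube): ...

class ListSource(Source):
    """Hypothetical in-memory stand-in for a Hive table."""
    def __init__(self, rows):
        self.rows = rows

    def load_data(self):
        return iter(self.rows)

class DictStorage(Storage):
    """Hypothetical in-memory stand-in for HBase."""
    def __init__(self):
        self.cubes = {}

    def save_cube(self, name, cube):
        self.cubes[name] = cube

class CountEngine:
    """Hypothetical engine: COUNT per value of one dimension. It never knows
    which concrete source or storage it is wired to."""
    def build(self, name, dim, source, storage):
        cube = Counter(row[dim] for row in source.load_data())
        storage.save_cube(name, cube)

# Registries play the role of SourceFactory / EngineFactory / StorageFactory.
SOURCE_FACTORY = {"list": ListSource}
ENGINE_FACTORY = {"count": CountEngine}
STORAGE_FACTORY = {"dict": DictStorage}

def build_from_metadata(metadata, rows):
    """Instantiate the plug-ins named in the cube metadata, then build."""
    source = SOURCE_FACTORY[metadata["source"]](rows)
    engine = ENGINE_FACTORY[metadata["engine"]]()
    storage = STORAGE_FACTORY[metadata["storage"]]()
    engine.build(metadata["name"], metadata["dim"], source, storage)
    return storage
```

Swapping Hive for Kafka or MapReduce for Spark then means registering a new class under a new key; the other two sides are untouched.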
Parallel Scan
§ Slow queries are 5-10x faster.
§ The new HBase storage enables partitioning of cuboids that are large enough.
§ Overall query time is 2x faster than before, based on summed results from 10,000+ queries.
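[Diagram: Before, a query scans whole cuboids (Cuboid A, Cuboid B, Cuboid C), each held on one server (Server 1, Server 2, Server 3). After, large cuboids are split into partitions (A1, A2, A3 and B1, B2) spread across Server 1-3 and scanned in parallel, while the smaller Cuboid C stays whole.]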
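The effect of partitioning a large cuboid can be sketched with threads standing in for region servers: split the cuboid's rows into shards, scan each shard concurrently for a partial aggregate, then merge. Round-robin sharding and the helper names are assumptions for illustration; Kylin's HBase storage shards differently.

```python
from concurrent.futures import ThreadPoolExecutor

def shard_rows(rows, num_shards):
    """Split one large cuboid into shards (round-robin here, an assumption)."""
    shards = [[] for _ in range(num_shards)]
    for i, row in enumerate(rows):
        shards[i % num_shards].append(row)
    return shards

def scan_shard(shard, predicate):
    """Scan one shard and return a partial SUM of the measure."""
    return sum(value for key, value in shard if predicate(key))

def parallel_scan(rows, predicate, num_shards=3):
    """Scan all shards concurrently, then merge the partial sums."""
    shards = shard_rows(rows, num_shards)
    with ThreadPoolExecutor(max_workers=num_shards) as pool:
        partials = pool.map(lambda s: scan_shard(s, predicate), shards)
        return sum(partials)

# Cuboid rows as ((item, store_id), measure); values are hypothetical.
rows = [(("milk", i), i) for i in range(10)] + [(("bread", i), 100) for i in range(3)]
milk_total = parallel_scan(rows, lambda key: key[0] == "milk")
```

Because SUM merges associatively, the sharded result equals a single sequential scan; only the wall-clock time changes.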
Near Real-time Incremental Build
§ Minute-level micro cubes
§ Kafka source
§ In-memory cubing
§ Auto merge
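A toy model of micro cubes and auto merge: each micro-batch of events (for example messages read from Kafka) is cubed in memory into a small segment covering a few minutes, and old adjacent segments are merged so the segment count stays bounded. The `Segment` type, the COUNT measure and the "merge the oldest pair" policy are assumptions for illustration, not Kylin's merge strategy.

```python
from collections import Counter
from dataclasses import dataclass, field

@dataclass
class Segment:
    """A micro cube covering minutes [start, end), holding COUNT per key."""
    start: int
    end: int
    counts: Counter = field(default_factory=Counter)

def build_micro_cube(events, start, end):
    """In-memory cubing of one micro-batch of (minute, key) events."""
    seg = Segment(start, end)
    for minute, key in events:
        if start <= minute < end:
            seg.counts[key] += 1
    return seg

def auto_merge(segments, max_segments):
    """Merge the oldest adjacent segments until at most max_segments remain."""
    segs = sorted(segments, key=lambda s: s.start)
    while len(segs) > max_segments:
        a, b = segs[0], segs[1]
        segs = [Segment(a.start, b.end, a.counts + b.counts)] + segs[2:]
    return segs

events = [(0, "milk"), (3, "milk"), (6, "bread"), (11, "milk")]
micro = [build_micro_cube(events, t, t + 5) for t in (0, 5, 10)]
merged = auto_merge(micro, max_segments=2)
```

Merging COUNT segments is just adding their counters, which is what makes many small, fresh segments cheap to roll up later.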
User-Defined Aggregation Types
§ HyperLogLog Count Distinct
§ TopN
§ Bitmap Precise Count Distinct
  § from Sun, Yerui (meituan.com)
§ Raw Records
  § from Wang, Xiaoyu (jd.com)
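The two count-distinct measures differ in what they store, and both can be merged across segments. The sketch below is a generic textbook HyperLogLog (approximate, fixed-size registers, merged by element-wise max) next to a precise bitmap (exact, merged by bitwise OR). It is not Kylin's Java implementation; the bitmap variant assumes values have already been dictionary-encoded to small non-negative integers.

```python
import hashlib
import math

def _hash64(value):
    """Deterministic 64-bit hash of a string."""
    return int.from_bytes(hashlib.sha1(value.encode("utf-8")).digest()[:8], "big")

class HyperLogLog:
    """Approximate COUNT DISTINCT with 2**p registers."""
    def __init__(self, p=12):
        self.p = p
        self.m = 1 << p
        self.registers = [0] * self.m

    def add(self, value):
        h = _hash64(value)
        idx = h >> (64 - self.p)                       # first p bits pick a register
        rest = h & ((1 << (64 - self.p)) - 1)          # remaining 64 - p bits
        rank = (64 - self.p) - rest.bit_length() + 1   # position of leftmost 1-bit
        self.registers[idx] = max(self.registers[idx], rank)

    def merge(self, other):
        """Union of two sketches: element-wise max of registers."""
        self.registers = [max(a, b) for a, b in zip(self.registers, other.registers)]

    def estimate(self):
        alpha = 0.7213 / (1 + 1.079 / self.m)
        raw = alpha * self.m ** 2 / sum(2.0 ** -r for r in self.registers)
        zeros = self.registers.count(0)
        if raw <= 2.5 * self.m and zeros:
            return self.m * math.log(self.m / zeros)   # small-range correction
        return raw

class BitmapCountDistinct:
    """Precise COUNT DISTINCT over dictionary-encoded integer ids."""
    def __init__(self):
        self.bits = 0

    def add(self, member_id):
        self.bits |= 1 << member_id

    def merge(self, other):
        self.bits |= other.bits

    def count(self):
        return bin(self.bits).count("1")
```

The trade-off: the HyperLogLog sketch stays a fixed 4,096 small registers at `p=12` (about 1.6% standard error) regardless of cardinality, while the bitmap is exact but grows with the largest encoded id.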
Support More BI & Visualization Tools
§ Supports Tableau 9.1
§ Supports MS Excel
§ Supports MS Power BI
§ Supports Zeppelin
Roadmap
Apache Kylin Roadmap
2016 Focus…
§ Streaming and Real Time
§ Performance, performance and performance
§ Support more BI & visualization tools
§ SQL & OLAP Functions
Q&A
§ More…
  § Website: http://kylin.apache.org
  § Twitter: @ApacheKylin
§ Contact Me:
  § [email protected]
  § @lukehq