Building Multi-Petabyte Data Warehouses with ClickHouse
Alexander Zaitsev
LifeStreet, Altinity
Percona Live Dublin, 2017
Altinity
Who am I
Graduated Moscow State University in 1999
Software engineer since 1997
Developed distributed systems since 2002
Focused on high performance analytics since 2007
Director of Engineering in LifeStreet
Co-founder of Altinity
AdTech company (ad exchange, ad server, RTB, DMP etc.) since 2006
10,000,000,000+ events/day
2K/event
3 months retention (90-120 days)
10B * 2K * [90-120] = [1.8-2.4] PB
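The sizing arithmetic on this slide can be reproduced with a short sketch. It assumes "2K" means 2,000 bytes per event and that PB is a decimal petabyte (10^15 bytes); neither is stated on the slide.

```python
# Back-of-the-envelope sizing using the figures from the slide.
EVENTS_PER_DAY = 10_000_000_000   # 10B events/day
BYTES_PER_EVENT = 2_000           # "2K" per event, taken as 2,000 bytes (assumption)
PB = 10**15                       # decimal petabyte (assumption)

def raw_storage_pb(retention_days: int) -> float:
    """Uncompressed bytes kept for the given retention, expressed in PB."""
    return EVENTS_PER_DAY * BYTES_PER_EVENT * retention_days / PB

low, high = raw_storage_pb(90), raw_storage_pb(120)
print(f"{low:.1f} PB to {high:.1f} PB")
```

The same inputs give 20 TB of raw data per day, which matches the daily load figure quoted later in the migration timeline.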
Tried/used/evaluated:
MySQL (TokuDB, ShardQuery)
InfiniDB
MonetDB
InfoBright EE
Paraccel (now RedShift)
Oracle
Greenplum
Snowflake DB
Vertica
ClickHouse
Flashback: ClickHouse at 08/2016
1-2 months in Open Source
Internal Yandex product, no other installations
No support, roadmap, communicated plans
3 official devs
A number of visible limitations (and many invisible)
Stories of other doomed open-sourced DBs
Develop production system with that?
ClickHouse is/was missing:
Transactions
Constraints
Consistency
UPDATE/DELETE
NULLs (added few months ago)
Milliseconds
Implicit type conversions
Standard SQL support
Partitioning by any column (date only)
Enterprise operation tools
SQL developers reaction:
But we tried and succeeded
Before you go:
Confirm your use case
Check benchmarks
Run your own
Consider limitations, not features
Make a POC
Migration problem: basic things do not fit
Main Challenges
Efficient schema
  Use what ClickHouse does best
  Work around limitations
Reliable data ingestion
Sharding and replication
Client interfaces
LifeStreet Use Case
Publisher/Advertiser performance
Campaign/Creative performance prediction
Realtime algorithmic bidding
DMP
LifeStreet Requirements
Load 10B rows/day, 500 dimensions/row
Ad-hoc reports on 3 months of data
Low data and query latency
High Availability
Multi-Dimensional Analysis
N-dimensional cube
M-dimensional projection
Slice
OLAP query: aggregation + filter + group by
Range filter
Query result
Disclaimer: averages lie
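As a rough illustration of "aggregation + filter + group by", the sketch below applies a range filter on one dimension, groups by another, and sums a metric. The rows, column names, and values are made up for the example and do not come from the talk.

```python
from collections import defaultdict

# Hypothetical fact rows: (day, country, device, imps). All values are made up.
rows = [
    ("2017-09-01", "US", "phone", 10),
    ("2017-09-01", "US", "tablet", 5),
    ("2017-09-02", "FR", "phone", 7),
    ("2017-09-03", "US", "phone", 3),
]

def olap_query(rows, day_from, day_to, group_by):
    """Range filter on day, then group by one dimension and sum the metric."""
    idx = {"day": 0, "country": 1, "device": 2}[group_by]
    totals = defaultdict(int)
    for row in rows:
        if day_from <= row[0] <= day_to:   # range filter (ISO dates sort as strings)
            totals[row[idx]] += row[3]     # aggregation per group
    return dict(totals)

result = olap_query(rows, "2017-09-01", "2017-09-02", "country")
print(result)
```

Grouping by "country" projects the cube onto a single dimension; the day range is the slice.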
Typical schema: star
Facts Dimensions Metrics Projections
Star Schema Approach
De-normalized: dimensions in a fact table    |  Normalized: dimension keys in a fact table, separate dimension tables
Single table, simple                         |  Multiple tables
Simple queries, no joins                     |  More complex queries with joins
Data cannot be changed                       |  Data in dimension tables can be changed
Less efficient storage                       |  Efficient storage
Less efficient queries                       |  More efficient queries
Normalized schema: traditional approach - joins
Limited support in ClickHouse (1 level, cascade sub-selects for multiple)
Dimension tables are not updatable
Dictionaries - ClickHouse dimensions approach
Lookup service: key -> value
Supports different external sources (files, databases etc.)
Refreshable
Dictionaries. Example
SELECT country_name, sum(imps) FROM T ANY INNER JOIN dim_geo USING (geo_key) GROUP BY country_name;
vs
SELECT dictGetString('dim_geo', 'country_name', geo_key) country_name, sum(imps) FROM T GROUP BY country_name;
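The difference between the two queries can be sketched outside the database. The data below is hypothetical, and the nested loop only stands in for the idea of a join step; it is not how ClickHouse executes joins.

```python
from collections import Counter

# Hypothetical data: fact rows (geo_key, imps) and a dim_geo key -> country_name map.
facts = [(1, 100), (2, 40), (1, 60), (3, 5)]
dim_geo = {1: "United States", 2: "France", 3: "Mexico"}

def by_join(facts, dim_rows):
    """Join-style: pair each fact with a matching dimension row, then aggregate."""
    out = Counter()
    for geo_key, imps in facts:
        for dim_key, country_name in dim_rows:   # ANY semantics: first match wins
            if dim_key == geo_key:
                out[country_name] += imps
                break
    return dict(out)

def by_dictionary(facts, dictionary):
    """Dictionary-style: a plain key -> value lookup per row, no join step."""
    out = Counter()
    for geo_key, imps in facts:
        out[dictionary[geo_key]] += imps
    return dict(out)

print(by_join(facts, list(dim_geo.items())))
print(by_dictionary(facts, dim_geo))
```

Both functions return the same totals; the dictionary version simply treats the dimension as a lookup function applied to each row.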
Dictionaries. Configuration
...
...
...
...
...
Dictionaries. Sources
file
mysql table
clickhouse table
odbc data source
executable script
http service
Dictionaries. Layouts
flat
hashed
cache
complex_key_hashed
range_hashed
Dictionaries. range_hashed
Effective dated queries
id
start_date
end_date
dictGetFloat32('srv_ad_serving_costs', 'ad_imps_cpm', toUInt64(0), event_day)
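An effective-dated lookup of this kind can be sketched as follows. The cost values are hypothetical, and the sketch assumes both range endpoints are inclusive and that a miss returns a default of 0.0.

```python
from datetime import date

# Hypothetical effective-dated rows: id -> list of (start_date, end_date, value).
costs = {
    0: [
        (date(2017, 1, 1), date(2017, 6, 30), 0.10),
        (date(2017, 7, 1), date(2017, 12, 31), 0.12),
    ],
}

def range_lookup(table, key, day, default=0.0):
    """Return the value whose [start_date, end_date] range contains `day`."""
    for start, end, value in table.get(key, []):
        if start <= day <= end:   # inclusive on both ends (assumption)
            return value
    return default

print(range_lookup(costs, 0, date(2017, 3, 15)))
```

Here the call plays the role of the dictGetFloat32 expression above, with the key and event_day as the two lookup arguments.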
Dictionaries. Update values
By timer (default)
Automatic for MySQL MyISAM
Using invalidate_query
Manually touching config file
Warning: N dicts * M nodes = N*M DB connections
Dictionaries. Restrictions
Normal keys are only UInt64
No on demand update (added in Sep 2017, 1.1.54289)
Every cluster node has its own copy
XML config (DDL would be better)
Dictionaries vs. Tables
+ No JOINs
+ Updatable
+ Always in memory for flat/hash (faster)
- Not a part of the schema
- Somewhat inconvenient syntax
Tables
Engines
Sharding/Distribution
Replication
Engine = ?
In memory:
  Memory
  Buffer
  Join
  Set
On disk:
  Log, TinyLog
  MergeTree family
Interface:
  Distributed
  Merge
  Dictionary
Special purpose:
  View
  Materialized View
  Null
MergeTree
What is merge
PK sorting and index
Date partitioning
Query performance
Block 1, Block 2
Merged block
PK index
See details at: https://medium.com/@f1yegor/clickhouse-primary-keys-2cf2a45d7324
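The merge and the primary key index can be illustrated with a toy model. It assumes each block is already sorted by a single integer key and that the index keeps one key per fixed-size group of rows; the block contents and the granularity of 3 are made up.

```python
import heapq
from bisect import bisect_right

def merge_blocks(*blocks):
    """Merge blocks that are each sorted by primary key into one sorted block."""
    return list(heapq.merge(*blocks))

def build_sparse_index(block, granularity):
    """Keep the key of every `granularity`-th row (one mark per group of rows)."""
    return [block[i] for i in range(0, len(block), granularity)]

def granule_for(index, key):
    """Last group whose first key is <= key, or None if key is below all marks."""
    pos = bisect_right(index, key) - 1
    return pos if pos >= 0 else None

block1 = [1, 4, 9, 12]
block2 = [2, 3, 10, 15]
merged = merge_blocks(block1, block2)
index = build_sparse_index(merged, granularity=3)
print(merged, index, granule_for(index, 10))
```

Because the index is sparse, a lookup narrows the search to one group of rows, which is then scanned; it does not point at an individual row.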
MergeTree family
[Replicated] + [Replacing | Collapsing | Summing | Aggregating | Graphite] + MergeTree
Data Load
Multiple formats are supported, including CSV, TSV, JSONs, native binary
Error handling
Simple transformations
Load locally (better) or distributed (possible)
Temp tables help
Replicated tables help with de-dup
The power of Materialized Views
MV is a table, i.e. engine, replication etc.
Updated synchronously
Summing/AggregatingMergeTree: consistent aggregation
Alters are problematic
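A toy model of a fact table with a summing materialized view might look like this. The class and column names are hypothetical. One simplification to note: the sketch sums eagerly on insert, whereas a SummingMergeTree collapses rows during background merges, so queries against it normally still use sum() with GROUP BY.

```python
from collections import defaultdict

class FactTableWithMV:
    """Toy fact table with one materialized view summing `imps` per (day, country)."""

    def __init__(self):
        self.rows = []               # raw fact rows
        self.mv = defaultdict(int)   # stands in for a SummingMergeTree target table

    def insert(self, batch):
        # The view is updated as part of the same insert (synchronously).
        for day, country, imps in batch:
            self.rows.append((day, country, imps))
            self.mv[(day, country)] += imps

table = FactTableWithMV()
table.insert([("2017-09-01", "US", 10), ("2017-09-01", "US", 5)])
table.insert([("2017-09-01", "FR", 7)])
print(dict(table.mv))
```

The aggregate stays consistent with the fact rows because both are written by the same insert.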
Data Load Diagram
[Diagram of a ClickHouse node. Components: log files, realtime producers, temp tables (local), Buffer tables (local), fact tables (shard), two SummingMergeTree tables (shard), and dictionaries sourced from MySQL. Arrows are labeled INSERT, Buffer flush, and MV.]
Updates and deletes
Dictionaries are refreshable
Replacing and Collapsing merge trees
  Eventual updates
  SELECT ... FINAL
Partitions
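The "eventual update" behavior of a replacing merge tree can be sketched with a toy model. It assumes each row carries a key and a version, and that an update is written as a new row with a higher version; the data is made up.

```python
def select_plain(parts):
    """Without FINAL, rows from parts that are not yet merged are all visible."""
    return [row for part in parts for row in part]

def select_final(parts):
    """One row per key, keeping the highest version (what FINAL resolves to)."""
    latest = {}
    for part in parts:
        for key, version, value in part:
            if key not in latest or version >= latest[key][0]:
                latest[key] = (version, value)
    return {key: value for key, (version, value) in latest.items()}

parts = [
    [("ad1", 1, "draft"), ("ad2", 1, "live")],   # first insert
    [("ad1", 2, "live")],                        # later "update" written as a new row
]
print(select_plain(parts))
print(select_final(parts))
```

Until a background merge runs, a plain read can return both versions of "ad1"; resolving them at query time costs extra work.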
Sharding and Replication
Sharding and Distribution => Performance
  Fact tables and MVs distributed over multiple shards
  Dimension tables and dicts replicated at every node (local joins and filters)
Replication => Reliability
  2-3 replicas per shard
  Cross DC
Distributed Query
SELECT foo FROM distributed_table GROUP BY col1
  Server 1, 2 or 3
SELECT foo FROM local_table GROUP BY col1
  Server 1
SELECT foo FROM local_table GROUP BY col1
  Server 2
SELECT foo FROM local_table GROUP BY col1
  Server 3
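The scatter-gather pattern in this diagram can be sketched as below. The sketch assumes `foo` is a count per `col1` value and models each shard as a plain list of `col1` values; both are illustrative choices, not details from the talk.

```python
from collections import Counter

def local_group_by(local_rows):
    """What each shard runs: count rows per col1 value in its local table."""
    return Counter(local_rows)

def distributed_group_by(shards):
    """Initiator: run the local query on every shard, then merge the partials."""
    total = Counter()
    for local_rows in shards:
        total.update(local_group_by(local_rows))   # adds counts key by key
    return dict(total)

# Three hypothetical shards holding col1 values.
shards = [["a", "b", "a"], ["b", "c"], ["a"]]
print(distributed_group_by(shards))
```

Any of the servers can act as the initiator, since the merge step only needs the partial results from each shard.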
Replication
Per table topology configuration:
  Dimension tables replicate to any node
  Fact tables replicate to mirror replica
ZooKeeper to communicate the state
  State: what blocks/parts to replicate
Asynchronous => faster and reliable enough
Synchronous => slower
Isolate query to replica
Replication queues
SQL
Supports basic SQL syntax
Non-standard JOINs implementation:
  1 level only
  ANY vs ALL
  only USING
Aliasing everywhere
Array and nested data types, lambda-expressions, ARRAY JOIN
GLOBAL IN, GLOBAL JOIN
Approximate queries
Some analytical functions
Hardware and Deployment
Load is CPU intensive => more cores
Query is disk intensive => faster disks
  10-12 SATA RAID10
  SAS/SSD => x2 performance for x2 price for x0.5 capacity
10 TB/server seems optimal
ZooKeeper: keep in one DC for fast quorum
Remote DCs work badly (e.g. East and West coast in US)
Main Challenges Revisited
Design efficient schema
  Use what ClickHouse does best
  Work around limitations
Design sharding and replication
Reliable data ingestion
Client interfaces
Migration project timelines
August 2016: POC
October 2016: first test runs
December 2016: production scale data load:
  10-50B events/day, 20TB data/day
  12 x 2 servers with 12 x 4TB RAID10
March 2017: Client API ready, starting migration
  30+ client types, 20 req/s query load
May 2017: extension to 20 x 3 servers
June 2017: migration completed!
  2-2.5PB uncompressed data
Few examples

:) select count(*) from dw.ad8_fact_event where access_day=today()-1;

SELECT count(*)
FROM dw.ad8_fact_event
WHERE access_day = (today() - 1)

count()
7585106796

1 rows in set. Elapsed: 0.503 sec. Processed 12.78 billion rows, 25.57 GB (25.41 billion rows/s., 50.82 GB/s.)

:) select dictGetString('dim_country', 'country_code', toUInt64(country_key)) country_code, count(*) cnt from dw.ad8_fact_event where access_day=today()-1 group by country_code order by cnt desc limit 5;

SELECT
    dictGetString('dim_country', 'country_code', toUInt64(country_key)) AS country_code,
    count(*) AS cnt
FROM dw.ad8_fact_event
WHERE access_day = (today() - 1)
GROUP BY country_code
ORDER BY cnt DESC
LIMIT 5

country_code  cnt
US            2159011287
MX            448561730
FR            433144172
GB            352344184
DE            336479374

5 rows in set. Elapsed: 2.478 sec. Processed 12.78 billion rows, 55.91 GB (5.16 billion rows/s., 22.57 GB/s.)

:) SELECT
    dictGetString('dim_country', 'country_code', toUInt64(country_key)) AS country_code,
    sum(cnt) AS cnt
FROM
(
    SELECT
        country_key,
        count(*) AS cnt
    FROM dw.ad8_fact_event
    WHERE access_day = (today() - 1)
    GROUP BY country_key
    ORDER BY cnt DESC
    LIMIT 5
)
GROUP BY country_code
ORDER BY cnt DESC

country_code  cnt
US            2159011287
MX            448561730
FR            433144172
GB            352344184
DE            336479374

5 rows in set. Elapsed: 1.471 sec. Processed 12.80 billion rows, 55.94 GB (8.70 billion rows/s., 38.02 GB/s.)

:) SELECT
    countDistinct(name) AS num_cols,
    formatReadableSize(sum(data_compressed_bytes) AS c) AS comp,
    formatReadableSize(sum(data_uncompressed_bytes) AS r) AS raw,
    c / r AS comp_ratio
FROM lf.columns
WHERE table = 'ad8_fact_event_shard'

num_cols  comp        raw       comp_ratio
308       325.98 TiB  4.71 PiB  0.06757640834769944

1 rows in set. Elapsed: 0.289 sec. Processed 281.46 thousand rows, 33.92 MB (973.22 thousand rows/s., 117.28 MB/s.)
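As a sanity check on the last result, the compression ratio can be recomputed from the two human-readable sizes. The sketch assumes 1 PiB = 1024 TiB; because the displayed sizes are rounded, the result only approximates the comp_ratio column.

```python
TIB_PER_PIB = 1024

compressed_tib = 325.98        # "comp" column from the query output
raw_tib = 4.71 * TIB_PER_PIB   # "raw" column, converted from PiB to TiB

ratio = compressed_tib / raw_tib
print(f"compressed/raw = {ratio:.4f}, roughly {1 / ratio:.1f}x compression")
```

A ratio near 0.068 means the fact table occupies about one fifteenth of its uncompressed size on disk.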
ClickHouse at fall 2017
1+ year Open Source
100+ prod installs worldwide
Public changelogs, roadmap, and