ClickHouse Summer Meetup8.1 18.18 n/a n/a Elasticsearch (heavily tuned) 14.389 32.148 33.448 67.312...
Transcript of ClickHouse Summer Meetup8.1 18.18 n/a n/a Elasticsearch (heavily tuned) 14.389 32.148 33.448 67.312...
ClickHouseSummerMeetup
Agenda:7:00pm:ClickHouseintroduction-AlexanderZaitsev(Altinity)7:30pm:UsingClickHouseforexperimentationmetricsatSpotify-GlebKanterov(Spotify)8:20pm:DeepdiveintoClickHouseinternals-AlekseyMilovidov(Yandex)
July3,2018.Berlin
ClickHouseAnalyticalDBMSIntroduction
AlexanderZaitsev,Altinity
DeliveryHero,Berlin,3Jul2018
WhatIsClickHouse?
ClickHouseDBMSis• ColumnStore
• MPP
• SQL
• OpenSource
ClickHouseTimeline• DevelopedinYandexin2012-2015
• OpenSourcedJune2016
• Firstnon-YandexdeploymentsQ42016
• HundredsofcompaniesbyQ22018
WhyYetAnotherDBMS?
OpenSourceAnalytical
DBMS
CommercialAnalyticalDBMS
VerticaMemSQLActian
SnowFlakeRedShift
…
InfiniDB(MariaDBcs)
InfoBrightMonetDBGreenPlum
Spark…
OpenSourceAnalytical
DBMS
CommercialAnalyticalDBMS
OpenSourceAnalytical
DBMS
CommercialAnalyticalDBMS
ClickHouse
• Fast!
• Flexible!
• Free!
HowFast?
:)selectcount(*)fromdw.TSELECTcount(*)FROMdw.T┌───────count()─┐│1185063669477│└───────────────┘1rowsinset.Elapsed:4.361sec.Processed1.19trillionrows,1.19TB(271.73billionrows/s.,271.73GB/s.)
Query1 Query2 Query3 Query4 Setup0.034 0.061 0.178 0.498 MapD&2-nodep2.8xlargecluster0.051 0.146 0.047 0.794 kdb+/q&4IntelXeonPhi7210CPUs0.762 2.472 4.131 6.041 BrytlytDB1.0&2-nodep2.16xlargecluster1.034 3.058 5.354 12.748 ClickHouse,IntelCorei54670K1.56 1.25 2.25 2.97 Redshift,6-nodeds2.8xlargecluster2 2 1 3 BigQuery6.41 6.19 6.09 6.63 AmazonAthena8.1 18.18 n/a n/a Elasticsearch(heavilytuned)14.389 32.148 33.448 67.312 Vertica,IntelCorei54670K22 25 27 65 Spark2.3.0&singlei3.8xlargew/HDFS35 39 64 81 Presto,5-nodem3.xlargeclusterw/HDFS152 175 235 368 PostgreSQL9.5&cstore_fdw
“1.1BillionTaxiRidesBenchmarks”http://tech.marksblogg.com/benchmarks.html
Query1 Query2 Query3 Query4 Setup0.034 0.061 0.178 0.498 MapD&2-nodep2.8xlargecluster0.051 0.146 0.047 0.794 kdb+/q&4IntelXeonPhi7210CPUs-2.4153.5994.962ClickHouseatKodiakDataserver0.762 2.472 4.131 6.041 BrytlytDB1.0&2-nodep2.16xlargecluster1.034 3.058 5.354 12.748 ClickHouse,IntelCorei54670K1.56 1.25 2.25 2.97 Redshift,6-nodeds2.8xlargecluster2 2 1 3 BigQuery6.41 6.19 6.09 6.63 AmazonAthena8.1 18.18 n/a n/a Elasticsearch(heavilytuned)14.389 32.148 33.448 67.312 Vertica,IntelCorei54670K22 25 27 65 Spark2.3.0&singlei3.8xlargew/HDFS35 39 64 81 Presto,5-nodem3.xlargeclusterw/HDFS152 175 235 368 PostgreSQL9.5&cstore_fdw
“1.1BillionTaxiRidesBenchmarks”http://tech.marksblogg.com/benchmarks.html
ClickHouserunsat• Baremetal(anyLinux)
• Amazon
• Azure
• Kubernets,VMWareetc.
• KodiakDatacloud
0,01,02,03,04,05,06,07,08,09,010,0
ClickHouse Vertica
SingleTableSELECTGROUPBY
JOINtUSINGt_keyWHERE<t.smth>?
JOIN(SELECT*FROMtWHERE<smth>)USINGt_key
• 19queries,1200Mrowstable,3-nodeclusters
RealcompaniesareusingClickHousefor:• MobileAppandWebanalytics
• AdTechbiddinganalytics
• OperationalLogsanalytics
• DNSqueriesanalysis
• Stockcorrelationanalytics
• Telecom
• Securityaudit
• FintechSaaS
• Manufactoringprocesscontrol
• BlockChaintransactionsanalysis
Worldwide
*www.altinity.comvisitsin2018
Sizedoesnotmatter
• Yandex:500+servers,25Brec/day• LifeStreet:60servers,75Brec/day• CloudFlare:36servers,200Brec/day• Bloomberg:102servers,1000Brec/day
HappyMigrations!• FromMySQL/InfoBright/
PostreSQL/SparktoClickHouse
• FromVertica/RedShiftto
ClickHouse
SPEED!
COST!VENDORUN-LOCKING!
03.07 19:00 CLICKHOUSE GATE 2 boarding 03.07 19:30 CLICKHOUSE GATE 3 03.07 20:00 CLICKHOUSE GATE 4
FewCaseStudies
• AdTech(adexchange,adserver,RTB,DMPetc.)
• AdOptimization,programmaticbidding
• Alotofdata:– 10,000,000,000+events/day
• Alotofqueries:usersandalgorithms
UsedVertica,butneededtomove
• Datasizesconstantlygrow• EstimatedPBs• Verticalicensewouldbetooexpensive
… migrationwasnoteasy
*MoredetailsatOctober2017BerlinMeetup
MajorDesignDecisions• Dictionariesforstar-schemadesign
• ExtensiveuseofArrays
• SummingMergeTreeforrealtimeaggregation
• Smartquerygeneration
• Multipleshardsandreplicas
Results• Successfulmigration,1y+inproduction
• Betterperformanceandflexibility
– 75Brows/day
– 1Мrows/secinpeakhours
– 1.3MSQLqueries/day
• 30%hardwarecostreduction(lessexpensivestorage):
• Nolicensecostandlimits:
– 3PBofrawdata
– 6,000billionrows Poweredby:
Case2.FintechCompany
• StockSymbolsCorrelationAnalysis
• 5000Symbols
• 10yearsofdata
100Bdatapoints
Challenge• (time,symbol,price)–100billion• log_return=runningDifference(log(price))– 100billiontimes
• corr(s1,s2)=corr(log_return(s1),log_return(s2))
Foreverypair(s1,s2)from5000s(i),12.5Mpairsoverall
• Groupbyhours
Calculate12,500,000timesForeveryhour!
Veryslow
Tried…• Hadoop
• Spark
• GreenplumClickHouse
S1
S3
S2
timesymbolprice
timesymbollogReturn(price)
timegroupArray(symbol)groupArray(logRet..)
S1
S3
S2S2
R
R
R
S1
S3
2500tasks
date+hourcorr(S(i),S(j))
POCPerformanceResults• 3serverssetup
• 2years,5000symbols:
– log_returncalculations:~1h(distributed)– Convertingtoarrays:~1h(almostdistributed)
– Correlations:~50hours(alsodistributed)• 12,5M/50h=70/sec
Distributed=>itscaleseasily!
Case3.Ivinco
• Matureboardreadersystem
• Alotofdatacollectedfromdifferentsources
• Alotofoperationaldata(performancemonitoring)
200TBinMySQL!
Operationalproblems• Hardtoscale• HardtomakeHAsolution
• Performanceissues:
– ‘Manual’partitioningandsharding
– Dozensofindexespertableetc.
Organizationalproblems
• Nodevelopmentresourcestorewrite
• Minimalchangestocurrentsystemare
allowed
BinarylogreplicationfromMySQLtoClickHouse
MySQL
clickhouse-mysql
Queries
SourceData
Results
• SeamlessintegrationofClickHouseintothecurrentsystem
• Nodevelopers/codinginvolved,projectisdonewithDevOps
• Easytotestperformancesidebyside(ClickHouseis100timesfaster)
• Nowreadytore-writemainsystem
Moredetailsat:https://www.altinity.com/blog/2018/6/30/realtime-mysql-clickhouse-replication-in-practice
ClickHouseToday• MatureAnalyticDBMS.Provenbymanycompanies
• 2+yearsinOpenSource
• Transparentdevelopmentroadmap
• Manycommunitycontributors
• Emergingeco-system(tools,drivers,integrations)
• SupportandConsultingfromAltinity