ClickHouse Summer Meetup

ClickHouse Summer Meetup Agenda: 7:00pm: ClickHouse introduction - Alexander Zaitsev (Altinity) 7:30pm: Using ClickHouse for experimentation metrics at Spotify - Gleb Kanterov (Spotify) 8:20pm: Deep dive into ClickHouse internals - Aleksey Milovidov (Yandex) July 3, 2018. Berlin

Transcript of ClickHouse Summer Meetup

Page 1:

ClickHouse Summer Meetup

Agenda:
7:00pm: ClickHouse introduction - Alexander Zaitsev (Altinity)
7:30pm: Using ClickHouse for experimentation metrics at Spotify - Gleb Kanterov (Spotify)
8:20pm: Deep dive into ClickHouse internals - Aleksey Milovidov (Yandex)

July 3, 2018. Berlin

Page 2:

ClickHouse Analytical DBMS Introduction

Alexander Zaitsev, Altinity

Delivery Hero, Berlin, 3 Jul 2018

Page 3:

What Is ClickHouse?

Page 4:

ClickHouse DBMS is
•  Column Store
•  MPP
•  SQL
•  Open Source

Page 5:

ClickHouse Timeline
•  Developed in Yandex in 2012-2015
•  Open Sourced June 2016
•  First non-Yandex deployments Q4 2016
•  Hundreds of companies by Q2 2018

Page 6:

Why Yet Another DBMS?

Page 7:

Open Source Analytical DBMS

Commercial Analytical DBMS

Vertica, MemSQL, Actian
Snowflake, Redshift
InfiniDB (MariaDB cs)
InfoBright, MonetDB, Greenplum
Spark…

Page 8:

Open Source Analytical DBMS

Commercial Analytical DBMS

Page 9:

Open Source Analytical DBMS

Commercial Analytical DBMS

Page 10:

ClickHouse
•  Fast!
•  Flexible!
•  Free!

Page 11:

How Fast?

Page 12:

:) select count(*) from dw.T

SELECT count(*)
FROM dw.T

┌───────count()─┐
│ 1185063669477 │
└───────────────┘

1 rows in set. Elapsed: 4.361 sec. Processed 1.19 trillion rows, 1.19 TB (271.73 billion rows/s., 271.73 GB/s.)
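The throughput figure in the client output above can be checked by hand. The sketch below simply divides the reported row count by the reported elapsed time; the variable names are illustrative, and the 1.19 TB figure is taken as 1.19e12 bytes, which is an assumption about how the client rounds.

```python
# Figures copied from the client output on the slide.
rows = 1_185_063_669_477      # result of count()
elapsed_sec = 4.361           # "Elapsed: 4.361 sec."
processed_bytes = 1.19e12     # "1.19 TB", assumed to mean 1.19e12 bytes

# Scan rate in rows per second.
rows_per_sec = rows / elapsed_sec

# Average bytes read per row for this query.
bytes_per_row = processed_bytes / rows

print(f"{rows_per_sec / 1e9:.2f} billion rows/s")
print(f"{bytes_per_row:.2f} bytes per row")
```

The result lands within rounding distance of the 271.73 billion rows/s that the client printed, and the byte count works out to roughly one byte per row.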

Page 13:

Query 1   Query 2   Query 3   Query 4   Setup
0.034     0.061     0.178     0.498     MapD & 2-node p2.8xlarge cluster
0.051     0.146     0.047     0.794     kdb+/q & 4 Intel Xeon Phi 7210 CPUs
0.762     2.472     4.131     6.041     BrytlytDB 1.0 & 2-node p2.16xlarge cluster
1.034     3.058     5.354     12.748    ClickHouse, Intel Core i5 4670K
1.56      1.25      2.25      2.97      Redshift, 6-node ds2.8xlarge cluster
2         2         1         3         BigQuery
6.41      6.19      6.09      6.63      Amazon Athena
8.1       18.18     n/a       n/a       Elasticsearch (heavily tuned)
14.389    32.148    33.448    67.312    Vertica, Intel Core i5 4670K
22        25        27        65        Spark 2.3.0 & single i3.8xlarge w/ HDFS
35        39        64        81        Presto, 5-node m3.xlarge cluster w/ HDFS
152       175       235       368       PostgreSQL 9.5 & cstore_fdw

“1.1 Billion Taxi Rides Benchmarks” http://tech.marksblogg.com/benchmarks.html

Page 14:

Query 1   Query 2   Query 3   Query 4   Setup
0.034     0.061     0.178     0.498     MapD & 2-node p2.8xlarge cluster
0.051     0.146     0.047     0.794     kdb+/q & 4 Intel Xeon Phi 7210 CPUs
-         2.415     3.599     4.962     ClickHouse at Kodiak Data server
0.762     2.472     4.131     6.041     BrytlytDB 1.0 & 2-node p2.16xlarge cluster
1.034     3.058     5.354     12.748    ClickHouse, Intel Core i5 4670K
1.56      1.25      2.25      2.97      Redshift, 6-node ds2.8xlarge cluster
2         2         1         3         BigQuery
6.41      6.19      6.09      6.63      Amazon Athena
8.1       18.18     n/a       n/a       Elasticsearch (heavily tuned)
14.389    32.148    33.448    67.312    Vertica, Intel Core i5 4670K
22        25        27        65        Spark 2.3.0 & single i3.8xlarge w/ HDFS
35        39        64        81        Presto, 5-node m3.xlarge cluster w/ HDFS
152       175       235       368       PostgreSQL 9.5 & cstore_fdw

“1.1 Billion Taxi Rides Benchmarks” http://tech.marksblogg.com/benchmarks.html

Page 15:

ClickHouse runs on
•  Bare metal (any Linux)
•  Amazon
•  Azure
•  Kubernetes, VMware etc.
•  Kodiak Data cloud

Page 16:

Page 17:

[Chart: ClickHouse vs Vertica, axis from 0.0 to 10.0, by query type]

Single Table SELECT GROUP BY

JOIN t USING t_key WHERE <t.smth>?

JOIN (SELECT * FROM t WHERE <smth>) USING t_key

•  19 queries, 1200M rows table, 3-node clusters

Page 18:

Real companies are using ClickHouse for:
•  Mobile App and Web analytics
•  AdTech bidding analytics
•  Operational Logs analytics
•  DNS queries analysis
•  Stock correlation analytics
•  Telecom
•  Security audit
•  Fintech SaaS
•  Manufacturing process control
•  BlockChain transactions analysis

Page 19:

Worldwide

* www.altinity.com visits in 2018

Page 20:

Size does not matter

•  Yandex: 500+ servers, 25B rec/day
•  LifeStreet: 60 servers, 75B rec/day
•  CloudFlare: 36 servers, 200B rec/day
•  Bloomberg: 102 servers, 1000B rec/day
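To make the spread behind "Size does not matter" concrete, the sketch below divides the daily record counts by the server counts quoted on the slide. It is back-of-the-envelope arithmetic only: Yandex is listed as "500+" servers, so 500 is used as a lower bound and the per-server figure for Yandex is therefore an upper bound. The names `deployments` and `records_per_server` are illustrative.

```python
# (servers, records per day) as quoted on the slide.
# Yandex is "500+" servers, so 500 is a lower bound (assumption for this sketch).
deployments = {
    "Yandex": (500, 25e9),
    "LifeStreet": (60, 75e9),
    "CloudFlare": (36, 200e9),
    "Bloomberg": (102, 1000e9),
}


def records_per_server(servers, records_per_day):
    """Average number of records one server ingests per day."""
    return records_per_day / servers


per_server = {
    name: records_per_server(servers, records)
    for name, (servers, records) in deployments.items()
}

for name, value in per_server.items():
    print(f"{name}: {value / 1e9:.2f}B rec/day per server")
```

The per-server load differs by more than two orders of magnitude across these deployments, so cluster size alone says little about data volume.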

Page 21:

Happy Migrations!
•  From MySQL/InfoBright/PostgreSQL/Spark to ClickHouse
•  From Vertica/Redshift to ClickHouse

SPEED!

COST! VENDOR UN-LOCKING!

03.07 19:00 CLICKHOUSE GATE 2 boarding
03.07 19:30 CLICKHOUSE GATE 3
03.07 20:00 CLICKHOUSE GATE 4

Page 22:

A Few Case Studies

Page 23:

•  AdTech (ad exchange, ad server, RTB, DMP etc.)
•  Ad Optimization, programmatic bidding
•  A lot of data:
   –  10,000,000,000+ events/day
•  A lot of queries: users and algorithms

Page 24:

Used Vertica, but needed to move

•  Data sizes constantly grow
•  Estimated PBs
•  Vertica license would be too expensive

Page 25:

… migration was not easy

* More details at October 2017 Berlin Meetup

Page 26:

Major Design Decisions
•  Dictionaries for star-schema design
•  Extensive use of Arrays
•  SummingMergeTree for realtime aggregation
•  Smart query generation
•  Multiple shards and replicas

Page 27:

Results
•  Successful migration, 1y+ in production
•  Better performance and flexibility
   –  75B rows/day
   –  1M rows/sec in peak hours
   –  1.3M SQL queries/day
•  30% hardware cost reduction (less expensive storage)
•  No license cost and limits:
   –  3PB of raw data
   –  6,000 billion rows

Powered by:

Page 28:

Case 2. Fintech Company

•  Stock Symbols Correlation Analysis
•  5000 Symbols
•  10 years of data

100B data points

Page 29:

Page 30:

Challenge
•  (time, symbol, price) – 100 billion
•  log_return = runningDifference(log(price)) – 100 billion times
•  corr(s1, s2) = corr(log_return(s1), log_return(s2))
   For every pair (s1, s2) from 5000 s(i), 12.5M pairs overall
•  Group by hours
   Calculate 12,500,000 times for every hour!
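The shape of this computation can be sketched in plain Python. This is not the ClickHouse implementation from the talk: `log_returns`, `pearson` and `pair_count` are hypothetical helpers, the prices are made-up sample data for one hour, and `log_returns` drops the first row instead of emitting 0 for it the way ClickHouse's runningDifference does.

```python
import math
from itertools import combinations


def log_returns(prices):
    """Differences of consecutive log prices (first row dropped)."""
    logs = [math.log(p) for p in prices]
    return [b - a for a, b in zip(logs, logs[1:])]


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length series."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def pair_count(n):
    """Number of unordered pairs among n symbols."""
    return n * (n - 1) // 2


# Hypothetical prices for three symbols within a single hour.
prices = {
    "S1": [100.0, 101.0, 102.5, 101.5],
    "S2": [50.0, 50.5, 51.25, 50.75],   # exactly half of S1
    "S3": [20.0, 19.8, 19.5, 19.9],     # moves against S1
}

returns = {symbol: log_returns(series) for symbol, series in prices.items()}

# One correlation per unordered pair of symbols, for this one hour.
correlations = {
    (a, b): pearson(returns[a], returns[b])
    for a, b in combinations(sorted(returns), 2)
}

print(pair_count(5000))  # 12497500, the "12.5M pairs" on the slide
```

With 5000 symbols the pair loop runs about 12.5 million times, and it has to be repeated for every hour bucket, which is where the cost comes from.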

Page 31:

Very slow

Page 32:

Tried…
•  Hadoop
•  Spark
•  Greenplum

ClickHouse

Page 33:

[Diagram: data flow across nodes labeled S1, S2, S3 and R]

time, symbol, price

time, symbol, logReturn(price)

time, groupArray(symbol), groupArray(logRet..)

2500 tasks

date+hour, corr(S(i), S(j))

Page 34:

POC Performance Results
•  3 servers setup
•  2 years, 5000 symbols:
   –  log_return calculations: ~1h (distributed)
   –  Converting to arrays: ~1h (almost distributed)
   –  Correlations: ~50 hours (also distributed)
•  12.5M / 50h ≈ 70/sec

Distributed => it scales easily!
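The rate on the slide is simple division, and the scaling claim can be turned into a rough estimate. The sketch below assumes perfectly linear scaling with server count, which the slide implies but does not measure; `projected_hours` is a hypothetical helper.

```python
PAIRS = 12_500_000   # correlations to compute, from the slide
HOURS = 50           # wall time of the correlation step on the POC
SERVERS = 3          # POC cluster size

# Correlations completed per second across the whole cluster.
rate_per_sec = PAIRS / (HOURS * 3600)


def projected_hours(servers):
    """Wall-time estimate under the ASSUMPTION of perfectly linear scaling."""
    return HOURS * SERVERS / servers


print(f"{rate_per_sec:.1f} correlations/sec")
print(f"{projected_hours(6):.0f} hours on 6 servers (linear-scaling assumption)")
```

The measured rate is about 69 correlations per second, which the slide rounds to 70. Real clusters rarely scale perfectly, so the projection is an optimistic lower bound on wall time.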

Page 35:

Case 3. Ivinco

•  Mature boardreader system
•  A lot of data collected from different sources
•  A lot of operational data (performance monitoring)

200TB in MySQL!

Page 36:

Operational problems
•  Hard to scale
•  Hard to make HA solution
•  Performance issues:
   –  ‘Manual’ partitioning and sharding
   –  Dozens of indexes per table etc.

Page 37:

Organizational problems

•  No development resources to rewrite
•  Minimal changes to current system are allowed

Page 38:

Binary log replication from MySQL to ClickHouse

[Diagram labels:]
MySQL
clickhouse-mysql
Queries
Source Data

Page 39:

Results

•  Seamless integration of ClickHouse into the current system
•  No developers/coding involved, project is done with DevOps
•  Easy to test performance side by side (ClickHouse is 100 times faster)
•  Now ready to rewrite the main system

More details at: https://www.altinity.com/blog/2018/6/30/realtime-mysql-clickhouse-replication-in-practice

Page 40:

ClickHouse Today
•  Mature Analytic DBMS. Proven by many companies
•  2+ years in Open Source
•  Transparent development roadmap
•  Many community contributors
•  Emerging ecosystem (tools, drivers, integrations)
•  Support and Consulting from Altinity

Page 41:

Q&A

Contact me:

[email protected]@altinity.com
skype: alex.zaitsev
telegram: @alexanderzaitsev

Altinity