Post on 24-May-2020
Attackers
Visitors
Crawlers & bots
Your website
Cloudflare Protected
- 100+ data centers globally
- 2.5B monthly unique visitors
- >10% of Internet requests every day
- <=3M DNS queries/second
- 6M+ websites, apps & APIs in 150 countries
- 5M+ HTTP requests/second
Anatomy of a DNS query

$ dig www.cloudflare.com

; <<>> DiG 9.8.3-P1 <<>> www.cloudflare.com
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 36582
;; flags: qr rd ra; QUERY: 1, ANSWER: 2, AUTHORITY: 0, ADDITIONAL: 0

;; QUESTION SECTION:
;www.cloudflare.com. IN A

;; ANSWER SECTION:
www.cloudflare.com. 5 IN A 198.41.215.162
www.cloudflare.com. 5 IN A 198.41.214.162

;; Query time: 34 msec
;; SERVER: 192.168.1.1#53(192.168.1.1)
;; WHEN: Sat Sep 2 10:48:30 2017
;; MSG SIZE rcvd: 68
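Each line of the answer section in the dig output has five whitespace-separated fields: name, TTL, class, type, and record data. A minimal Python sketch of pulling those fields apart is shown here; the `ResourceRecord` class and `parse_answer_line` function are hypothetical names chosen for illustration and are not part of dig or of the logging pipeline described in these slides.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ResourceRecord:
    """One parsed answer line (hypothetical helper type)."""
    name: str
    ttl: int
    rclass: str
    rtype: str
    rdata: str


def parse_answer_line(line: str) -> ResourceRecord:
    """Split one dig answer line into name, TTL, class, type, and data."""
    # maxsplit=4 keeps record data intact even if it contains spaces.
    name, ttl, rclass, rtype, rdata = line.split(None, 4)
    return ResourceRecord(name, int(ttl), rclass, rtype, rdata.strip())


# The two answer lines from the dig output above.
answers = [
    "www.cloudflare.com. 5 IN A 198.41.215.162",
    "www.cloudflare.com. 5 IN A 198.41.214.162",
]
records = [parse_answer_line(a) for a in answers]
```

Both records carry a TTL of 5 seconds, so a resolver would re-query after a very short interval.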
30+ fields
What did we want?
- Multidimensional query analytics
- Complex ad-hoc queries
- Capable of current and expected future scale
- Gracefully handle late-arriving log data
- Roll-ups/aggregations for long-term storage
- Highly available and replicated architecture

- Queries per second: <=3M
- Edge points of presence: 100+
- Query dimensions: 20+
- Years of stored aggregation: 5+
We tried a few things...
- Kafka + CitusDB + Go
- Kafka + Spark Streaming
- Kafka + Flink
- Kafka + Druid
- Kafka + ClickHouse
ClickHouse
- Tabular, column-oriented data store
- Clustered architecture
- Familiar SQL query interface
- Lots of very useful built-in aggregation functions
- Raw log data stored for 3 months (~7 trillion rows)
- Aggregated data stored for ∞ (1m, 5m, 1h aggregations across 3 dimensions)
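The 1m, 5m, and 1h roll-ups can be illustrated with a small Python sketch. It is not the ClickHouse mechanism used in the talk, only an assumption-laden model of the idea: each event is assigned to a bucket by its own event time, so a row that arrives late still lands in the correct bucket. The names `BUCKETS`, `bucket_start`, and `roll_up` are hypothetical, and a single dimension (query name) stands in for the 3 dimensions mentioned on the slide.

```python
from collections import Counter
from datetime import datetime, timezone

# Roll-up granularities from the slide, in seconds.
BUCKETS = {"1m": 60, "5m": 300, "1h": 3600}


def bucket_start(ts: datetime, width_s: int) -> datetime:
    """Round an aware timestamp down to the start of its bucket (UTC)."""
    epoch = int(ts.timestamp())
    return datetime.fromtimestamp(epoch - epoch % width_s, tz=timezone.utc)


def roll_up(events):
    """Count events per (bucket start, query name) at every granularity.

    events: iterable of (event_time, query_name) in arrival order.
    """
    rollups = {g: Counter() for g in BUCKETS}
    for ts, qname in events:
        for g, width in BUCKETS.items():
            rollups[g][(bucket_start(ts, width), qname)] += 1
    return rollups


utc = timezone.utc
events = [
    (datetime(2017, 8, 1, 21, 0, 5, tzinfo=utc), "www.example.com."),
    (datetime(2017, 8, 1, 21, 4, 59, tzinfo=utc), "www.example.com."),
    # Late arrival: processed last, but belongs to the 21:00 minute.
    (datetime(2017, 8, 1, 21, 0, 30, tzinfo=utc), "www.example.com."),
]
rollups = roll_up(events)
```

Because bucketing depends only on the event timestamp, arrival order does not change the counts.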
[Diagram: attackers, visitors, and crawlers & bots send DNS queries for your website to a Cloudflare DNS server. Query logs flow through a log forwarder into a Kafka topic, then through a Go ClickHouse inserter into the ClickHouse cluster.]
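The slides do not show how the inserter works internally. One common pattern for this stage, sketched here in Python rather than Go and purely as an assumption, is to buffer rows read from the queue and write them in batches, since column stores such as ClickHouse generally favor fewer, larger inserts. `BatchInserter` and its parameters are hypothetical; the `flush` callback stands in for whatever client call performs the real insert.

```python
import time
from typing import Callable, List


class BatchInserter:
    """Buffer rows and hand them to `flush` in batches (illustrative only)."""

    def __init__(self, flush: Callable[[List[dict]], None],
                 max_rows: int = 1000, max_age_s: float = 1.0,
                 clock: Callable[[], float] = time.monotonic):
        self._flush = flush
        self._max_rows = max_rows
        self._max_age_s = max_age_s
        self._clock = clock
        self._rows: List[dict] = []
        self._first_at = 0.0

    def add(self, row: dict) -> None:
        """Buffer one row; flush when the batch is full or too old."""
        if not self._rows:
            self._first_at = self._clock()
        self._rows.append(row)
        too_old = self._clock() - self._first_at >= self._max_age_s
        if len(self._rows) >= self._max_rows or too_old:
            self.flush()

    def flush(self) -> None:
        """Send any buffered rows as one batch."""
        if self._rows:
            batch, self._rows = self._rows, []
            self._flush(batch)


# Usage: collect batches in a list instead of writing to a database.
batches: List[List[dict]] = []
inserter = BatchInserter(batches.append, max_rows=3, max_age_s=3600.0)
for i in range(7):
    inserter.add({"queryName": f"host{i}.example.com."})
inserter.flush()  # drain the remainder on shutdown
```

Note that the age limit is only checked when a row is added; a production inserter would also need a timer so that a quiet topic still gets flushed.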
- October 2016: Began evaluating technologies and architecture
- November 2016: Prototype ClickHouse cluster with 3 nodes, inserting a sample of data
- Finalized schema, deployed a production ClickHouse cluster of 6 nodes
- August 2017: Migrated to a new cluster with multi-tenancy
- Growing interest among other Cloudflare engineering teams, worked on standard tooling
Multi-tenant ClickHouse cluster
- Row insertion/s: 8M+
- RAID-0 spinning disks: 1PB+
- Insertion throughput/s: 4GB+
- Nodes: 33
Example

SELECT
    toStartOfMinute(datetime) AS t,
    count(*) / 60 AS qps
FROM open.dnslogs
WHERE date = '2017-08-01'
    AND toHour(datetime) = 21
    AND ...
GROUP BY t
ORDER BY t

Example

SELECT
    toStartOfMinute(datetime) AS t,
    count(*) / 60 AS qps,
    uniq(srcIPv4) AS ip4,
    uniq(srcIPv6) AS ip6,
    uniq(queryName) AS qn,
    countIf(queryType = 1) AS aCount,
    countIf(queryType = 28) AS aaaaCount
FROM open.dnslogs
WHERE date = '2017-08-01'
    AND ...
GROUP BY t
ORDER BY t
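To make the semantics of the second query concrete, here is a rough Python equivalent over an in-memory list of rows. It is a sketch under assumptions: the row layout is hypothetical, only a subset of the columns is computed, and the distinct counts are exact sets, whereas ClickHouse's `uniq` returns an approximate count. Query types 1 and 28 are the standard DNS codes for A and AAAA records.

```python
from collections import defaultdict
from datetime import datetime


def per_minute_stats(rows):
    """Group rows by minute and compute qps, distinct names, A/AAAA counts."""
    groups = defaultdict(list)
    for r in rows:
        # Counterpart of toStartOfMinute(datetime).
        groups[r["datetime"].replace(second=0, microsecond=0)].append(r)
    out = []
    for t in sorted(groups):  # ORDER BY t
        g = groups[t]
        out.append({
            "t": t,
            "qps": len(g) / 60,  # count(*) / 60
            "qn": len({r["queryName"] for r in g}),  # exact, unlike uniq()
            "aCount": sum(r["queryType"] == 1 for r in g),
            "aaaaCount": sum(r["queryType"] == 28 for r in g),
        })
    return out


# Made-up sample rows for illustration only.
rows = [
    {"datetime": datetime(2017, 8, 1, 21, 0, 1), "queryName": "a.example.com.", "queryType": 1},
    {"datetime": datetime(2017, 8, 1, 21, 0, 2), "queryName": "a.example.com.", "queryType": 28},
    {"datetime": datetime(2017, 8, 1, 21, 0, 59), "queryName": "b.example.com.", "queryType": 1},
    {"datetime": datetime(2017, 8, 1, 21, 1, 0), "queryName": "b.example.com.", "queryType": 1},
]
stats = per_minute_stats(rows)
```

Dividing the per-minute count by 60 yields the average queries per second within that minute, which is what the `qps` column in the query reports.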