DATA WAREHOUSE - QCon San Francisco · for high availability and durability Security is at the core...
DATA WAREHOUSE BUILT FOR THE CLOUD
QCon San Francisco, November 2019
Thierry Cruanes, Co-Founder & CTO
THE DREAM DATA WAREHOUSE (CIRCA 2012)
Extreme simplicity
  No management tasks, offered as a service
  Fast out of the box, with no tuning knobs
Store all your data
  Structured and semi-structured
  Petabyte scale at very low cost
  No data silos
No compromises: a full-fledged data warehouse
  Full support for ACID transactions with read consistency
  ANSI SQL, RBAC
Unlimited and instant scaling
  10x faster for the same price, no over-provisioning
WHY THEN? OUR VIEW OF THE CLOUD…
20x
§ Storage became dirt cheap
§ Flat network offered uniform bandwidth
§ Single core performance stalled
§ Data warehouse and analytic workloads are mostly CPU-bound
Design for abundance, not scarcity, of resources
THREE PILLARS
20x
Multi-Tenant Service
  Self-tuning, self-healing
  Transparent upgrade
  Service architecture designed for availability, durability and security
Multi-Cluster Shared Data Architecture
  Leverage cloud elasticity and pay only for what you use
  Instant scale
  Performance isolation
  Real-time data sharing
Immutable Scalable Storage
  Extremely fast response time at scale
  Fine-grained vertical and horizontal pruning on any column
  Automatically applied to any data (structured and semi-structured)
ARCHITECTURE
AN ARCHITECTURE BUILT FOR THE CLOUD
Traditional architectures
  Shared-disk: shared storage, single cluster
  Shared-nothing: decentralized, local storage, single cluster
Multi-cluster, shared data: centralized, scale-out storage; multiple, independent compute clusters
MULTI-CLUSTER, SHARED DATA ARCHITECTURE
No data silos: storage decoupled from compute
Any data: native for structured & semi-structured
Unlimited scalability: along many dimensions
Low cost: compute on demand
Instant cloning: isolate prod from dev & QA
Highly available: 11 9’s durability, 4 9’s availability
[Diagram: shared databases (and a clone) accessed by separate virtual warehouses for Data Science, ETL & Data Loading, Finance, Dev/Test/QA, Dashboards, and Marketing]
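As a quick sanity check on the "4 9’s availability" figure, here is a back-of-the-envelope sketch of the yearly downtime budget each number of nines allows. It assumes availability is measured over a 365-day year; the function name `downtime_minutes_per_year` is hypothetical.

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # assumption: a 365-day year


def downtime_minutes_per_year(nines: int) -> float:
    """Maximum yearly downtime allowed by an availability of `nines` nines."""
    unavailability = 10 ** -nines  # e.g. 4 nines = 99.99% up, 0.01% down
    return unavailability * MINUTES_PER_YEAR


for n in (3, 4, 5):
    print(f"{n} nines: {downtime_minutes_per_year(n):.1f} min/year")
```

Four nines works out to roughly 52.6 minutes of unavailability per year, which is why upgrades and repairs have to happen without taking the service down.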
VIRTUAL WAREHOUSE
How can concurrent workloads run without impacting each other?
One or more MPP compute clusters
Unit of fault and performance isolation
Use multiple warehouses to segregate workloads
Resizable on the fly
Able to access data in any database
Transparently caches accessed data
Transaction manager synchronizes data access
Automatically suspends when idle and resumes when needed
[Diagram: virtual warehouses A, B, C, and D, each with its own SSD/RAM cache, serving ETL, transformation, SQL, and BI workloads]
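The automatic suspend/resume behavior can be pictured with a minimal sketch. It assumes a fixed idle timeout per warehouse and an explicit clock value passed in by the caller. The class `VirtualWarehouse` and its `submit`/`tick` methods are hypothetical illustrations, not Snowflake's implementation. Each warehouse tracks its own state, which mirrors the slide's point that warehouses are isolated from one another.

```python
class VirtualWarehouse:
    """Hypothetical model: suspends after an idle timeout, resumes on demand."""

    def __init__(self, name: str, idle_timeout_s: float):
        self.name = name
        self.idle_timeout_s = idle_timeout_s
        self.running = False
        self.last_activity = 0.0
        self.resume_count = 0

    def submit(self, query: str, now: float) -> str:
        if not self.running:  # auto-resume when work arrives
            self.running = True
            self.resume_count += 1
        self.last_activity = now
        return f"{self.name} ran: {query}"

    def tick(self, now: float) -> None:
        """Called periodically; suspends once idle for at least the timeout."""
        if self.running and now - self.last_activity >= self.idle_timeout_s:
            self.running = False  # stop paying for idle compute


# Two isolated warehouses with different idle policies.
etl = VirtualWarehouse("ETL", idle_timeout_s=300)
bi = VirtualWarehouse("BI", idle_timeout_s=60)
etl.submit("load batch", now=0)
bi.submit("dashboard query", now=10)
for wh in (etl, bi):
    wh.tick(now=100)
print(etl.running, bi.running)  # True False
```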
MULTI-CLUSTER WAREHOUSE
LEVERAGE ABUNDANCE OF COMPUTE RESOURCES
Automatically scales compute resources based on concurrent usage
Single virtual warehouse of multiple compute clusters
Queries are load balanced across the clusters in a virtual warehouse
Split across availability zones for high availability
[Diagram: a query scheduler distributing incoming queries across clusters 1, 2, and 3 of a single virtual warehouse group]
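A minimal sketch of the load-balancing and auto-scaling idea follows. It assumes a "least-loaded cluster" routing policy, a fixed number of concurrent query slots per cluster, and scale-in of only the last (idle) cluster. The class `MultiClusterWarehouse` and all of these policies are hypothetical, not the actual Snowflake scheduler.

```python
from collections import deque


class MultiClusterWarehouse:
    """Hypothetical sketch: route to the least-loaded cluster, scale out when
    every cluster is saturated, queue once max_clusters is reached."""

    def __init__(self, min_clusters: int, max_clusters: int, slots_per_cluster: int):
        self.min_clusters = min_clusters
        self.max_clusters = max_clusters
        self.slots = slots_per_cluster
        self.running = [0] * min_clusters  # running queries per cluster
        self.queue = deque()               # queries waiting for a free slot

    def submit(self, query_id: str):
        idx = min(range(len(self.running)), key=lambda i: self.running[i])
        if self.running[idx] < self.slots:
            self.running[idx] += 1
            return idx
        if len(self.running) < self.max_clusters:  # scale out
            self.running.append(1)
            return len(self.running) - 1
        self.queue.append(query_id)                # all clusters saturated
        return None

    def finish(self, cluster: int):
        """Free a slot; start a queued query on it, or scale in an idle last cluster."""
        self.running[cluster] -= 1
        if self.queue:
            self.running[cluster] += 1
            return self.queue.popleft()  # waiting query now runs on this cluster
        last = len(self.running) - 1
        if cluster == last and self.running[cluster] == 0 and len(self.running) > self.min_clusters:
            self.running.pop()           # scale in: release the idle cluster
        return None
```

With `MultiClusterWarehouse(1, 3, 2)`, six concurrent queries land on clusters 0, 0, 1, 1, 2, 2, and a seventh waits in the queue until a slot frees up.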
IN THE REAL WORLD
Continuous loading from S3 (4 TB/day, <5 min SLA): Medium virtual warehouse
ETL & maintenance: Large virtual warehouse
Reporting (segmented): 2X-Large virtual warehouse
Interactive dashboard (50% < 1 s, 85% < 2 s, 95% < 5 s): auto-scaling virtual warehouse, X-Large x 5
Prod DB: 4 trillion rows, 3+ petabytes raw, 8x compression ratio, 25M+ micro-partitions
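The production numbers can be turned into rough per-partition averages. This is a worked arithmetic sketch that uses decimal units. The slide gives lower bounds ("3+ PB", "25M+"), so the results are approximate averages rather than measured values.

```python
# Rough averages derived from the "Prod DB" figures (decimal units).
PB, TB, MB = 10**15, 10**12, 10**6

raw_bytes = 3 * PB
compression_ratio = 8
micro_partitions = 25 * 10**6
rows = 4 * 10**12
daily_load_bytes = 4 * TB

compressed_bytes = raw_bytes // compression_ratio            # size actually stored
avg_partition_mb = compressed_bytes / micro_partitions / MB  # compressed MB per micro-partition
rows_per_partition = rows // micro_partitions
load_mb_per_s = daily_load_bytes / (24 * 3600) / MB          # sustained ingest rate

print(compressed_bytes // TB, avg_partition_mb, rows_per_partition, round(load_mb_per_s, 1))
```

This gives about 375 TB stored, roughly 15 MB and 160,000 rows per micro-partition, and a sustained load of about 46.3 MB/s. The partition size is the same order of magnitude as the micro-partition size quoted on the storage slides.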
SCALABLE IMMUTABLE STORAGE
STORAGE IMMUTABILITY
Accumulates immutable data over time: well supported by all cloud vendors’ object stores
Allows separation of storage and compute resources: enables workload scalability
Heavily optimized for read-mostly workloads: a natural fit for analytic systems
Transaction management becomes a metadata problem: multi-version concurrency control and snapshot isolation semantics
Transaction coordination separated from storage and compute: allows consistent access across compute resources
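To show why immutability turns transactions into a metadata problem, here is a minimal sketch. Each table version is an immutable set of partition file names. A commit publishes a new version, readers pin a version as their snapshot, and a clone is just a new name pointing at the same files. The optimistic conflict check and every name in this sketch (`MetadataStore`, `TableVersion`, `commit`, `clone`) are hypothetical, not Snowflake's design.

```python
import threading
from dataclasses import dataclass


@dataclass(frozen=True)
class TableVersion:
    """Immutable snapshot: a version number plus the set of partition files."""
    version: int
    partitions: frozenset


class MetadataStore:
    """Hypothetical metadata service. Data files are never modified in place;
    a commit only publishes a new TableVersion."""

    def __init__(self):
        self._lock = threading.Lock()
        self._tables = {}  # table name -> list of TableVersion (history)

    def create(self, name: str) -> None:
        with self._lock:
            self._tables[name] = [TableVersion(0, frozenset())]

    def snapshot(self, name: str) -> TableVersion:
        """Readers pin the latest committed version (snapshot isolation)."""
        with self._lock:
            return self._tables[name][-1]

    def commit(self, name, base, added=(), removed=()):
        """Optimistic commit: rejected if another writer committed after `base`."""
        with self._lock:
            latest = self._tables[name][-1]
            if latest.version != base.version:
                raise RuntimeError("write-write conflict; retry on a fresh snapshot")
            files = (latest.partitions - frozenset(removed)) | frozenset(added)
            new = TableVersion(latest.version + 1, files)
            self._tables[name].append(new)
            return new

    def clone(self, source: str, target: str) -> None:
        """Zero-copy clone: the new table references the same immutable files."""
        with self._lock:
            self._tables[target] = [TableVersion(0, self._tables[source][-1].partitions)]
```

A reader that took a snapshot before a commit keeps seeing the old partition set, because the files it references are never rewritten.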
SCALABLE STORAGE
AUTOMATIC MICRO-PARTITIONING
[Diagram: a table split into micro-partitions, each stored in columnar form]
Data is automatically partitioned at load time: storage decoupled from compute
Columnar organization within each micro-partition: enables both horizontal and vertical pruning
Micro-partitions are only a few tens of MB: fine-grained pruning, no skew
Metadata structures track data distribution: very fast pruning at optimization time
Applied to both structured and semi-structured data: very fast response time for both
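Horizontal and vertical pruning can be illustrated with per-column min/max metadata, a common form of the "metadata structure" the slide mentions. This is a minimal sketch that assumes min/max ranges are the statistics kept. `MicroPartition`, `prune`, and `scan` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class MicroPartition:
    """Hypothetical micro-partition: columnar values plus per-column min/max."""
    columns: dict  # column name -> list of values

    def __post_init__(self):
        # Computed once at load time; the optimizer consults only this metadata.
        self.stats = {c: (min(v), max(v)) for c, v in self.columns.items()}


def prune(partitions, column, lo, hi):
    """Horizontal pruning: keep partitions whose [min, max] overlaps [lo, hi]."""
    return [p for p in partitions
            if not (p.stats[column][1] < lo or p.stats[column][0] > hi)]


def scan(partitions, filter_col, lo, hi, project):
    """Read surviving partitions only (horizontal) and needed columns only (vertical)."""
    out = []
    for p in prune(partitions, filter_col, lo, hi):
        needed = {c: p.columns[c] for c in {filter_col, *project}}  # vertical pruning
        for i, v in enumerate(needed[filter_col]):
            if lo <= v <= hi:
                out.append(tuple(needed[c][i] for c in project))
    return out
```

For example, with three partitions covering days 1 to 3, 4 to 6, and 7 to 9, a filter on days 3 to 4 skips the third partition entirely, without reading any of its data.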
AUTOMATICALLY APPLIED TO SEMI-STRUCTURED DATA
> SELECT … FROM …
Semi-structured data (JSON, Avro, XML, Parquet, ORC) and structured data (e.g., CSV, TSV, …)
Native support: loaded in raw form (e.g., JSON, Avro, XML)
Optimized storage: optimized data type, no fixed schema or transformation required
Optimized SQL querying: full benefit of database optimizations (pruning, filtering, …)
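One way to let raw JSON benefit from the same pruning as structured columns is to derive path-based sub-columns from the data itself, so no schema has to be declared up front. The sketch below assumes that approach for illustration only. The slides do not describe the internal format, and `flatten`, `to_columns`, and `path_stats` are hypothetical helpers.

```python
import json


def flatten(doc, prefix=""):
    """Yield (path, value) pairs for every scalar leaf of a JSON document."""
    if isinstance(doc, dict):
        for key, val in doc.items():
            yield from flatten(val, f"{prefix}.{key}" if prefix else key)
    elif isinstance(doc, list):
        for i, val in enumerate(doc):
            yield from flatten(val, f"{prefix}[{i}]")
    else:
        yield prefix, doc


def to_columns(raw_lines):
    """Raw JSON records -> {path: column values}, None where a path is absent."""
    records = [dict(flatten(json.loads(line))) for line in raw_lines]
    paths = sorted({p for r in records for p in r})
    return {p: [r.get(p) for r in records] for p in paths}


def path_stats(columns):
    """Per-path min/max over non-null values of a single type, usable for pruning."""
    stats = {}
    for path, values in columns.items():
        present = [v for v in values if v is not None]
        if present and all(type(v) is type(present[0]) for v in present):
            stats[path] = (min(present), max(present))
    return stats
```

Paths whose values mix types are left without statistics in this sketch, so they are never pruned. This avoids wrongly skipping data.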
EXAMPLE
[Diagram: client applications (Web UI, ODBC and JDBC drivers) send DDL and queries over HTTPS (JDBC/ODBC/Python) to the cloud services layer (security, optimization, warehouse management, query management, metadata). Separate virtual warehouses (an XL warehouse for Sales custom reports, an L warehouse for Marketing campaign analysts, and loading warehouses) all work on the same micro-partitions in S3 storage.]
ENABLE DATA SHARING
Data providers
  Secure and integrated with Snowflake’s access control model
  Only pay normal storage costs for shared data
  No limit to the number of consumer accounts with which a dataset may be shared
Data consumers
  Get access to the data without any need to move or transform it
  Query and combine shared data with existing data, or join together data from multiple publishers
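A minimal sketch of sharing without copying follows. The provider grants named consumer accounts read access to specific tables. A consumer read returns a reference to the provider's own data, so the provider pays no extra storage and nothing moves. `Account`, `create_share`, and `read_shared` are hypothetical names, and rows stand in for references to immutable partitions.

```python
class Account:
    """Hypothetical account whose tables stand in for immutable partition refs."""

    def __init__(self, name):
        self.name = name
        self.tables = {}  # table name -> rows
        self.shares = {}  # share name -> (table names, consumer account names)

    def create_share(self, share, tables, consumers):
        self.shares[share] = (set(tables), set(consumers))

    def read_shared(self, consumer, share, table):
        """Access-controlled, in-place read: the same object is returned, no copy."""
        tables, consumers = self.shares[share]
        if consumer.name not in consumers or table not in tables:
            raise PermissionError(f"{consumer.name} may not read {share}.{table}")
        return self.tables[table]


provider = Account("provider")
provider.tables["weather"] = [("SF", 18), ("NYC", 11)]
consumer = Account("consumer")
consumer.tables["stores"] = [("SF", "store-1")]
provider.create_share("wx", ["weather"], ["consumer"])

# The consumer joins the shared table with its own data without loading a copy.
weather = provider.read_shared(consumer, "wx", "weather")
print([(store, temp) for city, store in consumer.tables["stores"]
       for w_city, temp in weather if city == w_city])
```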
ENABLE GLOBAL REPLICATION
[Map: replication between cloud regions, including AWS US West, AWS US East, AWS Ireland, AWS Frankfurt, AWS Sydney, Azure US East, and Azure Frankfurt]
MULTI-TENANT SERVICE
DATA WAREHOUSE AS A SERVICE
Multi-tenant service
  No administration, self-tuning and self-healing
  Transparent upgrade
  Service architecture designed for high availability and durability
  Security is at the core
Availability
  All tiers distributed over multiple data centers with active-active data replication
  No maintenance downtime: fully transparent software and hardware upgrades
  Automatic repair of any failed server, with transparent re-execution of failed queries
  Persistent sessions for load balancing and transparent failover
Durability
  Synchronous replication of data over multiple data centers
  Automatic data retention and fail-safe technology to guard against data removal
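Transparent re-execution can be sketched as a retry loop that takes a failed server out of rotation and reruns the query elsewhere, so the client only sees the final result. The sketch assumes that re-running a read query is safe because it reads an immutable snapshot. `ServerFailure`, `run_with_failover`, and the `execute` callback are hypothetical.

```python
class ServerFailure(Exception):
    """Raised when the server running a query dies mid-execution."""


def run_with_failover(query, servers, execute, max_attempts=3):
    """Hypothetical sketch: re-execute a failed query on another healthy server."""
    healthy = list(servers)
    last_error = None
    for _ in range(max_attempts):
        if not healthy:
            break
        server = healthy[0]
        try:
            return execute(server, query)
        except ServerFailure as err:
            last_error = err
            healthy.remove(server)  # take the failed server out of rotation
    raise RuntimeError(f"query failed after retries: {last_error}")
```

Only server failures trigger a retry here. Errors in the query itself propagate immediately, because re-running them elsewhere would fail the same way.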
SNOWFLAKE SERVICE
Three independent layers
Cloud services (compilation and management): authentication & access control, infrastructure manager, optimizer, transaction manager, security, metadata
Data processing: virtual warehouses, each with its own cache
Storage: databases
MANAGED SERVICE
BUILT-IN DISASTER RECOVERY AND HIGH AVAILABILITY
Scale-out of all tiers: metadata, compute, storage
Resiliency across multiple availability zones: geographic separation, separate power grids, built for synchronous replication
Fully online updates & patches: zero downtime
Back pressure and throttling all the way back to the client
[Diagram: the cloud services (services and metadata), virtual warehouse, and database storage tiers]
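Throttling with back pressure can be sketched with a token bucket. This is a common rate-limiting technique, used here as an illustrative stand-in because the slide does not say which mechanism the service uses. When the bucket is empty, the request is rejected with a "retry later" signal, so pressure propagates to the client instead of building an unbounded internal queue. `TokenBucket` and `handle` are hypothetical, and HTTP 429 is only an illustrative status code.

```python
class TokenBucket:
    """Token bucket: admits `rate` requests/second, with bursts up to `capacity`."""

    def __init__(self, rate: float, capacity: float, now: float = 0.0):
        self.rate = rate
        self.capacity = capacity
        self.tokens = capacity
        self.updated = now

    def try_acquire(self, now: float) -> bool:
        # Refill in proportion to elapsed time, never above capacity.
        self.tokens = min(self.capacity, self.tokens + (now - self.updated) * self.rate)
        self.updated = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False


def handle(bucket, request, now):
    """Push back on the client rather than queueing without bound."""
    if bucket.try_acquire(now):
        return 200, f"accepted {request}"
    return 429, "throttled: retry later"
```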
ADAPTIVE ALL THE WAY TO THE CORE
SELF-TUNING & SELF-HEALING INTERNALS
Adaptive · Self-tuning · Do no harm! · Automatic · Default
Automatic memory management
Automatic workload management
Automatic distribution method
Automatic degree of parallelism
Automatic fault handling
No statistics, no vacuuming
EXAMPLE: AUTOMATIC SKEW AVOIDANCE
1. Detect popular values on the build side of the join
2. Use broadcast for those and directed join for the others
Popular values detected at runtime
Kicks in when needed, with no performance degradation
Enabled by default for all joins
Adaptive · Self-tuning · Do no harm! · Automatic · Default
[Diagram: execution plan in which a join consumes two scans, one of them followed by a filter; chart of the number of values per key]
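The two steps can be sketched for an equi-join on the first field of each row. Keys that appear at least `threshold` times on the build side count as popular. Their build rows are broadcast to every worker, and matching probe rows are spread round-robin instead of being hashed to a single worker. All other keys use a directed (hash-partitioned) join. The threshold rule, the round-robin placement, and the function name `skew_aware_join` are assumptions for illustration.

```python
from collections import Counter, defaultdict


def skew_aware_join(build, probe, workers, threshold):
    """Hypothetical skew-avoiding equi-join on row[0]; returns (pairs, probe load per worker)."""
    counts = Counter(row[0] for row in build)
    popular = {k for k, c in counts.items() if c >= threshold}  # step 1

    build_parts, probe_parts = defaultdict(list), defaultdict(list)
    for row in build:
        if row[0] in popular:
            for w in range(workers):  # step 2: broadcast popular build rows
                build_parts[w].append(row)
        else:
            build_parts[hash(row[0]) % workers].append(row)  # directed join
    for i, row in enumerate(probe):
        # Popular probe rows are spread evenly; each still meets every matching build row.
        w = i % workers if row[0] in popular else hash(row[0]) % workers
        probe_parts[w].append(row)

    result, load = [], []
    for w in range(workers):  # each worker runs a local hash join
        table = defaultdict(list)
        for b in build_parts[w]:
            table[b[0]].append(b)
        load.append(len(probe_parts[w]))
        result.extend((b, p) for p in probe_parts[w] for b in table[p[0]])
    return result, load
```

Every matching pair is produced exactly once. For a popular key, each probe row lives on exactly one worker, and that worker holds all of the key's build rows. For other keys, both sides hash to the same worker.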
WHAT’S NEXT?
SERVERLESS DATA SERVICES
Target predictable well-identified database workloads
Horizontal scaling is automatic
Fine-grained units of work allow the degree of parallelism to be arbitrarily small or large
Secure since handled by the service
Transparent retry on failures
Service state entirely managed by the service
Monitoring and observability of the service
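Why fine-grained units of work decouple the degree of parallelism from the job can be shown with a small sketch. The input is cut into many small, independent units, and any number of workers can process them without changing the result. `split_into_units`, `process_unit`, and `run_service_job` are hypothetical, and the "work" is a placeholder sum of squares.

```python
from concurrent.futures import ThreadPoolExecutor


def split_into_units(items, unit_size):
    """Cut the input into small, independent units of work."""
    return [items[i:i + unit_size] for i in range(0, len(items), unit_size)]


def process_unit(unit):
    """Placeholder for real work on one unit (here: sum of squares)."""
    return sum(x * x for x in unit)


def run_service_job(items, unit_size, parallelism):
    """Because units are independent, `parallelism` can be 1 or hundreds."""
    units = split_into_units(items, unit_size)
    with ThreadPoolExecutor(max_workers=parallelism) as pool:
        return sum(pool.map(process_unit, units))
```

Small units also keep a retry cheap: only the failed unit needs to be re-run, not the whole job.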
CLOUD NATIVE ARCHITECTURE
A GIFT THAT KEEPS ON GIVING