Introducing Parallel Data Warehouse (The project formerly known as Madison)
Thomas Kejser, Senior Program Manager, Microsoft Corp.
Agenda
• The typical problem with data warehouses
• MPP vs. SMP
• SQL Server Parallel Data Warehouse
  • Hardware architecture
  • Query processing
  • Data loading
The Typical Problem with Data Warehouses
[Figure: Microsoft DW Solutions, with the labels SSRS, SSAS, SSIS and Microsoft & Partner Services]
Symmetric Multi-Processing vs. Massively Parallel Processing

SMP (scale up)
• Hardware advancements are increasing the ability to scale up
• But scaling is limited by design, and high-end SMP is very expensive
• Extremely high concurrency for simple workloads
• With less than 1-2 TB of data, SMP will almost always be better; at larger sizes, it depends
• Typical use: OLTP, transactional, data warehousing

MPP (scale out)
• Hardware advancements are increasing the ability to scale out
• Scales to 1 PB+, and scale-out is relatively low cost
• Relatively high concurrency for complex workloads
• Suited to DW workloads from > 2 TB up to 1 PB
• Typical use: data warehousing (especially VLDB and complex workloads)
PDW: No Assembly Required
The appliance includes:
• Software
• Servers
• Storage arrays
• Network switches
• Cables
• Licenses
• Power distribution units
• Racks
It comes fully assembled and fully configured, with the software installed at the factory.
Basic Building Blocks
• Compute nodes: handle the CPU cycles required to answer queries.
• Storage nodes: store data on fiber-attached disks, scaled to give the compute nodes' CPUs enough I/O throughput.
• Other nodes: more about those later.
Anatomy of a Compute Node
Pre-configured for the SQL Server instance on each compute node.
• Drives are configured as RAID1, so that a single drive failure does not cause an appliance failover:
  • IBM compute nodes will have 1 LUN (1 RAID1 pair)
  • Dell compute nodes will have 2 LUNs (2 RAID1 pairs)
  • HP compute nodes will have 3 LUNs (3 RAID1 pairs)
• TempDB serves as:
  • the sort-work area when loading data into clustered index tables
  • the work area for PDW temporary work files
  • the spill area for hash joins that do not fit into memory
Anatomy of a Storage Node
Pre-configured:
• 4 RAID10 pairs for primary user data
• 1 RAID10 pair for database logs
• 2 LUNs spread across each RAID pair
User databases are separate physical SQL Server databases. An optional staging database is used for loading and to minimize fragmentation.
More Node Types
• Backup node:
  • Stores backup files from the appliance
  • Can be logged into by authorized Windows users
  • Can be augmented with third-party hardware and software
• Landing zone:
  • Used as a holding place for data to be loaded
  • Can be logged into by authorized Windows users
  • Can be augmented with third-party hardware and software
• Management node:
  • Runs the Windows domain controller (Active Directory)
  • Used for deploying patches to all nodes in the appliance
  • Holds images in case a node needs reimaging
Putting It All Together: PDW
[Diagram labels: Control Node, Spare Node]
Failover protection:
• Redundant control node
• Redundant compute node
• Cluster failover
• Redundant Array of Inexpensive Databases
Software Architecture
• Control node: SQL Server (DW Authentication, DW Configuration, DW Schema and TempDB databases), MPP Engine, Data Movement Service, IIS
• Compute nodes: SQL Server (user data), Data Movement Service
• Landing zone node: Data Movement Service
• Clients: Query Tool (DWSQL), Admin Console (Internet Explorer), MS BI (AS, RS) and other third-party tools, connecting through OLE DB, ODBC, ADO.NET or JDBC
Create Database
    CREATE DATABASE database_name
    WITH ( AUTOGROW = ON,
           REPLICATED_SIZE = 1024,
           DISTRIBUTED_SIZE = 16384,
           LOG_SIZE = 300 );
Distribution and Replication
Database distributed & replicated tables (star schema):
• Store Sales (fact): SS_SOLD_DATE_SK, SS_ITEM_SK, SS_CUSTOMER_SK, SS_CDEMO_SK, SS_STORE_SK, SS_PROMO_SK, SS_QUANTITY, …
• Date Dim: D_DATE_SK, D_DATE_ID, D_DATE, D_MONTH, …
• Item: I_ITEM_SK, I_ITEM_ID, I_REC_START_DATE, I_ITEM_DESC, …
• Promotion: P_PROMO_SK, P_PROMO_ID, P_START_DATE_SK, P_END_DATE_SK, …
• Store: S_STORE_SK, S_STORE_ID, S_REC_START_DATE, S_REC_END_DATE, S_STORE_NAME, …
• Customer: C_CUSTOMER_SK, C_CUSTOMER_ID, C_CURRENT_ADDR, …
• Customer Demographics: CD_DEMO_SK, CD_GENDER, CD_MARITAL_STATUS, CD_EDUCATION, …
[Figure: data distribution with replication. Six compute nodes each hold a full copy of the dimension tables Customer (C), Item (I), Date (D), Customer Demographics (CD), Store (S) and Promotion (P), plus one slice of the Store Sales (SS) fact table.]
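The layout on this slide (every node holds a full copy of each dimension table, but only a slice of Store Sales) can be sketched as a small Python simulation. It assumes six compute nodes, as drawn, and a simple hash-modulo placement; PDW's actual hash function and distribution count are not described in this deck, and the helper names are hypothetical.

```python
from collections import defaultdict
import zlib

NUM_NODES = 6  # assumption: six compute nodes, as drawn on the slide


def node_for(key, num_nodes=NUM_NODES):
    """Hypothetical placement: hash the distribution key, then take it modulo the node count."""
    # crc32 is stable across runs, unlike Python's salted hash() for strings
    return zlib.crc32(str(key).encode()) % num_nodes


def distribute(rows, key_column, num_nodes=NUM_NODES):
    """Hash distribution: each row lands on exactly one node."""
    nodes = defaultdict(list)
    for row in rows:
        nodes[node_for(row[key_column], num_nodes)].append(row)
    return nodes


def replicate(rows, num_nodes=NUM_NODES):
    """Replication: every node receives a full copy of the table."""
    return {n: list(rows) for n in range(num_nodes)}


# Toy data: a fact table (store sales) and a small dimension table (item).
store_sales = [{"ss_item_sk": i % 20, "ss_quantity": i} for i in range(100)]
item = [{"i_item_sk": k, "i_item_desc": f"item {k}"} for k in range(20)]

sales_by_node = distribute(store_sales, "ss_item_sk")
item_by_node = replicate(item)
# Because item exists on every node, joining store sales to item needs no data movement.
```

The trade-off the slide implies: replicating small dimension tables costs a little storage per node but lets every fact-to-dimension join run locally.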
Table Creation
    CREATE TABLE table_name
        [ ( { <column_definition> } [ ,...n ] ) ]
        [ AS SELECT select_criteria ]
        [ WITH ( <table_option> ) ]
    [;]

    <column_definition> ::= column_name <data_type> [ NULL | NOT NULL ]

    <data_type> ::= type_name [ ( precision [ , scale ] ) ]

    <table_option> ::=
    {
        [ CLUSTER_ON ( column_name [ ,...n ] ) ]
      , [ DISTRIBUTE_ON ( column_name ) ] | [ REPLICATE ]
      , [ PARTITION_ON column_name ( RANGE { LEFT | RIGHT } FOR VALUES
            ( [ boundary_value [ ,...n ] ] ) ) ]
    }
Supported data types, by type class:
• Integers: tinyint, smallint, int, bigint
• Floating point: float, real
• Character: char, varchar, nchar, nvarchar
• Date & time: date, time, datetime, datetime2, datetimeoffset, timestamp, smalldatetime
• Fixed point: decimal, money, smallmoney
• Binary: binary, varbinary (8192)
• Other: uniqueidentifier (?)
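The RANGE { LEFT | RIGHT } FOR VALUES clause follows SQL Server's partition-function convention: n boundary values define n + 1 partitions, and LEFT or RIGHT decides which side a value equal to a boundary falls on. A minimal Python sketch of that assignment rule, where the function name and the boundary values are illustrative only:

```python
from bisect import bisect_left, bisect_right


def partition_number(value, boundaries, range_side="RIGHT"):
    """Return the 1-based partition number for value.

    boundaries must be sorted ascending; n boundaries give n + 1 partitions.
    RANGE LEFT:  a value equal to a boundary belongs to the partition on its left.
    RANGE RIGHT: a value equal to a boundary belongs to the partition on its right.
    """
    if range_side == "LEFT":
        return bisect_left(boundaries, value) + 1
    if range_side == "RIGHT":
        return bisect_right(boundaries, value) + 1
    raise ValueError("range_side must be 'LEFT' or 'RIGHT'")


# Hypothetical boundaries: 11 values give 12 partitions, matching the store_sales example.
boundaries = [100 * m for m in range(1, 12)]  # 100, 200, ..., 1100

left = partition_number(100, boundaries, "LEFT")    # boundary value stays in partition 1
right = partition_number(100, boundaries, "RIGHT")  # boundary value moves to partition 2
last = partition_number(5000, boundaries)           # above every boundary: partition 12
```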
Create Table: Behind the Scenes
    CREATE TABLE store_sales
    WITH distribute_on (ss_item_sk)
         partition_on (ss_sold_date_sk)
         cluster_on (ss_sold_date_sk)
[Figure, per compute node: 8 filegroups (one per core) with 1 table per filegroup; each table has 12 partitions on ss_sold_date_sk; each partition holds N 8K pages of rows.]
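To get a feel for the "N-number of pages" in each partition, the usual SQL Server sizing arithmetic can be applied: a data page is 8 KB, of which about 8,096 bytes hold rows, and each row also costs a 2-byte slot-array entry. The sketch below assumes rows spread evenly over the 8 filegroups and 12 partitions of one compute node and ignores clustered-index overhead and fill factor; the row count and row size in the usage line are hypothetical.

```python
import math

PAGE_FREE_BYTES = 8096  # usable bytes on an 8 KB SQL Server data page
SLOT_BYTES = 2          # slot-array entry per row


def rows_per_page(row_size):
    """Whole rows that fit on one page."""
    return PAGE_FREE_BYTES // (row_size + SLOT_BYTES)


def pages_per_partition(rows_on_node, row_size, filegroups=8, partitions=12):
    """Rough page count per partition, assuming a perfectly even spread of rows."""
    rows_per_partition = math.ceil(rows_on_node / (filegroups * partitions))
    return math.ceil(rows_per_partition / rows_per_page(row_size))


# Hypothetical: 96 million 100-byte rows on one compute node.
pages = pages_per_partition(96_000_000, 100)
```

With these numbers each page holds 79 rows, so each of the 96 partition slices needs about 12,659 pages.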
Physical File Layout (Per Compute Node)
MPP Query Processing
[Diagram label: Control Node]
• The query is rewritten into steps that run efficiently on the compute nodes
• Client access via ODBC/JDBC; SQL-92 with analytical extensions
• Distribution-incompatible joins are resolved using high-speed dynamic re-distribution
    SELECT location, year, SUM(b.sales_amt)
    FROM customer a, sales b
    WHERE b.sales > 500
      AND a.custid = b.custid
    GROUP BY year, location
    ORDER BY 1, 2;
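A join is distribution-compatible when matching rows already sit on the same node: both sides are distributed on the join key, or one side is replicated. Otherwise one side has to move. The toy Python model below re-distributes sales rows on custid so the join becomes node-local. It uses a hypothetical modulo placement and made-up data, and it is not PDW's Data Movement Service.

```python
from collections import defaultdict

NUM_NODES = 4  # assumption for the example


def place(key, num_nodes=NUM_NODES):
    """Toy hash placement for integer keys."""
    return key % num_nodes


def distribute(rows, key, num_nodes=NUM_NODES):
    nodes = defaultdict(list)
    for row in rows:
        nodes[place(row[key], num_nodes)].append(row)
    return nodes


def redistribute(nodes, new_key, num_nodes=NUM_NODES):
    """Shuffle every row to the node that owns its value of new_key."""
    all_rows = [row for rows in nodes.values() for row in rows]
    return distribute(all_rows, new_key, num_nodes)


def local_join(customers_by_node, sales_by_node):
    """Join customer and sales on custid independently on each node."""
    out = []
    for n in range(NUM_NODES):
        by_id = {c["custid"]: c for c in customers_by_node.get(n, [])}
        for s in sales_by_node.get(n, []):
            c = by_id.get(s["custid"])
            if c is not None:
                out.append((c["custid"], c["location"], s["sales_amt"]))
    return out


customers = [{"custid": c, "location": f"loc{c % 3}"} for c in range(10)]
sales = [{"saleid": i, "custid": i % 10, "sales_amt": 10 * i} for i in range(40)]

customers_by_node = distribute(customers, "custid")
sales_by_node = distribute(sales, "saleid")  # incompatible: distributed on saleid, not custid

before = local_join(customers_by_node, sales_by_node)  # misses matches that live on other nodes
after = local_join(customers_by_node, redistribute(sales_by_node, "custid"))
```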
MPP Execution Plans
The MPP engine creates parallel execution plans from client SQL. The plans can include the following types of operations:
• SQL operations: pass SQL directly to SQL Server on one or more nodes.
• DMS operations: move data among the nodes in the appliance for further processing.
• Temp table operations: stage data for further processing.
• Return operations: push data back to the client.
Simple plans may include just one type of operation; complex plans may include all of them. Plans are executed serially, one step at a time.
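The four operation types and the one-step-at-a-time execution can be modelled as a list of typed steps. This Python sketch is illustrative only: the Step class, the operation names as strings, the shared context and the sample values are hypothetical, not PDW internals.

```python
from dataclasses import dataclass
from typing import Callable, List

OPERATION_TYPES = {"SQL", "DMS", "TEMP_TABLE", "RETURN"}


@dataclass
class Step:
    op_type: str                    # one of OPERATION_TYPES
    description: str
    action: Callable[[dict], None]  # works on a shared context dict


def execute_plan(steps: List[Step]) -> dict:
    """Run the steps serially: each one starts only after the previous one finished."""
    context: dict = {"log": []}
    for step in steps:
        if step.op_type not in OPERATION_TYPES:
            raise ValueError(f"unknown operation type: {step.op_type}")
        step.action(context)
        context["log"].append(step.op_type)
    return context


# A plan shaped like the reshuffling example (made-up values).
plan = [
    Step("TEMP_TABLE", "create temp table", lambda ctx: ctx.update(temp=[])),
    Step("SQL", "partial aggregates on compute nodes",
         lambda ctx: ctx.update(partials=[("2009-08-01", 5), ("2009-08-02", 7)])),
    Step("DMS", "move partials into the temp table",
         lambda ctx: ctx["temp"].extend(ctx["partials"])),
    Step("RETURN", "return rows to the client",
         lambda ctx: ctx.update(result=list(ctx["temp"]))),
    Step("TEMP_TABLE", "drop temp table", lambda ctx: ctx.pop("temp")),
]
ctx = execute_plan(plan)
```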
Example Schema
[Figure: the store sales star schema, with the Store Sales fact table and the Date Dim, Item, Promotion, Store, Customer and Customer Demographics dimension tables]
Data distribution with replication: the Sales table is distributed on customer and partitioned by time.
Distribution-Compatible Query
    SELECT CustomerId, SUM(Amount) AS TotalSales,
           SUM(Quantity) AS TotalUnitsSold
    FROM Sales s
    JOIN Item i ON s.ItemId = i.ItemId
    WHERE SaleDate BETWEEN '2009-08-01' AND '2009-08-31'
      AND Description LIKE '%gadgets%'
    GROUP BY CustomerId
    ORDER BY CustomerId;
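Because Sales is distributed on customer, all rows for a given CustomerId live on one compute node. Each node can therefore compute final totals for its own customers, and the per-node results only need to be concatenated and ordered. A small Python check of that property, using made-up rows and a hypothetical modulo placement:

```python
from collections import defaultdict

NUM_NODES = 3  # assumption for the example

sales = [  # (customer_id, amount, quantity): made-up rows
    (1, 10.0, 1), (2, 5.0, 2), (1, 7.5, 1), (3, 2.0, 4), (4, 1.0, 1), (2, 3.0, 1),
]


def group_totals(rows):
    """SUM(amount), SUM(quantity) GROUP BY customer_id."""
    totals = defaultdict(lambda: [0.0, 0])
    for cust, amount, qty in rows:
        totals[cust][0] += amount
        totals[cust][1] += qty
    return {c: tuple(v) for c, v in totals.items()}


# Distribute on customer_id (toy placement: id modulo node count).
nodes = defaultdict(list)
for row in sales:
    nodes[row[0] % NUM_NODES].append(row)

# Each node aggregates on its own; no customer spans two nodes, so results just concatenate.
combined = {}
for n in range(NUM_NODES):
    part = group_totals(nodes[n])
    assert combined.keys().isdisjoint(part)  # no customer appears on two nodes
    combined.update(part)

result = sorted(combined.items())  # ORDER BY CustomerId
```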
MPP Query Plan
Step 1, on each compute node:
    SELECT s.[customerid], sum(s.[amount]) AS totalsales, sum(s.[quantity]) AS totalunitssold
    FROM [tpch_3].[dbo].[h_sales_34] s JOIN [tpch_3].[dbo].item_37 i ON (s.[itemid] = i.[itemid])
    WHERE (s.[saledate] BETWEEN '2009-08-01' AND '2009-08-31' and i.[description] like '%gadgets%')
    GROUP BY s.[customerid]
    ORDER BY s.[customerid];
Query 1 Processing Flow
[Diagram: a Query Tool client connects to the control node, which runs SQL Server (DW Authentication, DW Configuration, DW Schema, TempDB), the MPP Engine and the Data Movement Service; compute nodes 1 to N each run SQL Server (user data) and the Data Movement Service.]
MPP engine steps: parse SQL → validate & authorize → build MPP plan → execute plan → return data to client.
Reshuffling the Data
    SELECT SaleDate, SUM(Amount) AS TotalSales,
           SUM(Quantity) AS TotalUnitsSold
    FROM Sales s JOIN Item i ON s.ItemId = i.ItemId
    WHERE SaleDate BETWEEN '2009-08-01' AND '2009-08-31'
      AND Description LIKE '%gadgets%'
    GROUP BY SaleDate
    ORDER BY SaleDate;
MPP Query Plan
Step 1: create a temp table on the control node
    CREATE TABLE [tempdb].[dbo].Q_[TEMP_ID_6760]
    ( saledate DATE, totalsales DECIMAL(38, 2), totalunitssold INTEGER )
    WITH (DATA_COMPRESSION = PAGE);
Step 2: run on each compute node
    SELECT s.[saledate], sum(s.[amount]) AS totalsales, sum(s.[quantity]) AS totalunitssold
    FROM [tpch_3].[dbo].[h_sales_34] s JOIN [tpch_3].[dbo].item_37 i ON (s.[itemid] = i.[itemid])
    WHERE (s.[saledate] BETWEEN '2009-08-01' AND '2009-08-31' and i.[description] like '%gadgets%')
    GROUP BY s.[saledate];
MPP Query Plan (continued)
Step 3:
    SELECT [saledate], sum([totalsales]) AS totalsales,
           sum([totalunitssold]) AS totalunitssold
    FROM [tempdb].[dbo].Q_[TEMP_ID_6760]
    GROUP BY [saledate]
    ORDER BY [saledate];
Step 4:
    DROP TABLE [tempdb].[dbo].Q_[TEMP_ID_6760];
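Grouping by SaleDate is not compatible with a distribution on customer, because the rows for one date are spread over every node. The plan therefore aggregates twice: once per compute node, and once over the collected partial results. This works because SUM can be rebuilt from partial sums. A Python sketch with made-up data, assuming each node's output is collected into the temp table before the final aggregation:

```python
from collections import defaultdict

# Made-up rows, already distributed on customer: node -> [(sale_date, amount, quantity)]
nodes = {
    0: [("2009-08-01", 10.0, 1), ("2009-08-02", 4.0, 2)],
    1: [("2009-08-01", 6.0, 3), ("2009-08-03", 1.0, 1)],
    2: [("2009-08-02", 2.0, 1), ("2009-08-01", 3.0, 1)],
}


def aggregate(rows):
    """SUM(amount), SUM(quantity) GROUP BY sale_date."""
    acc = defaultdict(lambda: [0.0, 0])
    for date, amount, qty in rows:
        acc[date][0] += amount
        acc[date][1] += qty
    return [(d, a, q) for d, (a, q) in acc.items()]


# Per-node partial sums (Step 2), collected into the temp table.
temp_table = [row for n in sorted(nodes) for row in aggregate(nodes[n])]

# Re-aggregate the partials and order by date (Step 3).
final = sorted(aggregate(temp_table))
```

The same approach does not work for every aggregate: an AVG, for example, would need SUM and COUNT carried separately through the partial step.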
Reshuffling: Query Processing Flow
[Diagram: a Query Tool client connects to the control node, which runs SQL Server (DW Authentication, DW Configuration, DW Schema, TempDB), the MPP Engine and the Data Movement Service; each compute node runs SQL Server (user data) and the Data Movement Service.]
MPP engine steps: parse SQL → validate & authorize → build MPP plan → execute plan → return data to client.
Data Loading
[Diagram: control node, spare node, and a landing zone node holding the text files to be loaded]
Tables are hash distributed or replicated.
Data Loader Process
Load file → Bulk insert → Partitioned staging table (heap) → Insert-select → Partitioned final table (clustered index)

Bulk insert phase (sort each batch in memory or in TempDB):
• Trace flags: none
• BATCHSIZE: calculated
• TABLOCK: ON
• TempDB: entire BATCHSIZE used for the sort
• TempDB log: minimal
• StageDB log: minimal
• Rollback: commits per BATCHSIZE; rolls back to the last batch only

Insert-select phase (sort each partition in memory or in TempDB):
• Trace flags: 610 per NUMA session
• MAXDOP: 1 per NUMA session
• TABLOCK: OFF
• TempDB: entire partition used for the sort
• TempDB log: minimal
• UserDB log: twice the data file size
• Rollback: commits the full transaction; rolls back the full transaction
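The two rollback rows differ in practice. The bulk insert phase commits every BATCHSIZE rows, so a failure loses only the batch in flight, while the insert-select phase is one transaction, so a failure rolls back everything. A toy Python model of that difference, where the failure injection, the function names and the in-memory "tables" are hypothetical:

```python
class LoadError(Exception):
    pass


def bulk_insert_batched(rows, batch_size, table, fail_at=None):
    """Commit every batch_size rows; on failure only the current batch is lost."""
    for start in range(0, len(rows), batch_size):
        batch = []  # uncommitted work for this batch
        for i in range(start, min(start + batch_size, len(rows))):
            if i == fail_at:
                raise LoadError(f"failed at row {i}")  # earlier batches stay committed
            batch.append(rows[i])
        table.extend(batch)  # commit this batch


def insert_select_single_txn(rows, table, fail_at=None):
    """One transaction: on failure nothing is committed."""
    pending = []
    for i, row in enumerate(rows):
        if i == fail_at:
            raise LoadError(f"failed at row {i}")
        pending.append(row)
    table.extend(pending)  # commit everything at once


staging = []
try:
    bulk_insert_batched(list(range(10)), 3, staging, fail_at=7)
except LoadError:
    pass  # staging keeps the two batches committed before the failure

final = []
try:
    insert_select_single_txn(list(range(10)), final, fail_at=7)
except LoadError:
    pass  # final stays empty
```

After the simulated failure at row 7, staging holds rows 0 to 5 and final holds nothing.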
© 2008 Microsoft Corporation. All rights reserved. Microsoft, Windows, Windows Vista and other product names are or may be registered trademarks and/or trademarks in the U.S. and/or other countries.
The information herein is for informational purposes only and represents the current view of Microsoft Corporation as of the date of this presentation. Because Microsoft must respond to changing market conditions, it should not be interpreted to be a commitment on the part of Microsoft, and Microsoft cannot guarantee the accuracy of any information provided after the date of this presentation. MICROSOFT MAKES NO WARRANTIES, EXPRESS, IMPLIED OR STATUTORY, AS TO THE INFORMATION IN THIS PRESENTATION.