Parallel Execution Plans, Joe Chang, jchang6@yahoo.com
![Page 2: Parallel Execution Plans Joe Chang jchang6@yahoo.com .](https://reader035.fdocuments.net/reader035/viewer/2022062408/56649f0e5503460f94c22f08/html5/thumbnails/2.jpg)
Parallel Execution Plans
Allows single query to use multiple processors
Query should run faster but may consume more resources
Example
1 thread: 10 sec run time, 10 CPU-sec
2 threads: 6 sec run time, 12 CPU-sec
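The two data points in the example can be turned into a speedup and an efficiency figure. This is only a worked calculation from the numbers on the slide; the symbols $T_n$ (elapsed time with $n$ threads) and $C_n$ (total CPU time with $n$ threads) are introduced here for illustration.

$$
\text{speedup} = \frac{T_1}{T_2} = \frac{10\ \text{s}}{6\ \text{s}} \approx 1.67,
\qquad
\text{CPU overhead} = \frac{C_2 - C_1}{C_1} = \frac{12 - 10}{10} = 20\%,
\qquad
\text{efficiency} = \frac{C_1}{C_2} = \frac{10}{12} \approx 83\%
$$

So the query finishes about 1.67 times sooner, at the price of 20% more total CPU time.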
![Page 3: Parallel Execution Plans Joe Chang jchang6@yahoo.com .](https://reader035.fdocuments.net/reader035/viewer/2022062408/56649f0e5503460f94c22f08/html5/thumbnails/3.jpg)
Parallel Execution Configuration
Cost Threshold for Parallelism: minimum estimated plan cost at which a query is considered for parallel execution
Default 5; consider increasing to 20-50 on new systems
Max Degree of Parallelism: default 0, which allows a query to use all available processors
SQL Server determines level based on available memory and recent CPU usage
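Both settings are server-wide options changed through `sp_configure`. The following is a minimal sketch, assuming sufficient permissions on the server; the values 25 and 4 are illustrative picks from the ranges this deck suggests, not measured optima.

```sql
-- Both options are advanced options, so expose them first.
EXEC sp_configure 'show advanced options', 1;
RECONFIGURE;

-- Illustrative value: raise the threshold from the default of 5.
EXEC sp_configure 'cost threshold for parallelism', 25;

-- Illustrative value: cap the degree of parallelism (default 0 = all processors).
EXEC sp_configure 'max degree of parallelism', 4;
RECONFIGURE;
```

Running `EXEC sp_configure` with no arguments afterwards lists the configured and running values, which is a quick way to confirm the change took effect.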
![Page 4: Parallel Execution Plans Joe Chang jchang6@yahoo.com .](https://reader035.fdocuments.net/reader035/viewer/2022062408/56649f0e5503460f94c22f08/html5/thumbnails/4.jpg)
Parallel Plan Operators
The Distribute Streams operator consumes a single input stream of records and produces multiple output streams. The record contents and format are not changed. Each record from the input stream appears in one of the output streams. This operator automatically preserves the relative order of the input records in the output streams. Usually, hashing is used to decide to which output stream a particular input record belongs.
The Repartition Streams operator consumes multiple streams and produces multiple streams of records. The record contents and format are not changed. Each record from an input stream is placed into one output stream. If this operator is order-preserving, then all input streams must be ordered and merged into several ordered output streams.
The Gather Streams operator consumes several input streams and produces a single output stream of records by combining the input streams. The record contents and format are not changed. If this operator is order-preserving, then all input streams must be ordered.
![Page 5: Parallel Execution Plans Joe Chang jchang6@yahoo.com .](https://reader035.fdocuments.net/reader035/viewer/2022062408/56649f0e5503460f94c22f08/html5/thumbnails/5.jpg)
Execution Plan Cost Formulas
Table Scan or Index Scan
I/O Cost = 0.0375785 + 0.0007407 per page
CPU Cost = 0.0000785 + 0.0000011 per row
Index Seek – Plan Formula
I/O Cost = 0.006328500 + 0.000740741 per additional page (≤1GB)
I/O Cost = 0.003203425 + 0.000740741 per additional page (>1GB)
CPU Cost = 0.000079600 + 0.000001100 per additional row
Bookmark Lookup (may have changed?)
I/O Cost = multiple of 0.006250000 (≤1GB)
I/O Cost = multiple of 0.003124925 (>1GB)
CPU Cost = 0.0000011 per row
Table Scan or Index Scan
IUD (Insert/Update/Delete) I/O Cost ~ 0.01002 – 0.01010 (>100 rows)
IUD CPU Cost = 0.000001 per row
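As a worked illustration of the Table Scan or Index Scan formula, take a hypothetical table of 10,000 pages and 1,000,000 rows. Both figures are assumed here purely for the arithmetic, and the per-page and per-row terms are assumed to apply to every page and row.

$$
\begin{aligned}
\text{I/O cost} &= 0.0375785 + 0.0007407 \times 10{,}000 = 7.4445785 \\
\text{CPU cost} &= 0.0000785 + 0.0000011 \times 1{,}000{,}000 = 1.1000785 \\
\text{total} &= 7.4445785 + 1.1000785 = 8.544657
\end{aligned}
$$

At this size the I/O term accounts for roughly 87% of the estimated cost.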
![Page 6: Parallel Execution Plans Joe Chang jchang6@yahoo.com .](https://reader035.fdocuments.net/reader035/viewer/2022062408/56649f0e5503460f94c22f08/html5/thumbnails/6.jpg)
Cost Interpretation
Time in seconds? CPU time?
0.0062500 sec per I/O -> 160 I/Os per sec
0.000740741 sec per page -> 1,350 pages/sec (8KB) -> 169/sec (64KB) -> 10.8MB/sec
SQL Server 2000 Books Online (S2K BOL): Administering SQL Server, Managing Servers, Setting Configuration Options: cost threshold for parallelism Option
"Query cost refers to the estimated elapsed time, in seconds, required to execute a query on a specific hardware configuration."
Too fast for 7200RPM disk random I/Os.
About right for 1997 sequential disk transfer rate?
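The rates quoted on this slide follow from inverting the cost constants, assuming each cost unit is one second as the Books Online text suggests:

$$
\frac{1}{0.00625\ \text{s}} = 160\ \text{I/Os per second},
\qquad
\frac{1}{0.000740741\ \text{s}} \approx 1350\ \text{pages per second}
$$

$$
1350 \times 8\ \text{KB} = 10{,}800\ \text{KB/s} \approx 10.8\ \text{MB/s},
\qquad
\frac{1350}{8} \approx 169\ \text{64KB I/Os per second}
$$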
![Page 7: Parallel Execution Plans Joe Chang jchang6@yahoo.com .](https://reader035.fdocuments.net/reader035/viewer/2022062408/56649f0e5503460f94c22f08/html5/thumbnails/7.jpg)
Test Table
CREATE TABLE M3A_20 (
    GroupID int NOT NULL,
    ID int NOT NULL,
    ID2 int NOT NULL,
    ID3 int NOT NULL,
    ID4 int NOT NULL,
    sID smallint NOT NULL,
    bID1 bigint NOT NULL,
    bID2 bigint NOT NULL,
    bID3 bigint NOT NULL,
    rMoney money NOT NULL,
    rDate datetime NOT NULL,
    rReal real NOT NULL,
    rDecimal decimal (9,4) NOT NULL,
    CONSTRAINT [PK_M3A_20] PRIMARY KEY CLUSTERED ( [GroupID], [ID] ) WITH FILLFACTOR = 100
)
GO
![Page 8: Parallel Execution Plans Joe Chang jchang6@yahoo.com .](https://reader035.fdocuments.net/reader035/viewer/2022062408/56649f0e5503460f94c22f08/html5/thumbnails/8.jpg)
Data Population Script 1
SET NOCOUNT ON
DECLARE @BatchTotal int, @BatchSize int, @TotalRows int, @BatchStart int, @BatchEnd int,
    @BatchRow int, @I int, @RowsPerPage bigint, @Card int, @DistinctValues int
SELECT @BatchStart=1, @BatchEnd=1000, @BatchTotal=1000, @BatchSize=100000, @RowsPerPage=100, @Card=100000
SELECT @TotalRows=@BatchTotal*@BatchSize
SELECT @I=(@BatchStart-1)*@BatchSize+1, @DistinctValues=@TotalRows/@Card
-- Outer loop: one explicit transaction per batch of @BatchSize rows
WHILE @BatchStart <= @BatchEnd
BEGIN
    BEGIN TRANSACTION
    SELECT @BatchRow = @BatchStart*@BatchSize
    -- Inner loop: single-row inserts up to the end of the current batch
    WHILE @I <= @BatchRow
    BEGIN
        INSERT M3A_20 (GroupID, ID, ID2, ID3, ID4, sID, bID1, bID2, bID3, rMoney, rDate, rReal, rDecimal)
        VALUES ( 1, @I, @TotalRows-@I+1, (@I-1)/@Card+1, (@TotalRows-@I)%@Card+1, @I%32768,
            @I, (@I-1)%@Card+1,
            1+(@I-1)*@RowsPerPage/@TotalRows+((@I-1)*@RowsPerPage)%@TotalRows,
            10000*rand(), DATEADD(hour,@I%3000000,'1900-01-01'), 10000*rand(), 10000*rand() )
        -- Leave both loops on the first failed insert
        IF @@ERROR > 0 BEGIN GOTO B END
        SET @I = @I+1
    END
    COMMIT TRANSACTION
    CHECKPOINT
    PRINT CONVERT(varchar,GETDATE(),121) + ', row ' + CONVERT(varchar,@BatchRow)
    SET @BatchStart = @BatchStart+1
END
-- Error exit and normal completion both land here
B: IF @@TRANCOUNT > 0 COMMIT TRANSACTION
PRINT '01 Complete ' + CONVERT(varchar,GETDATE(),121) + ', row ' + CONVERT(varchar,@BatchRow)
    + ', Trancount ' + CONVERT(varchar(10),@@TRANCOUNT)
![Page 9: Parallel Execution Plans Joe Chang jchang6@yahoo.com .](https://reader035.fdocuments.net/reader035/viewer/2022062408/56649f0e5503460f94c22f08/html5/thumbnails/9.jpg)
Data Population Script 1 Notes
Double WHILE loop
Each Insert/Update/Delete statement is an implicit transaction
Gets separate transaction log entry
Explicit transaction – generates a single transaction log write (max 64KB per IO)
Single TRAN for entire loop requires excessively large log file
Inserts are grouped into intermediate size batches
![Page 10: Parallel Execution Plans Joe Chang jchang6@yahoo.com .](https://reader035.fdocuments.net/reader035/viewer/2022062408/56649f0e5503460f94c22f08/html5/thumbnails/10.jpg)
Data Population Scripts 2
DECLARE @L int
SELECT @L = 1
WHILE @L <= 3
BEGIN
    INSERT M3A_11 (GroupID,ID,ID2,ID3,ID4,sID,bID1,bID2,bID3,rMoney,rDate,rReal,rDecimal)
    SELECT TOP 500000 GroupID, ID, 1500001-ID, ID3, ID4, sID, bID1, bID2, bID3, rMoney, rDate, rReal, rDecimal
    FROM M3A_20
    WHERE GroupID = 1 AND ID BETWEEN (@L-1)*500000+1 AND @L*500000
    SELECT @L = @L + 1
    CHECKPOINT
    PRINT '11 Step ' + CONVERT(varchar,@L) + ', ' + CONVERT(varchar,GETDATE(),121)
END
UPDATE STATISTICS M3A_01 (PK_M3A_01) WITH FULLSCAN
CREATE STATISTICS ST_01 ON M3A_01 (ID) WITH FULLSCAN, NORECOMPUTE
Primary table populated using single-row inserts in a WHILE loop
Additional tables populated with INSERT / SELECT statements
Single-row inserts: ~20-30K rows/sec
INSERT / SELECT statement: ~100K+ rows/sec
![Page 11: Parallel Execution Plans Joe Chang jchang6@yahoo.com .](https://reader035.fdocuments.net/reader035/viewer/2022062408/56649f0e5503460f94c22f08/html5/thumbnails/11.jpg)
Index Seek Plans
Many rows returned
Non-parallel plan
Parallel Execution disabled
Cost: 9.34
Cost: 9.82
Cost: 4.94
Parallel Plan
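One way to reproduce this kind of comparison is to request estimated plans for the same statement with and without a `MAXDOP 1` hint. The sketch below is an assumption about how such a comparison could be run; the exact query behind the slide is not given, so the column list and ID range are hypothetical.

```sql
-- Return estimated plans instead of executing the statements.
SET SHOWPLAN_ALL ON;
GO
-- Hypothetical range query against the test table; the optimizer may choose a parallel plan.
SELECT ID, rMoney
FROM M3A_20
WHERE GroupID = 1 AND ID BETWEEN 1 AND 1000000;
GO
-- Same query with parallelism disabled for this statement only.
SELECT ID, rMoney
FROM M3A_20
WHERE GroupID = 1 AND ID BETWEEN 1 AND 1000000
OPTION (MAXDOP 1);
GO
SET SHOWPLAN_ALL OFF;
GO
```

The `TotalSubtreeCost` column of the first row in each result set gives the plan cost to compare. Whether the first statement actually gets a parallel plan depends on the server's cost threshold and processor count.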
![Page 12: Parallel Execution Plans Joe Chang jchang6@yahoo.com .](https://reader035.fdocuments.net/reader035/viewer/2022062408/56649f0e5503460f94c22f08/html5/thumbnails/12.jpg)
Index Seek Details
Non-parallel plan
Parallel plan
![Page 13: Parallel Execution Plans Joe Chang jchang6@yahoo.com .](https://reader035.fdocuments.net/reader035/viewer/2022062408/56649f0e5503460f94c22f08/html5/thumbnails/13.jpg)
Index Seek – Non-parallel
Cost assigned to SELECT
Index Seek, 1M rows in 11,115 pages (81 bytes/row, 90% Fill)
I/O cost is: 8.2365
CPU Cost is 1.1000785
Cost & sub-tree Cost is correct, I/O & CPU is ½ of correct value
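The CPU figure can be checked against the Index Seek plan formula given earlier in the deck (0.0000796 for the first row plus 0.0000011 per additional row), applied to the 1,000,000 rows stated on this slide:

$$
\text{CPU cost} = 0.0000796 + 0.0000011 \times (1{,}000{,}000 - 1) = 0.0000796 + 1.0999989 = 1.1000785
$$

$$
\text{I/O} + \text{CPU} = 8.2365 + 1.1000785 = 9.3365785 \approx 9.34
$$

The sum rounds to 9.34, which appears to correspond to one of the costs listed on the Index Seek Plans slide.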
![Page 14: Parallel Execution Plans Joe Chang jchang6@yahoo.com .](https://reader035.fdocuments.net/reader035/viewer/2022062408/56649f0e5503460f94c22f08/html5/thumbnails/14.jpg)
Index Seek – Parallel Plan
No cost assigned to SELECT
Index Seek I/O and CPU cost ½ of non-parallel plan
![Page 15: Parallel Execution Plans Joe Chang jchang6@yahoo.com .](https://reader035.fdocuments.net/reader035/viewer/2022062408/56649f0e5503460f94c22f08/html5/thumbnails/15.jpg)
Index Seek with Aggregate
![Page 16: Parallel Execution Plans Joe Chang jchang6@yahoo.com .](https://reader035.fdocuments.net/reader035/viewer/2022062408/56649f0e5503460f94c22f08/html5/thumbnails/16.jpg)
Index Seek Aggregate Parallel Plan Details
![Page 17: Parallel Execution Plans Joe Chang jchang6@yahoo.com .](https://reader035.fdocuments.net/reader035/viewer/2022062408/56649f0e5503460f94c22f08/html5/thumbnails/17.jpg)
Table Scan
Cost: 9.01
Cost: 8.26
![Page 18: Parallel Execution Plans Joe Chang jchang6@yahoo.com .](https://reader035.fdocuments.net/reader035/viewer/2022062408/56649f0e5503460f94c22f08/html5/thumbnails/18.jpg)
Table Scan Details
Non-parallel plan
Parallel plan
I/O cost same
CPU cost ½ of non-parallel plan
![Page 19: Parallel Execution Plans Joe Chang jchang6@yahoo.com .](https://reader035.fdocuments.net/reader035/viewer/2022062408/56649f0e5503460f94c22f08/html5/thumbnails/19.jpg)
Table Scan Details
Non-parallel plan
Parallel plan
No cost on Select
No cost
I/O cost same
CPU cost ½ of non-parallel plan
![Page 20: Parallel Execution Plans Joe Chang jchang6@yahoo.com .](https://reader035.fdocuments.net/reader035/viewer/2022062408/56649f0e5503460f94c22f08/html5/thumbnails/20.jpg)
Parallel Plan Cost Formula Patterns
CPU costs are ½ of non-parallel plan
Index Seek I/O costs are also ½
Scan I/O cost is same as non-parallel plan
Parallel plan costs are based on 2 processors
Actual number of processors determined at runtime
Overhead operations: Distribute, Repartition & Gather Streams
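Applied to the non-parallel Index Seek figures shown earlier (I/O 8.2365, CPU 1.1000785), the halving pattern works out as follows. Attributing the remainder to the stream operators is an assumption made here, not something the slides state.

$$
\frac{8.2365}{2} + \frac{1.1000785}{2} = 4.11825 + 0.55003925 = 4.66828925
$$

$$
4.94 - 4.66828925 \approx 0.27
$$

If the 4.94 listed on the Index Seek Plans slide is the parallel plan's total, about 0.27 of it would be left for the overhead operations.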
![Page 21: Parallel Execution Plans Joe Chang jchang6@yahoo.com .](https://reader035.fdocuments.net/reader035/viewer/2022062408/56649f0e5503460f94c22f08/html5/thumbnails/21.jpg)
Hash Join
Cost: 6.50
Cost: 4.79
200,000 rows
15 byte OS (outer source) row size
![Page 22: Parallel Execution Plans Joe Chang jchang6@yahoo.com .](https://reader035.fdocuments.net/reader035/viewer/2022062408/56649f0e5503460f94c22f08/html5/thumbnails/22.jpg)
Hash Join Details
Non-parallel plan
Parallel plan
![Page 23: Parallel Execution Plans Joe Chang jchang6@yahoo.com .](https://reader035.fdocuments.net/reader035/viewer/2022062408/56649f0e5503460f94c22f08/html5/thumbnails/23.jpg)
Hash Join Details
Non-parallel plan
Parallel plan
![Page 24: Parallel Execution Plans Joe Chang jchang6@yahoo.com .](https://reader035.fdocuments.net/reader035/viewer/2022062408/56649f0e5503460f94c22f08/html5/thumbnails/24.jpg)
Hash Join – Non-parallel plan
![Page 25: Parallel Execution Plans Joe Chang jchang6@yahoo.com .](https://reader035.fdocuments.net/reader035/viewer/2022062408/56649f0e5503460f94c22f08/html5/thumbnails/25.jpg)
Hash Join – Parallel Plan
![Page 26: Parallel Execution Plans Joe Chang jchang6@yahoo.com .](https://reader035.fdocuments.net/reader035/viewer/2022062408/56649f0e5503460f94c22f08/html5/thumbnails/26.jpg)
Hash Join with I/O Cost
900,000 rows
MAXDOP 1
Cost 74.1
Cost 85.1
![Page 27: Parallel Execution Plans Joe Chang jchang6@yahoo.com .](https://reader035.fdocuments.net/reader035/viewer/2022062408/56649f0e5503460f94c22f08/html5/thumbnails/27.jpg)
Hash Join – Join I/O Cost
730,000 rows
740,000 rows
![Page 28: Parallel Execution Plans Joe Chang jchang6@yahoo.com .](https://reader035.fdocuments.net/reader035/viewer/2022062408/56649f0e5503460f94c22f08/html5/thumbnails/28.jpg)
Hash Join - Bitmap
![Page 29: Parallel Execution Plans Joe Chang jchang6@yahoo.com .](https://reader035.fdocuments.net/reader035/viewer/2022062408/56649f0e5503460f94c22f08/html5/thumbnails/29.jpg)
Hash Join Cost Formula
Index Seek – Plan Formula
I/O Cost = 0.006328500 + 0.000740741 per additional page (≤1GB)
I/O Cost = 0.003203425 + 0.000740741 per additional page (>1GB)
CPU Cost = 0.000079600 + 0.000001100 per additional row
Hash Join
CPU Cost = 0.017750000 base
    + 0.0000001749 (2-30 rows)
    + 0.0000000720 (100 rows)
    0.000015091 per row
    0.000015857 per row (parallel)
    + 0.000001880 per row per 4 bytes in OS (outer source)
    + 0.000005320 per additional row in IS (inner source)
I/O Cost = 0.000042100 per row over 64MB (Row Size+8)
    0.0000036609 per 4 bytes over 15B
![Page 30: Parallel Execution Plans Joe Chang jchang6@yahoo.com .](https://reader035.fdocuments.net/reader035/viewer/2022062408/56649f0e5503460f94c22f08/html5/thumbnails/30.jpg)
Parallel Cost Formula
Base Cost 0.028500
Repartition Streams
Cost per row = 0.0000024705 base (15 bytes) + 0.000000759 per additional 4 bytes
Gather Streams
Cost per row = 0.0000018735 base (15 bytes) + 0.000000759 per additional 4 bytes
Dispatch
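As a rough illustration of these constants, consider 200,000 rows of 15 bytes, the row count and row size used on the Hash Join slide. The sketch assumes the base cost is charged once per operator, which the slide does not state explicitly.

$$
\begin{aligned}
\text{Gather Streams} &= 0.0285 + 200{,}000 \times 0.0000018735 = 0.0285 + 0.3747 = 0.4032 \\
\text{Repartition Streams} &= 0.0285 + 200{,}000 \times 0.0000024705 = 0.0285 + 0.4941 = 0.5226
\end{aligned}
$$

With 19-byte rows, each per-row term grows by 0.000000759, so Gather Streams becomes $0.0285 + 200{,}000 \times 0.0000026325 = 0.555$ under the same assumption.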
![Page 31: Parallel Execution Plans Joe Chang jchang6@yahoo.com .](https://reader035.fdocuments.net/reader035/viewer/2022062408/56649f0e5503460f94c22f08/html5/thumbnails/31.jpg)
Loop Join
![Page 32: Parallel Execution Plans Joe Chang jchang6@yahoo.com .](https://reader035.fdocuments.net/reader035/viewer/2022062408/56649f0e5503460f94c22f08/html5/thumbnails/32.jpg)
Loop Join Details
Non-parallel plan: Outer Source
Parallel plan: Outer Source
![Page 33: Parallel Execution Plans Joe Chang jchang6@yahoo.com .](https://reader035.fdocuments.net/reader035/viewer/2022062408/56649f0e5503460f94c22f08/html5/thumbnails/33.jpg)
Loop Join Details
Inner Source cost identical for both non-parallel and parallel plans
![Page 34: Parallel Execution Plans Joe Chang jchang6@yahoo.com .](https://reader035.fdocuments.net/reader035/viewer/2022062408/56649f0e5503460f94c22f08/html5/thumbnails/34.jpg)
Loop Join Details
Non-parallel plan
Parallel plan
![Page 35: Parallel Execution Plans Joe Chang jchang6@yahoo.com .](https://reader035.fdocuments.net/reader035/viewer/2022062408/56649f0e5503460f94c22f08/html5/thumbnails/35.jpg)
Merge Join
![Page 36: Parallel Execution Plans Joe Chang jchang6@yahoo.com .](https://reader035.fdocuments.net/reader035/viewer/2022062408/56649f0e5503460f94c22f08/html5/thumbnails/36.jpg)
Merge Join Details
Non-parallel plan
Parallel plan
![Page 37: Parallel Execution Plans Joe Chang jchang6@yahoo.com .](https://reader035.fdocuments.net/reader035/viewer/2022062408/56649f0e5503460f94c22f08/html5/thumbnails/37.jpg)
Merge Join Details
Non-parallel plan
Parallel plan
![Page 38: Parallel Execution Plans Joe Chang jchang6@yahoo.com .](https://reader035.fdocuments.net/reader035/viewer/2022062408/56649f0e5503460f94c22f08/html5/thumbnails/38.jpg)
Merge Join Details
Non-parallel plan
Parallel plan
![Page 39: Parallel Execution Plans Joe Chang jchang6@yahoo.com .](https://reader035.fdocuments.net/reader035/viewer/2022062408/56649f0e5503460f94c22f08/html5/thumbnails/39.jpg)
Index Seek + Aggregate Test
[Charts: duration per 1K rows (ms), 1P vs 2P. One chart covers 1 Sum, 1 NULL, 2 Sum, 2 NULL, 3 Sum, 3 NULL; the other covers 1 Sum, 2 Sum, 3 Sum. Systems: Opteron 2.2GHz/1M and Xeon 2.4GHz/512K.]
![Page 40: Parallel Execution Plans Joe Chang jchang6@yahoo.com .](https://reader035.fdocuments.net/reader035/viewer/2022062408/56649f0e5503460f94c22f08/html5/thumbnails/40.jpg)
Index Seek + Aggregate Test, Itanium 2
[Chart: duration (ms) per 1K rows at 1P, 2P, 4P, 8P for 1 of 10M, Count, Sum, Convert, Max, Money, Decimal. System: Itanium 2 1.5GHz/6M.]
![Page 41: Parallel Execution Plans Joe Chang jchang6@yahoo.com .](https://reader035.fdocuments.net/reader035/viewer/2022062408/56649f0e5503460f94c22f08/html5/thumbnails/41.jpg)
Index Seek + Aggregate Test, SUM(INT)
[Chart: 1P, 2P, 4P, 8P results for Count, 1 Sum, 2 Sum, 3 Sum. System: Itanium 2 1.5GHz/6M.]
![Page 42: Parallel Execution Plans Joe Chang jchang6@yahoo.com .](https://reader035.fdocuments.net/reader035/viewer/2022062408/56649f0e5503460f94c22f08/html5/thumbnails/42.jpg)
Index Seek + Aggregate Test, NULL
[Chart: 1P, 2P, 4P, 8P results for 1 Sum, 1 NULL, 2 Sum, 2 NULL, 3 Sum. System: Itanium 2 1.5GHz/6M.]
![Page 43: Parallel Execution Plans Joe Chang jchang6@yahoo.com .](https://reader035.fdocuments.net/reader035/viewer/2022062408/56649f0e5503460f94c22f08/html5/thumbnails/43.jpg)
Loop Join, COUNT(*)
[Chart: duration per 1K rows (ms) vs rows (000's: 100, 1,000, 10,000) at 1P, 2P, 4P, 8P. System: Itanium 2 1.5GHz/6M.]
![Page 44: Parallel Execution Plans Joe Chang jchang6@yahoo.com .](https://reader035.fdocuments.net/reader035/viewer/2022062408/56649f0e5503460f94c22f08/html5/thumbnails/44.jpg)
Hash Join, COUNT(*)
[Chart: duration per 1K rows (ms) vs rows (000's: 100, 1,000, 10,000) at 1P, 2P, 4P, 8P. System: Itanium 2 1.5GHz/6M.]
![Page 45: Parallel Execution Plans Joe Chang jchang6@yahoo.com .](https://reader035.fdocuments.net/reader035/viewer/2022062408/56649f0e5503460f94c22f08/html5/thumbnails/45.jpg)
Merge Join, COUNT(*)
[Chart: duration per 1K rows (ms) vs rows (000's: 100, 1,000, 10,000) at 1P, 2P, 4P. System: Itanium 2 1.5GHz/6M.]
![Page 46: Parallel Execution Plans Joe Chang jchang6@yahoo.com .](https://reader035.fdocuments.net/reader035/viewer/2022062408/56649f0e5503460f94c22f08/html5/thumbnails/46.jpg)
General Recommendations
Useful in DW, ETL, and maintenance activities
Use judgment on transaction processing
Is throughput more important, or faster execution of expensive queries?
Increase Cost Threshold from 5 to 20-50
Limit MAXDOP to 4
Verify or limit parallelism on Xeon systems with Hyper-Threading enabled
![Page 47: Parallel Execution Plans Joe Chang jchang6@yahoo.com .](https://reader035.fdocuments.net/reader035/viewer/2022062408/56649f0e5503460f94c22f08/html5/thumbnails/47.jpg)
Additional Information
www.sql-server-performance.com/joe_chang.asp
SQL Server Quantitative Performance Analysis
Server System Architecture
Processor Performance
Direct Connect Gigabit Networking
Parallel Execution Plans
Large Data Operations
Transferring Statistics
SQL Server Backup Performance with Imceda LiteSpeed
jchang6@yahoo.com