SSIS Best Practices
Transcript of SSIS Best Practices
Bob Duffy
Irish SQL Academy 2008. Level 300
DTS 2000
1.75 Developers
SSIS 2005
*Figures are only approximations and should not be referenced or quoted
• Minimize staging (else use RawFiles if possible)
• Hardware Infrastructure: Disks, RAM, CPU, Network
• SQL Infrastructure: File Groups, Indexing, Partitioning
Optimize and Stabilize the basics
• Replace destinations with RowCount
• Source->RowCount throughput
• Source->Destination throughputMeasure
• OVAL performance tuning strategy
• The Three S‟s
• Data Flow Bag of TricksTune
• Lookup patterns
• Script vs custom transformParallelize
• Increase the efficiency of every aspect
Sharpen
• Parallelize, partition, pipelineShare
• Buy faster, bigger, better hardwareSpend
But be aware of limitations
RowBased
(synchronous)
Partially Blocking
(asynchronous)
Blocking(asynchronous)
http://msdn.microsoft.com/en-us/library/ms345346.aspx
4 Gb Fiber Channel
Destination server runs SQL
Server
DatabaseEMC CX3-80
Destination server:Unisys ES7000/One32 sockets each with dual core Intel 3.4 GHz CPUs256 GB RAMWindows Server 2008SQL Server 2008
1 Gb Ethernet connections
2 Gb Fiber Channel
Source servers run SSIS
Source data EMC CX600
Source servers:Unisys ES3220L2 sockets each with 4 core Intel 2 GHz CPUs4 GB RAMWindows Server 2008SQL Server 2008
Make: Unisys
Model: ES7000/one Enterprise Server
OS: Microsoft Windows Server 2008 x64 Datacenter Edition
CPU: 32 socket dual core Intel® Xeon 3.4 GHz (7140M)
RAM: 256 GB
HBA: 8 dual port 4Gbit FC
NIC: Intel® PRO/1000 MT Server Adapter
Database: Pre-release build of SQL Server 2008 Enterprise Edition (V10.0.1300.4)
Storage: EMC Clariion CX3-80 (Qty 1)
11 trays of 15 disks; 165 spindles x 146 GB 15Krpm; 4Gbit FC
Quantity: 4
Make: Unisys
Model: ES3220L
OS: Windows2008 x64 Enterprise Edition
CPU: 2 socket quad core Intel® Xeon processors @ 2.0GHz
RAM: 4 GB
HBA: 1 dual port 4Gbit Emulex FC
NIC: Intel PRO1000/PT dual port
Database: Pre-release build of SQL Server 2008 Integration Services (V10.0.1300.4)
Storage: 2x EMC CLARiiON CX600 (ea: 45 spindles, 4 2Gbit FC)
C1
C1
C1
C1
Orders TablePartition
1Partition
2Partition
3Partition
4Partition
5Partition
6Partition
55Partition
56
Orders_1 Orders_2 Orders_3 Orders_4 Orders_5 Orders_6 Orders_55 Orders_56. . .
SSIS
SSIS
SSIS
SSIS
SSIS
SSIS
SSIS
SSIS. . .
orders.tbl.1 orders.tbl.2 orders.tbl.3 orders.tbl.4 orders.tbl.5 orders.tbl.6 orders.tbl.55 orders.tbl.56
(Package details removed to protect the innocent)
• Iterative design, development & testingFollow Microsoft
Development Guidelines
• People & Processes
• Kimball‟s ETL and SSIS books are an excellent referenceUnderstand the Business
• Resource contention, processing windows, …
• SSIS does not forgive bad database design
• Old principles still apply – e.g. load with/without indexes?
Get the big picture
• Will this run on IA64 / X64?
• No BIDS on IA64 – how will I debug?
• Is OLE-DB driver XXX available on IA64?
• Memory and resource usage on different platforms
Platform considerations
• Break complex ETL into logically distinct packages (vsmonolithic design)
• Improves development & debug experience
Process Modularity
• Separate sub-processes within package into separate Containers
• More elegant, easier to develop
• Simple to disable whole Containers when debugging
Package Modularity
• Use Script Task/Transform for one-off problems
• Build custom components for maximum re-use
Component Modularity
Concise naming conventions
Conformed “blueprint” design patterns
Presentable layout
Annotations
Error Logging
Configurations
Get as close to the data as possible
• Limit number of columns
• Filter number of rows
Don‟t be afraid to leverage TSQL
• Type conversions, null coercing, coalescing, data type sharpening
• select nullif(name, „‟) from contacts order by 1
• select convert(tinyint, code) from sales
Performance Testing & Tuning
• Connect Output to RowCount transform
• See Performance Best Practices
„FastParse‟ for text files
BEFORE:
selectdbo.Tbl_Dim_Store.SK_Store_ID
, Tbl_Dim_Store.Store_Num,isnull(dbo.Tbl_Dim_Merchant_Division.SK_Merch_Di
v_ID, 0) as SK_Merch_Div_IDfrom dbo.Tbl_Dim_Storeleft outer join dbo.Tbl_Dim_Merchant_Division
on dbo.Tbl_Dim_Store.Merch_Div_Num = dbo.Tbl_Dim_Merchant_Division.Merch_Div_Num
where Current_Row = 1
AFTER:
select * from etl.uf_FactStoreSales(@Date)
Use the power of TSQL to clean the data 'on the fly'
• Too many moving parts is inelegant and likely slow
• But don‟t be afraid to experiment – there are many ways to solve a problem
Avoid over-design
• Allocate enough threads
• EngineThreads property on DataFlow Task
• See Performance Talk
Maximize Parallelism
• Synchronous vs. Asynchronous components
• Memcopy is expensive
Minimize blocking
• For example, minimize data retrieved by LookupTxMinimize
ancillary data
• Full Cache – for small lookup datasets
• No Cache – for volatile lookup datasets
• Partial Cache – for large lookup datasets
Three Modes of Operation
• Full Cache is optimal, but uses the most memory, also takes time to load
• Partial Cache can be expensive since it populates on the fly using singleton SELECTs
• No Cache uses no memory, but takes longer
Tradeoff memory vs.
performance
• Catch is that it requires Sorted inputs
• See SSIS Performance white paper for more details
Can use Merge Join component
instead
• Can written in any .Net language
• Must be signed, registered and installed – but can be widely re-used
• Quite fiddly for single task
Custom components
• Can be written in VisualBasic.Net or C#
• Are persisted within a package – and have limited reuse
• Have template methods already created for you
Scripts
http://sqlcat.com
http://technet.microsoft.com/en-us/library/bb961995.aspx
http://blogs.msdn.com/sqlperf/archive/2008/02/27/etl-world-record.aspx