Page 1: Sunil Agarwal Senior Program Manager Email: sunila@microsoft.com Microsoft Corporation SESSION CODE: DAT309.

Microsoft SQL Server Data Compression: Experience and Changes

Sunil Agarwal
Senior Program Manager
Email: sunila@microsoft.com
Microsoft Corporation

SESSION CODE: DAT309

Page 2

Agenda

o Customer Experience and Feedback
o Unicode Compression in SQL2008R2
o Future Directions

Page 3

Data Compression Overview

Two types of compression:

o ROW
  o Fixed-length columns stored as variable length
  o Recommendation: DML-heavy workload
o PAGE
  o Column prefix and page dictionary compression
  o Recommendation: read-mostly workload

o Can be enabled on a table, index, or partition
o Estimate data compression savings with sp_estimate_data_compression_savings
o Can be enabled/disabled ONLINE
o No application changes
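As a rough illustration of the ROW idea (fixed-length columns stored as variable length), the Python sketch below compares the bytes a few INT and CHAR(10) values occupy at fixed width with the minimum bytes needed to represent them. It is a simplified model under stated assumptions, not SQL Server's on-disk format: the helper names are hypothetical and per-row metadata overhead is ignored.

```python
# Simplified model of the ROW compression idea: store each value in the
# fewest bytes that can represent it. This is NOT SQL Server's on-disk format.

INT_WIDTH = 4     # fixed storage of an INT column, in bytes
CHAR_WIDTH = 10   # fixed storage of a CHAR(10) column, in bytes


def int_bytes_needed(value: int) -> int:
    """Minimum bytes for a signed two's-complement integer."""
    magnitude = value if value >= 0 else ~value
    return magnitude.bit_length() // 8 + 1


def char_bytes_needed(value: str) -> int:
    """Bytes left after trimming the trailing blanks that pad a CHAR column."""
    return len(value.rstrip(" ").encode("ascii"))


def fixed_vs_variable(ints, chars):
    """Return (fixed_bytes, variable_bytes) for the given column values."""
    fixed = len(ints) * INT_WIDTH + len(chars) * CHAR_WIDTH
    variable = sum(int_bytes_needed(i) for i in ints) + sum(
        char_bytes_needed(c) for c in chars
    )
    return fixed, variable


if __name__ == "__main__":
    ints = [10, 300, 100000]
    chars = ["abc".ljust(CHAR_WIDTH), "SQL".ljust(CHAR_WIDTH)]
    print(fixed_vs_variable(ints, chars))
```

Under this model, a column whose values already use every byte of their fixed width gains nothing, which is one reason ROW compression can show little or no savings.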

Page 4

Data Compression and Space Savings

Your mileage will vary.

Data Compression Savings Achieved By Customers

Customer | Data Compression Space Savings | Notes
Bank Itau | 70% | PAGE. Data Warehouse application.
BWIN.com | 40% | PAGE. OLTP Web application.
NASDAQ | 62% | PAGE. DW application.
GE Healthcare | 38%, 21% | PAGE, ROW.
Manhattan Associates | 80%, 50% | PAGE, ROW.
First American Title | 52% | PAGE.
SAP ERP | 50%, 15% | PAGE, ROW.
MS Dynamics AX | 81% | PAGE. ERP application.
ServiceU | 35% | PAGE.

Page 5

Data Compression and Workload Performance

Workload Performance

Customer | Performance impact | Notes
BWIN.com | 5% | PAGE compression. OLTP Web application. Large volume of transactions.
NASDAQ | 40%-60% | PAGE compression. Large sequential range queries. DW application.
GE Healthcare | -1% | PAGE compression. 500 users, 1500 transactions/sec. OLTP with some reporting queries.
Manhattan Associates | -11% | PAGE compression. A lot of insert, update and delete activity.
First American Title | 2%-3% | PAGE compression. OLTP application.
MS Dynamics AX | 3% | PAGE compression. ERP application, small transactions.

Page 6

Top Customer Question - 1

Question: Data compression increases the size of my database?

Example: a 20 GB file holds TB-1 (16 GB) and TB-2 (4 GB).

o Compress TB-1 first: compressed TB-1 (8 GB) is written before the original is released, so the file grows to 28 GB and 16 GB is left empty. Compressed TB-2 (2 GB) then fits in that empty space; the file stays at 28 GB with 14 GB and 4 GB of empty space.
o Compress TB-2 first: compressed TB-2 (2 GB) grows the file to 22 GB and releases 4 GB of free space. Compressed TB-1 (8 GB) then uses those 4 GB plus 4 GB of new space; the file ends at 26 GB with 16 GB of empty space.

Suggestions:
o Do nothing if the object needs to grow
o Start by compressing the smaller object first
o Use shrink, but it fragments the data
o Bulk export/import into an empty compressed table. Data availability?
o Move the object to a new filegroup
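The file-size arithmetic on this slide can be replayed with a small Python sketch. It is a toy model, assuming that the compressed copy is fully written before the original's space is released and that any free space in the file is reusable; the function name is hypothetical. The inputs are the slide's numbers: a 20 GB file, TB-1 going from 16 GB to 8 GB and TB-2 from 4 GB to 2 GB.

```python
# Toy model of file growth during a compressing rebuild. Assumption: the
# compressed copy is fully built before the original's space is released,
# and free space inside the file can be reused regardless of its position.


def rebuild_in_order(file_size_gb, free_gb, tables, order):
    """Simulate rebuilding tables one at a time.

    tables maps name -> (original_gb, compressed_gb).
    Returns (final_file_size_gb, final_free_gb).
    """
    for name in order:
        original, compressed = tables[name]
        if compressed > free_gb:
            # Not enough room for the new copy: the file must grow.
            file_size_gb += compressed - free_gb
            free_gb = 0
        else:
            free_gb -= compressed
        # The uncompressed original is dropped once the new copy exists.
        free_gb += original
    return file_size_gb, free_gb


if __name__ == "__main__":
    tables = {"TB-1": (16, 8), "TB-2": (4, 2)}
    print(rebuild_in_order(20, 0, tables, ["TB-1", "TB-2"]))  # large table first
    print(rebuild_in_order(20, 0, tables, ["TB-2", "TB-1"]))  # small table first
```

In this model, compressing the smaller table first ends with a 26 GB file instead of 28 GB, which is the reasoning behind the suggestion to start with the smaller object.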

Page 7

Top Customer Question - 2

Question: I am getting minimal or no compression?

ROW compression:
o No fixed-length columns
o Fixed-length columns, but all bytes are used
o Compressed row > 4K

PAGE compression:
o No column prefix savings
o No common values for page dictionary
o Large row size, implying one to a few rows per page

Mostly LOB data

Page 8

Top Customer Question - 3

Question: How do I get PAGE compression on a HEAP?

Problem: Ad hoc inserts on a new page will not be PAGE compressed in a HEAP

Suggestions:
o Rebuild HEAP periodically (ONLINE available)
o Use TABLOCK when bulk importing into a HEAP

Figure: page layouts, each with a Header; labels: rows R1 to R5, PAGE, ROW, BTREE PAGE, CI structure.

Page 9

Top Customer Question - 4

Index Related

Question: It is taking longer to rebuild an index or heap
o ROW compression takes approx. 1.5 times the CPU time used for rebuilding an index
o PAGE compression takes approx. 2 to 5 times the CPU time used for rebuilding an index
o Your mileage may vary

Question: Do I need to take the object offline to enable compression?
o ONLINE operations are supported
o Few unique values for the leading column of the index may reduce parallelism. This is similar to a regular index
o Compressing a heap with ONLINE = ON uses a single CPU for compression (or rebuild)

Page 10

Top Customer Question - 5

Question: What is the impact of compression on Bulk Import?

Figure: Time to BULK INSERT 50M rows (minutes, axis 0 to 36) and table size after load (GB, axis 0 to 6), by compression type (NONE, ROW, PAGE).

Page 11

Top Customer Question - 6

Question: What object(s) should I compress?

o Evaluate compression savings
o General: DML heavy (ROW) vs query heavy (PAGE), both for table/partition
o Don’t compress all objects in the database without evaluation
  o If a table is relatively small, don’t bother compressing
  o Consider compression if a table/partition is accessed rarely
o Look at index usage
  o Used rarely?
  o Singleton lookup
  o Range access

Page 12

Example: Enabling Compression

Unpartitioned table

Figure: tables and their indexes, with a legend of PAGE Compressed and Uncompressed.

Page 13

Example: Enabling Compression

Latest partition uncompressed

Figure: quarterly partitions (Jan-Mar, Apr-June, July-Sept, Oct-Dec), with a legend of PAGE Compressed, ROW Compressed and Uncompressed.

Page 14

Customer Example: An SAP Deployment

Table Compression Strategies

Table | Size (GB) | ROW save % | PAGE save % | 1-row Read | >1-row Read | Update | Delete | Insert | % Scan | % Update | Plan | Notes
COSP | 398 | 80% | 90% | 2,797 | 58,735 | 886,187 | 15,747 | 584,017 | 3.80% | 57.27% | ROW | Updates!
GLPCA | 123 | 15% | 89% | 0 | 929,637 | 0 | 16,802 | 9,020 | 92.46% | 0.0% | PAGE | Scan mostly, read-mostly
COEP | 185 | 30% | 81% | 0 | 19,036 | 2,927 | 0 | 48,182 | 27.14% | 4.17% | ROW | Light use, but stay low risk
RESB | 243 | 38% | 83% | 9,837 | 7,977,629 | 943,380 | 1,321 | 14,877 | 89.16% | 10.54% | ROW | #updates
ACCTIT | 210 | 21% | 87% | 0 | 0 | 0 | 0 | 54,580 | 0.00% | 0.0% | PAGE | Append only
MSEG | 183 | 28% | 87% | 3,441,918 | 24,684,252 | 28 | 0 | 70,797 | 87.54% | 0.0% | PAGE | Scan mostly
FAGLFLEXA | 98 | 29% | 88% | 0 | 298 | 0 | 0 | 58,882 | 0.50% | 0.0% | PAGE | Append only
BSIS | 148 | 30% | 90% | 0 | 9,069 | 67 | 5,773 | 64,366 | 11.44% | 0.08% | PAGE | Append mostly
COSB | 150 | 84% | 92% | 0 | 88 | 0 | 0 | 0 | 100.00% | 0.00% | ROW | ROW ~= PAGE
GLFUNCA | 40 | 15% | 89% | 0 | 6 | 0 | 0 | 0 | 100.00% | 0.00% | PAGE | Read only

Inputs: sp_estimate_data_compression_savings, sys.dm_db_index_operational_stats, SAP knowledge
Computed: S = % scans; U = % updates

Rules:
o ROW ~= PAGE => ROW
o High Update, Low Scan => ROW
o High Scan => PAGE
o Append Only => PAGE
o Read-only => PAGE
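The rules above can be sketched in Python as a small decision function. This is one possible reading, not the presenter's procedure: S and U are assumed to be shares of the five operation counts in the table (which reproduces the COSP row), and every numeric threshold is an illustrative assumption picked so that the sketch matches the Plan column for the ten tables shown. The function and constant names are hypothetical.

```python
# Sketch of the S/U computation and the plan rules from the slide.
# The numeric thresholds below are illustrative assumptions, not values
# given in the presentation.

ROW_PAGE_GAP = 10      # assumed: PAGE saves at most this many points more than ROW
HIGH_UPDATE_PCT = 10   # assumed cut-off for "high update"
HIGH_SCAN_PCT = 75     # assumed cut-off for "high scan"
LOW_UPDATE_PCT = 1     # assumed cut-off for append-only / read-only


def scan_update_pct(single_reads, range_reads, updates, deletes, inserts):
    """Return (S, U): scans and updates as a percentage of all operations."""
    total = single_reads + range_reads + updates + deletes + inserts
    if total == 0:
        return 0.0, 0.0
    return 100.0 * range_reads / total, 100.0 * updates / total


def choose_plan(row_save, page_save, s_pct, u_pct):
    """Apply the rules in order; fall back to ROW as the low-risk choice."""
    if page_save - row_save <= ROW_PAGE_GAP:
        return "ROW"    # ROW ~= PAGE => ROW
    if u_pct >= HIGH_UPDATE_PCT:
        return "ROW"    # high update => ROW (checked before scans, an assumption)
    if s_pct >= HIGH_SCAN_PCT:
        return "PAGE"   # high scan => PAGE
    if u_pct < LOW_UPDATE_PCT:
        return "PAGE"   # append only / read-only => PAGE
    return "ROW"


if __name__ == "__main__":
    # COSP row: 1-row reads, >1-row reads, updates, deletes, inserts.
    s, u = scan_update_pct(2797, 58735, 886187, 15747, 584017)
    print(round(s, 2), round(u, 2), choose_plan(80, 90, s, u))
```

Checking updates before scans is what lets a table like RESB, which is scan-heavy but also update-heavy, land on ROW as it does in the table; other orderings or thresholds may suit other workloads better.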

Page 15

Top Customer Question - 7

Question: How do I compress Unicode data?

SQL Server uses the UCS-2 encoding scheme.

o NCHAR and NVARCHAR data always takes 2 bytes of storage per character
o Waste of 1 byte/char for commonly deployed locales (e.g. ASCII)
o Existing ROW compression is ineffective
o PAGE compression only helps for exact matches

Sample representation: ‘a’ = 0x61 (ASCII) and 0x0061 (UCS-2)
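The sample representation can be reproduced with Python's built-in codecs. This is a minimal sketch: "utf-16-be" stands in for UCS-2 (the two agree for characters in the Basic Multilingual Plane), and big-endian is chosen only to match the 0x0061 notation on the slide, not as a statement about byte order on disk. The function name is hypothetical.

```python
# Compare the bytes needed for the same text as ASCII and as 2-byte code
# units. "utf-16-be" stands in for UCS-2 here (identical for characters in
# the Basic Multilingual Plane); big-endian is chosen only to match the
# slide's 0x0061 notation.


def storage_bytes(text: str):
    """Return (ascii_bytes, ucs2_bytes) for ASCII-only text."""
    return text.encode("ascii"), text.encode("utf-16-be")


if __name__ == "__main__":
    for sample in ("a", "SQLSERVER"):
        one_byte, two_byte = storage_bytes(sample)
        print(sample, one_byte.hex(), two_byte.hex(), len(one_byte), len(two_byte))
```

For ASCII-only text every second byte is 0x00, which is the wasted byte per character that the slide refers to.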

Page 16

Agenda

o Customer Experience and Feedback
o Unicode Compression in SQL2008R2
o Future Directions

Page 17

SQL2008R2: Unicode and Competitive Challenge

Most ISVs are switching their customers to the UNICODE version of applications.

Competition:
o Oracle supports UTF-8 encoding for Unicode. Results in 1 byte of storage per character for ASCII and most European locales
o DB2 provides UTF-8 and Unicode compression as well

Page 18

SQL2008R2: Unicode Solution

o Use standard SCSU compression technique: http://www.unicode.org/reports/tr6/tr6-4.html
o No application change needed

Compression Achieved

Comparison of UNICODE compression with SCSU and UTF-8

Locale | SCSU | UTF-8
English | 0.5 | 0.5
Japanese | 0.85 | 1.0
Korean | 1.0 | 1.0
Turkish | 0.52 | 0.53
German | 0.5 | 0.5
Vietnamese | 0.61 | 0.68
Hindi | 0.5 | 1.0
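For a feel of where ratios like those in the UTF-8 column come from, the Python sketch below divides the UTF-8 size of a string by its 2-byte-per-character size. Python's standard library has no SCSU codec, so only the UTF-8 side is approximated here. The sample strings are made up for illustration and are not the data behind the table, and the function name is hypothetical.

```python
# Size of UTF-8 relative to 2-byte UCS-2 storage for sample strings.
# The sample strings are made up for illustration; they are not the
# data sets behind the slide's table.


def utf8_ratio(text: str) -> float:
    """UTF-8 byte count divided by UTF-16 (UCS-2 for BMP text) byte count."""
    return len(text.encode("utf-8")) / len(text.encode("utf-16-le"))


if __name__ == "__main__":
    samples = {
        "English": "SQL Server data compression",
        "Turkish": "İstanbul'da güneşli bir gün",
    }
    for locale, text in samples.items():
        print(locale, round(utf8_ratio(text), 2))
```

Pure ASCII text comes out at exactly half the size, while text with a few accented letters lands slightly above that, in line with the English and Turkish rows.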

Page 19

SQL2008R2: Enabling Compression

Enterprise Edition only

Types of data compression:

o ROW
  o Stores fixed-length values as variable length
  o Superset of vardecimal storage format
  o Row metadata optimized
  o BLOB/LOB is not ROW compressed
  o Unicode data is compressed. For most locales, 50% saving
  o Supported types: NVARCHAR and NCHAR, but not NTEXT
o PAGE (includes ROW)
  o Column prefix
  o Dictionary
  o Only in-row BLOB/LOB can potentially benefit from PAGE compression

Page 20

Example: Unicode Compression

CREATE TABLE SQLCOMP (Name NVARCHAR(20))
GO

INSERT INTO SQLCOMP VALUES (N'SQLSERVER')
INSERT INTO SQLCOMP VALUES (N'SQLENGINE')
INSERT INTO SQLCOMP VALUES (N'SQLLOADERS')

Uncompressed (after the page HEADER):

0x00530051004C005300450052005600450052 “SQLSERVER”
0x00530051004C0045004E00470049004E0045 “SQLENGINE”
0x00530051004C004C004F00410044004500520053 “SQLLOADERS”

ROW COMPRESSION:

0x53514C534552564552 “SQLSERVER”
0x53514C454E47494E45 “SQLENGINE”
0x53514C4C4F414445525310 “SQLLOADERS”

PAGE COMPRESSION:

Col-prefix: 0x53514C “SQL”
0x03534552564552 “?SERVER”
0x03454E47494E45 “?ENGINE”
0x034C4F414445525310 “?LOADERS”
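The column-prefix step drawn on this slide can be sketched in Python for the three values shown (SQLSERVER, SQLENGINE, SQLLOADERS). This is a simplified model with hypothetical function names: the shared prefix is stored once and each value becomes a one-byte prefix length followed by its remaining bytes. It does not model the real page format, and it does not reproduce the extra trailing byte that the slide shows on the SQLLOADERS rows.

```python
# Simplified model of column-prefix compression as drawn on the slide:
# store the shared prefix once, then each value as <prefix length><suffix>.
# It ignores the real page format, including any padding or metadata bytes.


def common_prefix(values):
    """Longest byte prefix shared by every value."""
    if not values:
        return b""
    lo, hi = min(values), max(values)
    n = 0
    for a, b in zip(lo, hi):
        if a != b:
            break
        n += 1
    return lo[:n]


def prefix_compress(values):
    """Return (prefix, encoded_values); assumes the prefix is under 256 bytes."""
    prefix = common_prefix(values)
    cut = len(prefix)
    return prefix, [bytes([cut]) + v[cut:] for v in values]


def prefix_decompress(prefix, encoded):
    """Rebuild the original values from the prefix and the encoded rows."""
    return [prefix[: row[0]] + row[1:] for row in encoded]


if __name__ == "__main__":
    rows = [b"SQLSERVER", b"SQLENGINE", b"SQLLOADERS"]
    prefix, encoded = prefix_compress(rows)
    print(prefix.hex(), [e.hex() for e in encoded])
```

Comparing only the smallest and largest value is enough to find the prefix shared by all of them, because every other value sorts between those two.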

Page 21

Changes to Estimate Compression Stored Procedure

SQL2008 RTM
o Estimated compression savings = 0 if compression mode did not change

SQL2008R2
o Estimated compression savings non-zero if space can be further saved. Useful in:
  o De-fragmentation space savings
  o Unicode compression space savings

Page 22

DEMO: Unicode Compression

Sunil Agarwal, Senior Program Manager, Microsoft

Page 23

SQL2008R2: Unicode Compression Results

ROW Compression Savings with UNICODE Compression

Application | ROW Compression | ROW with UNICODE
SAP ERP Benchmark DB | 9% | 43%
Dynamics AX | 30% | 53.2%
**** | 45% | 64%
**** | 30% | 45%

Savings on Hardware Cost

Customer | Projected Storage Cost Reduction
Microsoft (MSIT/SAP) | $500K savings
**** | $500K

Page 24

Upgrade to SQL2008R2

Scenarios:

o ROW compression enabled in SQL2008
  o No database changes when upgraded
  o Unicode value compressed only if it saves space. It happens when:
    o An existing value is updated
    o A new row is inserted
    o Index is rebuilt with ROW or PAGE compression
o PAGE compression enabled in SQL2008
  o Same as with ROW compression

No changes needed to existing scripts and DDL

Page 25

Future Directions and ASKs

We are looking into:
o Unicode compression for the in-row portion of NVARCHAR(MAX)
o LOB compression
o XML compression
o Make sp_estimate* available on all SKUs

Page 26

Related Contents

http://sqlcat.com/whitepapers/archive/2009/05/29/data-compression-strategy-capacity-planning-and-best-practices.aspx

www.sqlcat.com

http://blogs.msdn.com/sqlserverstorageengine

http://blogs.msdn.com/sqlcat/

http://blogs.msdn.com/mssqlisv/

http://www.unisys.com/eprise/main/admin/corporate/doc/41371394.pdf

http://search.hp.com/redirect.html?type=REG&qt=sql+server+data+compression&url=http%3A//h71028.www7.hp.com/ERC/downloads/4AA1-8766ENW.pdf%3Fjumpid%3Dreg_R1002_USEN&pos=1

http://www.netapp.com/us/library/technical-reports/tr-3719.html


Page 31

© 2010 Microsoft Corporation. All rights reserved. Microsoft, Windows, Windows Vista and other product names are or may be registered trademarks and/or trademarks in the U.S. and/or other countries.

The information herein is for informational purposes only and represents the current view of Microsoft Corporation as of the date of this presentation. Because Microsoft must respond to changing market conditions, it should not be interpreted to be a commitment on the part of Microsoft, and Microsoft cannot guarantee the accuracy of any information provided after the date of this presentation. MICROSOFT MAKES NO WARRANTIES, EXPRESS, IMPLIED OR STATUTORY, AS TO THE INFORMATION IN THIS PRESENTATION.

Page 32

JUNE 7-10, 2010 | NEW ORLEANS, LA