2012 Storage Developer Conference. © Microsoft Corporation. All Rights Reserved.
Primary Data Deduplication in Windows Server 2012
Sudipta Sengupta & Jim Benton, Microsoft Corporation, Redmond, WA, USA
File-based Storage Market
• Growth in file-based data: 60% growth 2001-2015 [IDC]
• Rising storage TCO despite declining disk drive cost
• Increasing access to data over low-bandwidth links: 60% of information workers (IWs) working from branch, road, or home; data access increasingly over the WAN
Source: IDC Enterprise Disk Storage Consumption Model
Major Players
• EMC/DataDomain: backup data dedup ($2.1B acquisition)
• NetApp: backup and primary data dedup
• Dell/Ocarina: extensive file-format-aware optimization ($150M acquisition)
• Permabit: backup and primary data dedup
• Oracle/ZFS: native dedup support in ZFS
• Linux/SDFS: open-source Linux dedup project

Customer/Market Response
Trends:
• Package dedup in a storage appliance offering
• Move from backup → primary → live data
Deduplication of Storage
Detect and remove duplicate data in storage systems:
• Data at rest: storage space savings
• Data on the wire: network bandwidth and disk I/O savings
High-level techniques:
• Content-based chunking; detect/store unique chunks only
• Similarity-based, differential (delta) encoding
Content-based Chunking
Calculate a Rabin fingerprint hash for each 16-byte sliding window over the data.
If the hash matches a particular pattern, declare a chunk boundary.
(Animation: the window slides across the example bit stream 1010 1010 0101 0000 0000 1010 0100 1010 1010 0101 0101 0101 0101 0011 0101; three pattern matches split it into 3 chunks.)
A sketch of this technique follows.
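To make the sliding-window mechanism concrete, here is a minimal sketch of content-defined chunking in Python. It is illustrative only: a simple polynomial rolling hash stands in for a true Rabin fingerprint, and the window width, modulus, and 13-bit boundary mask (expected ~8KB average chunks) are assumptions, not the shipped parameters.

```python
WINDOW = 16               # sliding-window width in bytes, as on the slide
MASK = (1 << 13) - 1      # boundary pattern: low 13 bits of the hash are zero
BASE = 257
MOD = (1 << 61) - 1
POW = pow(BASE, WINDOW - 1, MOD)   # weight of the byte leaving the window

def chunk_boundaries(data: bytes):
    """Yield the end offset of each content-defined chunk in `data`."""
    h, last = 0, 0
    for i, b in enumerate(data):
        if i >= WINDOW:
            h = (h - data[i - WINDOW] * POW) % MOD   # drop the oldest byte
        h = (h * BASE + b) % MOD                     # add the new byte
        # Declare a chunk boundary when the window hash matches the pattern.
        if i + 1 - last >= WINDOW and (h & MASK) == 0:
            yield i + 1
            last = i + 1
    if last < len(data):
        yield len(data)    # final boundary is forced at end of data
```

Because boundaries depend only on window contents, an insertion early in a file shifts at most a few chunks; the remaining boundaries realign, which is what makes chunk-level dedup robust to edits.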
Deduplication of Storage (continued)
On what type of data?
• Backup data: mature market
• Primary data: relatively recent interest
Deduplication Problem Space
Primary data dedup extends backup data dedup along multiple dimensions:
• Primary storage
• Concurrent data serving
• Commodity hardware
• Low duplication locality
Key Challenge
Windows Server must run well on commodity hardware, and deduplication must be friendly to the primary data-serving workload. This introduces unique challenges in balancing server resource usage, deduplication throughput, and deduplication space savings.
Key Design Decisions
• Post-processing deduplication: preserve latency/throughput of primary data access; flexibility to schedule dedup as a background job on cold data
• Deduplication granularity and data chunking: chunk-level, variable-sized chunking with large chunk size (~80KB); modifications to Rabin-fingerprint-based chunking to achieve a more uniform chunk size distribution
• Deduplication resource usage scaling slowly with data size: reduced chunk metadata, RAM-frugal chunk hash index, data partitioning
• Crash consistent on all hardware: detect, contain, repair, and report user and metadata corruptions
Large-Scale Study of Primary Datasets
• Used to drive key design decisions
• About 7TB of data spread across 15 globally distributed servers in a large enterprise
• Data crawled and chunked at different average chunk sizes, using a Rabin-fingerprint-based variable-sized chunker
• SHA-1 hash, size, compressed size, offset in file, and file information logged for each chunk
Deduplication Granularity: Whole-file vs. Sub-file
Significant improvement in dedup space savings with sub-file dedup vs. whole-file dedup.
Numbers obtained using an average chunk size of ~64KB.
Average Chunk Size
Use a larger chunk size of ~64KB to reduce chunk metadata in the system, without sacrificing dedup savings:
• Compression compensates for the savings decrease at higher chunk sizes
• Compression is more efficient on larger chunks
(Chart, GFS-US dataset: dedup savings loss without compression vs. dedup savings preserved with compression.)
Chunk Compression
Compression/decompression can have a significant performance impact (GFS-US dataset).
Compression savings are skewed:
• 50% of unique chunks are responsible for 80% of the compression savings
• 25% of chunks don't compress at all
Solution: selective compression
• Reduces the cost of compression for a large fraction of chunks while preserving most of the compression savings
• Reduces decompression costs (less CPU pressure during heavy reads)
• Heuristics determine which chunks should not be compressed (see the sketch below)
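The slides leave the heuristics unspecified; the sketch below shows one plausible, illustrative heuristic: test-compress a small prefix sample and skip chunks whose sample barely shrinks. The sample size, threshold, and use of zlib are assumptions, not the shipped policy.

```python
import zlib

SAMPLE = 4096          # bytes of the chunk to test-compress
THRESHOLD = 0.95       # keep uncompressed unless the sample shrinks below this

def maybe_compress(chunk: bytes):
    """Return (data, is_compressed) for storage in the chunk store."""
    sample = chunk[:SAMPLE]
    if len(zlib.compress(sample)) > THRESHOLD * len(sample):
        return chunk, False        # poorly compressible: skip, save CPU
    packed = zlib.compress(chunk)
    if len(packed) >= len(chunk):
        return chunk, False        # compression did not pay off after all
    return packed, True
```

Skipping the ~25% of chunks that don't compress avoids both the compression cost at optimization time and the decompression cost on every later read of those chunks.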
Basic Version of Fingerprint-based Chunking
• Skewed chunk size distribution
  • Small chunk sizes: increased chunk metadata in the system
  • Large chunks: reduced dedup savings and benefit of caching
• Forced chunk boundaries
  • A forced boundary at max chunk size is content independent, hence may reduce dedup savings
(Chart: frequency distribution of chunk size between min and max, with a peak at the forced max-size boundary.)
Regression Chunking Algorithm
• Goal 1: obtain a uniform chunk size distribution
• Goal 2: reduce forced chunk boundaries at max size
Basic idea (see the sketch below):
• When max chunk size is reached, relax the match condition to some suffix of bit pattern P
• Match |P| - i bits of P, with decreasing priority for i = 0, 1, ..., k
• Reduces the probability of a forced boundary at max size: 2x10^-3 for k=1, 10^-14 for k=4
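A hedged sketch of the idea, reusing WINDOW, BASE, MOD, and POW from the chunking sketch earlier. The full pattern P is modeled as "low MATCH_BITS bits of the hash are zero", and each relaxed level drops one bit of P; min/max sizes and all constants are illustrative, not the shipped values.

```python
MIN_SIZE, MAX_SIZE = 32 * 1024, 128 * 1024
MATCH_BITS = 16        # |P|: full-pattern width in bits
K = 4                  # relaxed suffix levels i = 1..k

def rc_boundaries(data: bytes):
    """Regression chunking: fall back to a relaxed match at max size."""
    last, n = 0, len(data)
    while last < n:
        fallback = [None] * (K + 1)   # latest offset matching |P|-i bits
        h, cut = 0, None
        end = min(last + MAX_SIZE, n)
        for i in range(last, end):
            if i - last >= WINDOW:
                h = (h - data[i - WINDOW] * POW) % MOD
            h = (h * BASE + data[i]) % MOD
            if i + 1 - last < MIN_SIZE:
                continue
            if h % (1 << MATCH_BITS) == 0:
                cut = i + 1           # full match of P: cut immediately
                break
            for j in range(1, K + 1):          # strongest level first
                if h % (1 << (MATCH_BITS - j)) == 0:
                    fallback[j] = i + 1
                    break
        if cut is None:
            # Max size reached without a full match: regress to the
            # strongest relaxed match seen, else force a boundary.
            cut = next((p for p in fallback[1:] if p is not None), end)
        yield cut
        last = cut
```

With k=1 the probability that no relaxed match occurs before max size is about 2x10^-3, dropping to about 10^-14 at k=4, so forced, content-independent boundaries all but disappear.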
Regression Chunking: Uniform Chunk Size Distribution
(Charts, GFS-US dataset: the peak at max chunk size is reduced, and the chunk size distribution becomes more uniform.)
Regression Chunking: Dedup Savings Improvement
(Chart: dedup savings improvement on the GFS-US dataset.)
The Chunk Indexing Problem
• Chunk metadata is too big to fit in RAM: 48 bytes per chunk (32-byte SHA hash + 16-byte location)
• 20TB of unique data at an average 64KB chunk size => ~312 million chunks x 48 bytes => ~15GB of RAM
• No locality in key space for chunk hash lookups
Our approach:
• Log-structured organization of chunk metadata on secondary storage
• Specialized hash table in RAM to index chunks in the log; cuts the RAM footprint by 8x
• Prefetch cache for chunk metadata
Chunk Index Architecture
• Chunk metadata organized in a log-structured manner on secondary storage (disk or flash), alongside the chunk store
• Chunk metadata indexed in RAM using a specialized, space-efficient hash table
• RAM write buffer for chunk mappings in the currently open session
• Prefetch cache in RAM for chunk metadata, exploiting the sequential predictability of chunk lookups
A sketch of this lookup path follows.
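A hedged sketch of the lookup path implied by this architecture: full 48-byte metadata records live in an append-only log, RAM keeps only a compact signature-to-offset table, and each lookup that goes to the log prefetches a run of neighboring entries. Names, sizes, and the cache policy are illustrative assumptions, not the shipped on-disk format.

```python
from collections import OrderedDict

ENTRY = 48        # per-chunk record: 32-byte hash + 16-byte location
PREFETCH = 64     # records pulled into the cache per log read

class ChunkIndex:
    def __init__(self, log_file):
        self.log = log_file          # append-only metadata log (disk/flash)
        self.table = {}              # 2-byte signature -> list of log offsets
        self.cache = OrderedDict()   # chunk hash -> location (prefetch cache)

    @staticmethod
    def _sig(chunk_hash: bytes) -> int:
        return int.from_bytes(chunk_hash[:2], "big")   # compact signature

    def insert(self, chunk_hash: bytes, location: bytes):
        off = self.log.seek(0, 2)                      # append to log tail
        self.log.write(chunk_hash + location.ljust(16, b"\0"))
        self.table.setdefault(self._sig(chunk_hash), []).append(off)

    def lookup(self, chunk_hash: bytes):
        if chunk_hash in self.cache:
            return self.cache[chunk_hash]   # prefetch-cache hit: no disk I/O
        for off in self.table.get(self._sig(chunk_hash), []):
            self.log.seek(off)
            run = self.log.read(ENTRY * PREFETCH)      # one read, many records
            for i in range(0, len(run) - ENTRY + 1, ENTRY):
                h, loc = run[i:i + 32], run[i + 32:i + 48]
                self.cache[h] = loc                    # warm cache with neighbors
                if len(self.cache) > 4096:
                    self.cache.popitem(last=False)     # evict oldest entries
            if chunk_hash in self.cache:
                return self.cache[chunk_hash]          # full hash confirmed
        return None                                    # genuinely new chunk
```

The per-chunk RAM cost drops from the 48-byte record to a few bytes of signature plus log offset, which is where the ~8x footprint reduction claimed above comes from.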
Data Partitioning and Reconciliation
Two-phase deduplication:
1. Divide the data into disjoint partitions, and perform deduplication within each partition
2. Reconcile duplicates across partitions
Data Partitioning: Divide-and-Conquer
• Partition data, e.g., by file type (extension)
• Deduplicate each partition separately
• Selective and/or delayed reconciliation of partitions
A sketch of the partitioning step follows.
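A minimal sketch of the partitioning step, keyed on file extension; the grouping key is the only assumption here.

```python
import os
from collections import defaultdict

def partition_by_type(paths):
    """Group files into disjoint partitions by extension."""
    parts = defaultdict(list)
    for path in paths:
        parts[os.path.splitext(path)[1].lower()].append(path)
    return parts
```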
Efficient Partitioning Strategies
How close are dedup savings within partitions to those of global dedup?
• Partitioning by file type: dedup savings almost as good as with global dedup
• Partitioning by file path: partition by directory sub-trees (each partition ≤ 10% of total bytes); not as effective as partitioning by file type for dedup savings
• Partitioning by system/volume
Conclusion: dedup is amenable to partitioned processing.
Data Partitioning and Reconciliation
Reconciliation algorithm: an iterative procedure that grows the set of reconciled partitions by considering some number of unreconciled partitions at a time.
Reconciliation of Data Partitions
• Compare the unreconciled partitions (indexed in RAM) with each partition in the reconciled set
• k = number of unreconciled partitions considered per iteration; provides a trade-off between memory usage and reconciliation speed (see the sketch below)
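A minimal sketch of the iteration, assuming partitions are available as iterables of chunk hashes (the data layout is illustrative): the k unreconciled partitions are indexed in RAM, then every reconciled partition is streamed against that index to find cross-partition duplicates.

```python
def reconcile(partitions, k):
    """Iteratively grow the reconciled set, k unreconciled partitions at a time."""
    reconciled = []        # partitions already folded into the global set
    duplicates = []        # (partition_number, chunk_hash) duplicate sightings
    for start in range(0, len(partitions), k):
        batch = partitions[start:start + k]
        # Index only the k unreconciled partitions in RAM: larger k means
        # more memory but fewer streaming passes (the trade-off above).
        index = {}
        for pnum, part in enumerate(batch, start):
            for h in part:
                index.setdefault(h, pnum)
        # Stream every reconciled partition against the in-RAM index.
        for part in reconciled:
            for h in part:
                if h in index:
                    duplicates.append((index[h], h))
        reconciled.extend(batch)
    return duplicates
```

For example, `reconcile([["a", "b"], ["b", "c"], ["c", "d"]], k=1)` reports the cross-partition duplicates "b" and "c" while only ever holding one partition's hashes in the index.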
System Overview
• Deduplication pipeline: post-processing; data chunking; index lookups and insertions; chunk store insertions
• Data path: dedup filter; chunk cache; file stub transactional update
• Maintenance jobs: garbage collection (in the chunk store); data scrubbing
Deduplication & On-disk Structures
Phase I - Identify the duplicate data:
1. Scan files according to policy
2. Chunk files intelligently to maximize recurring chunks
3. Identify common data chunks
Phase II - Optimize the target files:
4. Single-instance the data chunks, stored in file stream order
5. Create stream map metadata
6. Truncate the original file data stream
(Diagram: a stream map referencing data chunks in the chunk store; see the sketch below.)
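A minimal sketch of the relationship these phases produce, with illustrative names: after optimization, a file's data stream is replaced by a stream map, an ordered list of chunk references, and reads rehydrate the file by walking the map in stream order.

```python
from dataclasses import dataclass
from typing import Callable, List

@dataclass
class ChunkRef:
    container_id: int   # chunk container holding the chunk
    offset: int         # chunk position inside the container
    length: int         # chunk size in bytes

@dataclass
class StreamMap:
    refs: List[ChunkRef]   # single-instanced chunks, in file stream order

def rehydrate(stream_map: StreamMap,
              read_chunk: Callable[[ChunkRef], bytes]) -> bytes:
    """Rebuild the original file contents by walking the stream map."""
    return b"".join(read_chunk(ref) for ref in stream_map.refs)
```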
Chunk Store File Layout for Efficient Deduplication & Rehydration
Container files:
• Scale: avoid the performance cost of a large number of small files
• Simplicity: self-contained volume
Optimization performance:
• Efficient append-only chunk insertion (no explicit ref-counts)
• Efficient hash index load via sequential read
Rehydration performance:
• Storing chunks in stream order reduces fragmentation
• Reliable chunk IDs reduce seeks during rehydration
Deduplicated File Write Path: Partial Recall
(Diagram: pre-write vs. post-write file layout, and the write flow to the optimized file.)
1. Hold the user write IO
2. Consult the deduped chunk view
3. Compute fixed recall boundaries
4. Read the underlying chunks
5. Write recall-aligned data and flush the file
6. Write the reparse buffer with the recall bitmap
7. Release the user write (no explicit file flush)
• Reduces latency for small writes to large files (e.g., the OS image patching scenario)
• Incremental dedup will later optimize the new data
• Recall granularity grows with file size
• GC cleans up unreferenced chunks (chunk "D" in the example)
A sketch of the recall-boundary bookkeeping follows.
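A hedged sketch of steps 3 and 6 above: widening a user write to fixed recall boundaries and marking the recalled units in the reparse-point recall bitmap. The granularity constant and bitmap layout are illustrative; in the real system, recall granularity grows with file size.

```python
RECALL_GRANULARITY = 64 * 1024     # illustrative fixed recall unit

def recall_range(offset: int, length: int):
    """Widen a user write to fixed recall boundaries (step 3)."""
    start = (offset // RECALL_GRANULARITY) * RECALL_GRANULARITY
    end = -(-(offset + length) // RECALL_GRANULARITY) * RECALL_GRANULARITY
    return start, end

def apply_partial_recall(bitmap: bytearray, offset: int, length: int):
    """Mark the recalled units in the reparse-point bitmap (step 6)."""
    start, end = recall_range(offset, length)
    for unit in range(start // RECALL_GRANULARITY, end // RECALL_GRANULARITY):
        bitmap[unit // 8] |= 1 << (unit % 8)
    return start, end               # region to fill from underlying chunks

# e.g. a 4KB write at offset 100KB recalls the 64KB unit starting at 64KB
bm = bytearray(16)
print(apply_partial_recall(bm, 100 * 1024, 4 * 1024))  # -> (65536, 131072)
```

Later reads consult the bitmap: units marked recalled are served from the file's own data stream, everything else still rehydrates from the chunk store.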
Crash Consistency State Diagram: Partial Recall
Data Scrubbing: Analysis
Assumptions: 1TB volume containing 250,000 files with a 35% dedup ratio.
Resulting dedup on-disk structures: 10 x 100MB stream map containers, 625 x 1GB chunk containers.
Risks: silent data corruption, latent sector errors.

Resulting user data loss per corrupted on-disk structure:
• Container header
  • Stream map container: 10,000s of files
  • Chunk container: 100s of files
• Container body
  • Stream map container: ≤ 1 file
  • Chunk container: most chunks 1-99 files; popular chunks 100-1,000s of files
• Other metadata (delete log, hotspot table): no impact

Invest in early detection, containment, resiliency, repair, and reporting.
Focus on where dedup adds significant risk over the file system (the high-impact entries above).
Data Scrubbing: Approach
Checksum & redundancy:
• Checksum all data and metadata
• Extra copies of container headers
• Mirror the most popular chunks
Detection, containment & resiliency:
• Checksum validation on data access, optimization, and jobs
• Log internal corruption events for the scrubbing job to handle
• Retry on checksum mismatch (transient errors)
• Fail the operation, or skip the container and continue if possible
• Never GC or delete data in a corrupted container
Scrubbing & repair:
• Regular validation of critical data and metadata
• Repair from a redundant copy if available: redundant header, hotspot chunks, Storage Spaces mirror, future chunks
Reporting:
• Informative, actionable reporting at the file level (no chunks, offsets, or containers)
• Report corruption repairs when able
• Advise restore of specific file(s) from backup
Performance: Deduplication Throughput
Quad-core Intel Xeon 2.27GHz machine, 12GB RAM; GFS-US dataset.
Four scenarios, from combinations of:
• Index type (pure in-memory vs. memory/disk)
• Data partitioning (off or on)
Results:
• Deduplication throughput of 25-30 MB/s (single-thread performance)
• Only about a 10% decrease from the baseline to the least-memory case
• Three orders of magnitude higher than typical data ingestion rates of 0.03 MB/s (Leung et al.)
Performance: Resource Usage
• RAM frugality: index memory usage reduced 24x vs. baseline
• Low CPU utilization: 30-40% of one core; leaves enough room for serving the primary workload on modern multi-core file servers
• Low disk usage: median disk queue depth is zero in all cases; at the 75th percentile it increases by 2-3, the impact of index lookups going to disk and/or reconciliation
Performance: Data Access
Scenarios:
1. 2% to 3% on office file access
2. 0.5-1.5x on VHD copy
3. 1.3x on VHD update
(Chart: data access performance for the three scenarios.)
Dedup Among Top Windows Server 2012 Features Being Talked About
"... game-changers like deduplication ..."
"Windows Server 8 includes storage deduplication. Microsoft demonstrations of the technology reduced the disk footprint of a VDI server by some 96 percent. Microsoft's approach is also different from that of a virtualization system like VMware, insofar as it's not restricted to virtualization workloads. File servers too can benefit from the technology. The result should be both more robust and more effective ..."
Coverage: "Top 10: New Features in Windows Server 8"; "Why Windows 8 server is a game-changer"; "Windows Server 8: built for the cloud, built for virtualization"
Summary
Primary data deduplication in Windows Server 2012:
• Extends the problem frontier in multiple dimensions w.r.t. backup data deduplication
• Design decisions driven by large-scale data analysis: 7TB of data across 15 globally distributed servers in a large enterprise
• Primary-workload friendly
• Deduplication processing resource usage scales slowly with data size: data chunking and compression, chunk indexing, data partitioning and reconciliation
• Robust and reliable on commodity hardware: detect, contain, repair, and report user and metadata corruptions
Appendix Slides
Optimized Data View
Before dedup: a data volume with 10TB of data.
After dedup, with 80% (5:1) deduplication savings: optimized files (stubs), the chunk store, and non-optimized files.
80% savings also leads to smaller/faster backup/restore, archive, and migration.
What do you do with the savings (8TB of space)?
• Add more data
• Buy less hardware
• Increase availability by adding redundancy (RAID or Windows Storage Spaces)
Core Deduplication Architecture
1. Pipeline: coordinates deduplication & compression of files in policy/scope
2. Hash Index: indexes strong signatures for each unique chunk in the dataset
3. Chunk Store: stores chunks & stream map metadata
4. Mini-Filter: processes all IO directed to optimized files
Low RAM Usage: Cuckoo Hashing
• Achieves high hash table load factors while keeping lookup times fast
• Collisions resolved using cuckoo hashing: a key can be in one of K candidate positions
• Later-inserted keys can relocate earlier keys to their other candidate positions
• K candidate positions for key x are obtained using K hash functions h1(x), ..., hK(x); in practice, two hash functions can simulate K via hi(x) = g1(x) + i*g2(x)
• The system uses K=16 and targets a 90% hash table load factor
A sketch of the insertion path follows.
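A minimal sketch of that insertion path: K candidate slots derived from two hash functions via hi(x) = g1(x) + i*g2(x), with displacement of a resident key when all candidates are full. The table size, kick limit, and hash choice are illustrative assumptions.

```python
import hashlib, random

K = 16                 # candidate positions per key (as in the system)
SIZE = 1 << 20         # table slots (illustrative)

def _g(x: bytes, salt: bytes) -> int:
    return int.from_bytes(hashlib.sha1(salt + x).digest()[:8], "big")

def candidates(x: bytes):
    g1, g2 = _g(x, b"1"), _g(x, b"2") | 1   # odd g2: distinct slots mod 2^n
    return [(g1 + i * g2) % SIZE for i in range(K)]

def insert(table: list, x: bytes, max_kicks: int = 500) -> bool:
    for _ in range(max_kicks):
        slots = candidates(x)
        for s in slots:
            if table[s] is None:       # an empty candidate slot: done
                table[s] = x
                return True
        # All K candidates occupied: displace a resident key to one of
        # its own alternates and retry the insert with the evicted key.
        s = random.choice(slots)
        table[s], x = x, table[s]
    return False                        # load too high; caller would resize

table = [None] * SIZE
insert(table, b"\x8a" * 20)            # e.g. a 20-byte SHA-1 chunk hash
```

With K=16 candidates per key, this scheme sustains the ~90% load factor the slide targets while a lookup probes at most K slots.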
Low RAM Usage: Compact Key Signatures
• Compact key signatures stored in the hash table: a 2-byte key signature (vs. the 20-byte SHA-1 hash)
• A key x stored at its candidate position i derives its signature from hi(x)
• False flash read probability < 0.01%
• Total of 6-10 bytes per entry (including a 4-8 byte flash pointer)
• Related work on key-value stores on flash media: MicroHash, FlashDB, FAWN, BufferHash
Performance: Parallelizability
Parallel processing across datasets and CPU cores/disks (disk-diverse datasets):
• One session per volume in the current implementation; one CPU core, and one process and thread, allocated per dedup session
• No cross-dependencies between deduplication sessions (each session uses a separate index)
• Aggregate dedup throughput scales as expected with the number of cores (provided sufficient RAM is available)
Workload scheduler:
• Assigns jobs (deduplication, GC, scrubbing) to CPU cores
• Allocates memory per job
• Tracks job activity (cancels jobs on memory or CPU pressure)
Garbage Collection
Cleans up unreferenced chunks arising from:
• User-deleted data
• A crash after chunk store insertion but before the metadata update
Identifying zero-ref-count chunks (see the sketch below):
• Create a Bloom filter (BF) over all referenced chunks
• Shallow GC: check which deletion-candidate chunks exist in the BF
• Deep GC: build logical maps per chunk store container
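A hedged sketch of the shallow GC pass: build a Bloom filter over every chunk referenced by live stream maps, then test deletion candidates against it. A Bloom filter has no false negatives, so a "not present" answer proves a chunk is unreferenced and safe to reclaim; a false positive merely leaves some garbage for deep GC to find. Filter sizing is illustrative.

```python
import hashlib

class BloomFilter:
    def __init__(self, bits: int = 1 << 24, hashes: int = 4):
        self.bits, self.hashes = bits, hashes
        self.array = bytearray(bits // 8)

    def _positions(self, item: bytes):
        d = hashlib.sha256(item).digest()
        for i in range(self.hashes):          # 4 positions from one digest
            yield int.from_bytes(d[4 * i:4 * i + 4], "big") % self.bits

    def add(self, item: bytes):
        for p in self._positions(item):
            self.array[p // 8] |= 1 << (p % 8)

    def __contains__(self, item: bytes):
        return all(self.array[p // 8] & (1 << (p % 8))
                   for p in self._positions(item))

def shallow_gc(referenced_chunks, candidate_chunks):
    """Return the candidates that are provably unreferenced."""
    bf = BloomFilter()
    for h in referenced_chunks:       # one pass over live stream maps
        bf.add(h)
    # No false negatives: "not in bf" guarantees the chunk is reclaimable.
    return [h for h in candidate_chunks if h not in bf]
```

This keeps shallow GC cheap (one streaming pass plus a compact in-RAM filter), deferring the expensive per-container logical maps to the deep GC pass.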