EMC Deduplication Fundamentals

1 Copyright 2011 EMC Corporation. All rights reserved. Deduplication Fundamentals

description

Deduplication reduces the amount of disk storage needed to retain and protect data by ratios of 10–30x and greater, making disk a cost-effective alternative to tape. Data on disk is available online and onsite for longer retention periods, and restores become fast and reliable. Storing only unique data on disk also means that data can be cost-effectively replicated over existing networks to remote sites for disaster recovery and consolidated tape operations.

Transcript of EMC Deduplication Fundamentals

Page 1: EMC Deduplication Fundamentals

Deduplication Fundamentals

Page 2: EMC Deduplication Fundamentals

Data Domain Basics
Easy integration with existing environment

Backup and Archive Applications
• EMC
• Symantec
• CommVault
• IBM
• BakBone Software
• Vizioncore

Connectivity
• CIFS, NFS, NDMP, DD Boost over Ethernet
• Virtual Tape Library (VTL) over Fibre Channel
• Replication

Tiers
• Control Tier
• Target Tier
• Disaster Recovery Tier

DD890 appliance
• 2U
• 2 to 10 ports: 10 and 1 Gigabit Ethernet; 8 Gb/s Fibre Channel
• RAID 6
• Up to 285 TB usable capacity with shelves
• 2 TB or 1 TB 7.2K rpm SATA HDD in shelf
• File system NVRAM
• N+1 fans and redundant, hot-plug power supplies

Page 3: EMC Deduplication Fundamentals

Data Deduplication: Technology Overview
Store more backups in a smaller footprint

Segments in each backup
• Friday Full Backup: A B C D A E F G
• Mon Incremental: A B H
• Tues Incremental: C B I
• Weds Incremental: E G J
• Thurs Incremental: A C K
• Second Friday Full Backup: B C D E F L G H

Unique segments stored: A B C D E F G H I J K L

Backup                  Logical Data   Estimated Reduction   Physical
Friday Full             1 TB           2–4x                  250 GB
Monday Incremental      100 GB         7–10x                 10 GB
Tuesday Incremental     100 GB         7–10x                 10 GB
Wednesday Incremental   100 GB         7–10x                 10 GB
Thursday Incremental    100 GB         7–10x                 10 GB
Second Friday Full      1 TB           50–60x                18 GB
TOTAL                   2.4 TB         7.8x                  308 GB
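The segment sequence on this slide can be replayed as a small simulation. The sketch below is illustrative only: it assumes each letter stands for one equally sized segment and uses an in-memory set as the segment store, since the slide does not say how segments are identified. The function name `ingest` is hypothetical.

```python
# Replay the slide's backup sequence against a simple segment store.
# Assumption: each letter stands for one segment of equal size.
BACKUPS = [
    ("Friday Full", "ABCDAEFG"),
    ("Mon Incremental", "ABH"),
    ("Tues Incremental", "CBI"),
    ("Weds Incremental", "EGJ"),
    ("Thurs Incremental", "ACK"),
    ("Second Friday Full", "BCDEFLGH"),
]


def ingest(backups):
    """Return (stored segments in arrival order, new segments per backup)."""
    stored = []          # unique segments kept on disk
    seen = set()         # fast membership test for the store
    new_per_backup = {}
    for name, segments in backups:
        new = []
        for seg in segments:
            if seg not in seen:   # only unseen segments are written
                seen.add(seg)
                stored.append(seg)
                new.append(seg)
        new_per_backup[name] = new
    return stored, new_per_backup


stored, new_per_backup = ingest(BACKUPS)
logical = sum(len(segs) for _, segs in BACKUPS)
for name, new in new_per_backup.items():
    print(f"{name}: new segments {new}")
print(f"logical segments: {logical}, stored segments: {len(stored)}")
```

In this toy run the second full backup writes a single new segment out of eight, which mirrors why the slide shows the largest reduction for the second full.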

Page 4: EMC Deduplication Fundamentals

Retain: Store More for Longer with Less
Over one year of retention in 3U of Data Domain deduplication storage

Period    Backup       Cumulative Logical Data   Estimated Reduction   Physical
          First Full   1 TB                      4x                    250 GB
Week 1    April 7      2.4 TB                    8x                    308 GB
Week 2    April 14     3.8 TB                    10x                   366 GB
Week 3    April 21     5.2 TB                    12x                   424 GB
Month 1   April 28     6.6 TB                    14x                   482 GB
Month 2   May 31       12.2 TB                   17x                   714 GB
Month 3   June 30      17.8 TB                   19x                   946 GB
Month 4   July 31      23.4 TB                   20x                   1,178 GB
TOTAL                  23.4 TB                   20x                   1,178 GB
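The cumulative figures on this slide can be reproduced from the per-backup numbers in the weekly example (a first 1 TB full stored as 250 GB, 100 GB incrementals stored as 10 GB each, and later 1 TB fulls stored as 18 GB). The sketch below is a rough check, assuming decimal units (1 TB = 1000 GB) and that every week after the first looks the same; the function name `cumulative` is hypothetical.

```python
# Cumulative logical vs. physical storage from the weekly example figures.
# Assumptions: 1 TB = 1000 GB, and identical weeks after the first.
FIRST_WEEK_LOGICAL_GB = 1000 + 4 * 100 + 1000   # first full + 4 incrementals + second full
FIRST_WEEK_PHYSICAL_GB = 250 + 4 * 10 + 18
WEEKLY_LOGICAL_GB = 4 * 100 + 1000              # 4 incrementals + 1 full
WEEKLY_PHYSICAL_GB = 4 * 10 + 18


def cumulative(extra_weeks):
    """Return (logical_gb, physical_gb, reduction) after the first week plus extra_weeks."""
    logical = FIRST_WEEK_LOGICAL_GB + extra_weeks * WEEKLY_LOGICAL_GB
    physical = FIRST_WEEK_PHYSICAL_GB + extra_weeks * WEEKLY_PHYSICAL_GB
    return logical, physical, logical / physical


for label, extra in [("April 7", 0), ("April 28", 3), ("July 31", 15)]:
    logical, physical, ratio = cumulative(extra)
    print(f"{label}: {logical / 1000:.1f} TB logical, {physical} GB physical, {ratio:.1f}x")
```

The reduction ratio climbs over time because each additional week adds far more logical data than physical data.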

Page 5: EMC Deduplication Fundamentals

Data Integrity: Data Invulnerability Architecture

End-to-end data verification
• Checksum
• Deduplication, write to disk
• Verify

Self-healing file system
• Cleaning
• Expired data
• Defrag
• Verify

Other
• RAID 6
• NVRAM
• Snapshots

[Diagram: End-to-end data verification]
• Generate checksum
• Layers shown: Deduplication, Local Compression, RAID, File System
• Verify data
  – Verify the file system metadata integrity
  – Verify user data integrity
  – Verify stripe integrity
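The checksum, write, verify sequence listed on this slide can be sketched in a few lines. This is a minimal illustration, not the Data Domain implementation: the slide does not name a checksum algorithm, so SHA-256 is an assumption, and the `VerifiedStore` class with its in-memory dictionary is hypothetical.

```python
import hashlib


class VerifiedStore:
    """Toy store that checksums data on ingest and verifies after writing."""

    def __init__(self):
        self._disk = {}  # stands in for the on-disk store

    def write(self, data: bytes) -> str:
        # 1. Generate the checksum before the data enters the write path.
        checksum = hashlib.sha256(data).hexdigest()
        # 2. Write to "disk" (identical data shares one key).
        self._disk[checksum] = bytes(data)
        # 3. Read back what was written and verify it against the checksum.
        self.verify(checksum)
        return checksum

    def verify(self, checksum: str) -> None:
        stored = self._disk[checksum]
        if hashlib.sha256(stored).hexdigest() != checksum:
            raise IOError(f"integrity check failed for {checksum[:12]}")


store = VerifiedStore()
key = store.write(b"backup segment")
store.verify(key)                  # passes silently

store._disk[key] = b"bit rot"      # simulate corruption on disk
try:
    store.verify(key)
except IOError as err:
    print("detected:", err)
```

Verifying after the write, rather than trusting the write call, is what lets a later scan detect silent corruption.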

Page 6: EMC Deduplication Fundamentals

Network-Efficient Replication for True Disaster Recovery
Lowers WAN costs; improves service level agreements

• 95–99% cross-site bandwidth reduction
• Source: Remote sites
• Destination: Data Center Hub
  – Supports hundreds of remote sites
• Flexible replication
  – One-to-many
  – Many-to-one
  – Bi-directional
  – System-to-system
  – Cascaded

[Diagram labels: Data Domain system (three), Data Domain Global Deduplication Array, WAN, Backup data, Archive data, Home, DB; three links labeled 1–5%]

Page 7: EMC Deduplication Fundamentals

DD Boost Software
• Distributes parts of deduplication process to backup server or application clients
  – Licensable software works across Data Domain portfolio
• Supports majority of backup software market
  – EMC Avamar and NetWorker
  – Symantec NetBackup and Backup Exec
• Speeds backups by up to 50 percent
• Process more backups with existing resources
  – 20–40 percent less overall impact to backup server
  – 80–99 percent less LAN bandwidth
• Enables Data Domain replication management from the backup application
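The deck does not describe the DD Boost protocol itself. As a generic illustration of moving part of deduplication to the backup server, the sketch below fingerprints segments on the sender and ships only the segments the storage system has not seen. Everything here is hypothetical: the `Appliance` class, its `missing` and `store` methods, `send_backup`, the fixed 4 KB segments, and the use of SHA-256.

```python
import hashlib


def fingerprint(segment: bytes) -> str:
    return hashlib.sha256(segment).hexdigest()


class Appliance:
    """Hypothetical stand-in for the deduplication storage system."""

    def __init__(self):
        self.segments = {}

    def missing(self, fingerprints):
        # Tell the sender which fingerprints are not stored yet.
        return {fp for fp in fingerprints if fp not in self.segments}

    def store(self, fp, segment):
        self.segments[fp] = segment


def send_backup(appliance, segments):
    """Fingerprint on the backup server; ship only unseen segments.

    Returns the number of payload bytes sent over the LAN.
    """
    by_fp = {fingerprint(s): s for s in segments}   # collapses local duplicates
    needed = appliance.missing(by_fp.keys())
    sent = 0
    for fp in needed:
        appliance.store(fp, by_fp[fp])
        sent += len(by_fp[fp])
    return sent


appliance = Appliance()
monday = [b"A" * 4096, b"B" * 4096, b"C" * 4096]
tuesday = [b"A" * 4096, b"B" * 4096, b"D" * 4096]   # one changed segment

first = send_backup(appliance, monday)
second = send_backup(appliance, tuesday)
print(first, second)
```

Only fingerprints and new segments cross the network on the second run, which is the general mechanism behind LAN bandwidth savings of this kind.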

Page 8: EMC Deduplication Fundamentals

Additional Data Domain Software Options

Data Domain Replicator
• Network-efficient and encrypted
• Transfers only compressed, deduplicated data over the WAN
• Consolidate up to 270 remote sites into a single system

Data Domain Virtual Tape Library
• Easily integrates with Fibre Channel
• Emulates multiple tape libraries
• Supports open systems and IBM i operating environments

Data Domain Encryption
• Inline encryption of data at rest
• Satisfies internal governance rules and compliance regulations
• Protects against theft or loss of a physical system

Data Domain Retention Lock
• File locking to satisfy IT governance and compliance policies
• Electronic data shredding

Page 9: EMC Deduplication Fundamentals

DD Archiver Overview
Cost-optimized long-term retention

• Data Domain system for backup and archive
  – Active tier: short-term data protection; less than 90 days
  – Archive tier: scalable long-term retention; multiple years
• High-throughput deduplication storage
  – Up to 9.8 TB/hr
• Cost optimized for long-term retention
  – Up to 570 TB usable, 28.5 PB logical capacity
  – Low cost per gigabyte while maintaining high throughput
  – Fault isolation of archive units for long-term recoverability
• Leverage existing Data Domain system advantages
  – Supports DD Replicator and DD Retention Lock software options
  – Data Domain Data Invulnerability Architecture to ensure data integrity

Page 10: EMC Deduplication Fundamentals

Industry’s Most Scalable Inline Deduplication Systems

Model                        Speed (DD Boost)   Speed (other)   Logical capacity   Raw capacity   Usable capacity
DD140                        490 GB/hr          450 GB/hr       9–43 TB            1.5 TB         0.86 TB
DD610                        1.3 TB/hr          675 GB/hr       40–195 TB          Up to 6 TB     Up to 3.98 TB
DD630                        2.1 TB/hr          1.1 TB/hr       84–420 TB          Up to 12 TB    Up to 8.4 TB
DD670                        5.4 TB/hr          3.6 TB/hr       0.6–2.7 PB         Up to 76 TB    Up to 55.9 TB
DD860                        9.8 TB/hr          5.1 TB/hr       1.4–7.1 PB         Up to 192 TB   Up to 142 TB
DD890                        14.7 TB/hr         8.1 TB/hr       2.9–14.2 PB        Up to 384 TB   Up to 285 TB
Global Deduplication Array   26.3 TB/hr         10.7 TB/hr      5.7–28.5 PB        Up to 768 TB   Up to 570 TB
DD Archiver                  9.8 TB/hr          4.3 TB/hr       5.7–28.5 PB        Up to 768 TB   Up to 570 TB

Product families: DD140 Remote Office Appliance; DD600 Appliance Series; DD800 Appliance Series; Global Deduplication Array; DD Archiver

Software options: DD Boost, DD Virtual Tape Library, DD Replicator, DD Retention Lock, and DD Encryption
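The logical capacity ranges on this slide appear to track usable capacity multiplied by roughly 10x to 50x. That relationship is inferred from the published figures and is not stated in the deck, so the sketch below should be read as a plausibility check only; `logical_range_tb` and its default ratios are hypothetical.

```python
# Assumption (inferred, not stated on the slide): logical capacity is
# usable capacity multiplied by a reduction ratio between 10x and 50x.
USABLE_TB = {
    "DD140": 0.86,
    "DD630": 8.4,
    "DD860": 142,
    "DD890": 285,
    "DD Archiver": 570,
}


def logical_range_tb(usable_tb, low=10, high=50):
    """Return the (low, high) logical capacity in TB for a usable capacity."""
    return usable_tb * low, usable_tb * high


for model, usable in USABLE_TB.items():
    low_tb, high_tb = logical_range_tb(usable)
    print(f"{model}: {low_tb:,.1f} to {high_tb:,.1f} TB logical")
```

For the 570 TB systems this gives 5,700 to 28,500 TB, in line with the 5.7–28.5 PB shown; some smaller models match only approximately, so the published ranges may involve rounding or other factors.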

Page 11: EMC Deduplication Fundamentals

Deduplication Storage Evaluation Criteria

Page 12: EMC Deduplication Fundamentals

Methodology: Inline versus Post-Process Deduplication

POST-PROCESS: Deduplication After Storing
• 3x disk accesses to shared store
• The more processes, the more resource contention
  – Copy to tape: Too slow to stream tape
  – Recovery: Service level agreement predictability
  – Replication: Poor time-to-disaster-recovery
  – Deduplication: If interleaved with backup or restore
• More administration to fight these issues

INLINE: Deduplication Before Storing
• Other activities unimpeded
  – Predictable
  – Simpler
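One way to read the "3x disk accesses" label: a post-process system writes the raw backup, reads it back to deduplicate, and then writes the deduplicated result, while an inline system deduplicates in memory and writes once. The sketch below counts those passes in a toy model. The three-pass interpretation is an assumption, and the `Disk` class and both functions are hypothetical.

```python
class Disk:
    """Toy disk that counts full passes over the backup data."""

    def __init__(self):
        self.passes = 0
        self.blocks = []

    def write(self, blocks):
        self.passes += 1
        self.blocks = list(blocks)

    def read(self):
        self.passes += 1
        return list(self.blocks)


def dedupe(blocks):
    # Keep the first occurrence of each block, preserving order.
    return list(dict.fromkeys(blocks))


def post_process(disk, backup):
    disk.write(backup)            # 1: land the raw backup
    raw = disk.read()             # 2: read it back later
    disk.write(dedupe(raw))       # 3: write the deduplicated form


def inline(disk, backup):
    disk.write(dedupe(backup))    # 1: deduplicate in memory, write once


backup = ["A", "B", "A", "C", "B"]
pp_disk, in_disk = Disk(), Disk()
post_process(pp_disk, backup)
inline(in_disk, backup)
print(pp_disk.passes, in_disk.passes)
```

Both paths end with the same stored blocks; the difference is how many times the disk is touched, which is where contention with tape copies, restores, and replication would come from.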

Page 13: EMC Deduplication Fundamentals

Performance: CPU-Centric versus Spindle-Bound

[Chart: Throughput (MB/s; axis ticks 50 and 6,000) versus Number of Disk Spindles (50, 100, 150, 200). Labels: Data Domain; Fibre Channel; SATA; Most deduplication vendors]

Page 14: EMC Deduplication Fundamentals

Data Domain Systems Trajectory
Data Domain SISL Scaling Architecture: CPU-centric

[Chart: Throughput (GB/s; axis ticks 0.04, 1.5, 3, 5) over time (2004, 2010, 2011, 2014 (est.), Future). Labels: DD200 (2004); Single-controller, standard protocols; DD Boost; Dual-controller Global Deduplication Array]

Improvement since 2004:
• Throughput: ~175x
• Capacity: ~450x

Page 15: EMC Deduplication Fundamentals

Why Data Domain?
• Less disk to resource, less to manage
  – CPU-centric deduplication
  – Inline deduplication
• Simple, mature, and flexible
  – Simple, mature appliance
  – Any fabric, any software, backup or archive applications
• Resilience and disaster recovery
  – Storage of last resort
  – Fast time-to-DR readiness
  – Cross-site global compression
• Data center or remote office

Page 16: EMC Deduplication Fundamentals

Why EMC Global Services?

Save money
• Significantly lower implementation and operating expenditures
• Fill internal resource gaps for less
• Protect investments in EMC solutions

Accelerate time to value
• Reduce deployment time
• Accelerate return on investment for new projects
• Ease the burden of compliance while protecting critical business information

Mitigate risk and get better results
• Configure the solution to meet your requirements
• Improve service levels; reduce management costs
• EMC best practices and unmatched product expertise = superior customer experience
• Reduce disruption while taking advantage of the features and benefits of the latest EMC products and solutions