EMC Deduplication Fundamentals
© Copyright 2011 EMC Corporation. All rights reserved.
Deduplication Fundamentals

Data Domain Basics
Easy integration with existing environment
[Diagram: backup and archive applications, a DD890 appliance, and replication to a second DD890 appliance, across the Control Tier, Target Tier, and Disaster Recovery Tier.]
• Backup and archive applications: EMC, Symantec, CommVault, IBM, BakBone Software, Vizioncore
• Connectivity
– CIFS, NFS, NDMP, DD Boost over Ethernet
– Virtual Tape Library (VTL) over Fibre Channel
• Replication
• DD890 appliance
– 2U
– 2 to 10 ports: 10 and 1 Gigabit Ethernet; 8 Gb/s Fibre Channel
– RAID 6
– Up to 285 TB usable capacity with shelves
– 2 TB or 1 TB 7.2K rpm SATA HDD in shelf
– File system NVRAM
– N+1 fans and redundant, hot-plug power supplies

Data Deduplication: Technology Overview
Store more backups in a smaller footprint

Backup streams (each letter is one data segment):
Friday Full:          A B C D A E F G
Mon Incremental:      A B H
Tues Incremental:     C B I
Weds Incremental:     E G J
Thurs Incremental:    A C K
Second Friday Full:   B C D E F L G H
Stored after deduplication: A B C D E F G H I J K L

Backup                  Logical data   Estimated reduction   Physical
Friday Full             1 TB           2–4x                  250 GB
Monday Incremental      100 GB         7–10x                 10 GB
Tuesday Incremental     100 GB         7–10x                 10 GB
Wednesday Incremental   100 GB         7–10x                 10 GB
Thursday Incremental    100 GB         7–10x                 10 GB
Second Friday Full      1 TB           50–60x                18 GB
TOTAL                   2.4 TB         7.8x                  308 GB
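
The letter diagram on this slide can be reproduced with a few lines of Python. This is a toy sketch only: each letter stands in for one data segment, and the `deduplicate` helper is a hypothetical name, not part of any Data Domain software. A real system would identify segments by a fingerprint of their content rather than by a letter.

```python
# Toy model of the slide: each letter is one data segment.
backups = [
    ("Friday Full", "A B C D A E F G"),
    ("Mon Incremental", "A B H"),
    ("Tues Incremental", "C B I"),
    ("Weds Incremental", "E G J"),
    ("Thurs Incremental", "A C K"),
    ("Second Friday Full", "B C D E F L G H"),
]


def deduplicate(backups):
    """Store each segment once; report how many segments each backup adds."""
    store = []   # unique segments, in first-seen order
    seen = set()
    report = []
    for name, stream in backups:
        segments = stream.split()
        # dict.fromkeys drops repeats inside one backup while keeping order
        new = [s for s in dict.fromkeys(segments) if s not in seen]
        seen.update(new)
        store.extend(new)
        report.append((name, len(segments), len(new)))
    return store, report


store, report = deduplicate(backups)
for name, logical, physical in report:
    print(f"{name}: {logical} segments in, {physical} new segments stored")
print("Stored:", " ".join(store))
```

The segment counts in this toy model are not meant to match the reduction estimates in the table, which depend on real data sizes and on local compression.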

Retain: Store More for Longer with Less
Over one year of retention in 3U of Data Domain deduplication storage

Period    Backup       Cumulative logical data   Estimated reduction   Physical
          First Full   1 TB                      4x                    250 GB
Week 1    April 7      2.4 TB                    8x                    308 GB
Week 2    April 14     3.8 TB                    10x                   366 GB
Week 3    April 21     5.2 TB                    12x                   424 GB
Month 1   April 28     6.6 TB                    14x                   482 GB
Month 2   May 31       12.2 TB                   17x                   714 GB
Month 3   June 30      17.8 TB                   19x                   946 GB
Month 4   July 31      23.4 TB                   20x                   1,178 GB
TOTAL                  23.4 TB                   20x                   1,178 GB
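
The cumulative figures in this table follow a simple pattern, sketched below. The steady-state weekly increments (1.4 TB logical, 58 GB physical) are an assumption inferred from the previous slide (four 100 GB incrementals stored as 10 GB each, plus a 1 TB full stored as 18 GB). The week numbers 8, 12, and 16 assigned to the monthly rows are also assumptions, chosen because they reproduce the slide's numbers. Sizes are decimal (1 TB = 1,000 GB).

```python
FIRST_WEEK_LOGICAL_GB = 2400   # 1 TB full + 4 x 100 GB incrementals + 1 TB second full
FIRST_WEEK_PHYSICAL_GB = 308   # 250 GB + 4 x 10 GB + 18 GB
WEEKLY_LOGICAL_GB = 1400       # assumed steady state: 4 x 100 GB + 1 TB
WEEKLY_PHYSICAL_GB = 58        # assumed steady state: 4 x 10 GB + 18 GB


def cumulative(week):
    """Return (logical GB, physical GB, reduction ratio) after the given week."""
    extra_weeks = week - 1
    logical = FIRST_WEEK_LOGICAL_GB + extra_weeks * WEEKLY_LOGICAL_GB
    physical = FIRST_WEEK_PHYSICAL_GB + extra_weeks * WEEKLY_PHYSICAL_GB
    return logical, physical, logical / physical


rows = [("April 7", 1), ("April 14", 2), ("April 21", 3), ("April 28", 4),
        ("May 31", 8), ("June 30", 12), ("July 31", 16)]
for label, week in rows:
    logical, physical, ratio = cumulative(week)
    print(f"{label}: {logical / 1000:.1f} TB logical, "
          f"{physical} GB physical, {ratio:.0f}x")
```

The reduction ratio climbs over time because each additional week adds far more logical data than physical data.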

Data Integrity: Data Invulnerability Architecture
• End-to-end data verification
– Checksum
– Deduplication, write to disk
– Verify
• Self-healing file system
– Cleaning
– Expired data
– Defrag
– Verify
• Other
– RAID 6
– NVRAM
– Snapshots
[Diagram: End-to-end data verification. Stages shown: generate checksum, deduplication, local compression, RAID, file system. Verification steps: verify the file system metadata integrity, verify user data integrity, verify stripe integrity.]
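
The checksum, write, verify sequence on this slide can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the slide does not name a checksum algorithm (SHA-256 is used as a stand-in), a Python dict stands in for disk, and `write_segment` and `verify_segment` are hypothetical names rather than Data Domain APIs.

```python
import hashlib
import zlib


def write_segment(store, data):
    """Checksum the incoming data, store it compressed, then read back and verify."""
    checksum = hashlib.sha256(data).hexdigest()   # generated before any processing
    store[checksum] = zlib.compress(data)         # stand-in for compression + disk write
    verify_segment(store, checksum)               # verify right after the write
    return checksum


def verify_segment(store, checksum):
    """Re-read the stored bytes and compare them against the original checksum."""
    data = zlib.decompress(store[checksum])
    if hashlib.sha256(data).hexdigest() != checksum:
        raise IOError(f"segment {checksum[:12]} failed verification")
    return data


store = {}
key = write_segment(store, b"backup payload")
assert verify_segment(store, key) == b"backup payload"
```

The point of verifying at write time is that a problem is found while the source data still exists, not months later during a restore.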

Network-Efficient Replication for True Disaster Recovery
Lowers WAN costs; improves service level agreements
• 95–99% cross-site bandwidth reduction (only 1–5% of the data crosses the WAN)
• Source: Remote sites
• Destination: Data Center Hub
– Supports hundreds of remote sites
• Replicates backup data and archive data
• Flexible replication
– One-to-many
– Many-to-one
– Bi-directional
– System-to-system
– Cascaded
[Diagram: Data Domain systems at remote sites (labeled Home and DB) replicate over the WAN to a Data Domain system and a Data Domain Global Deduplication Array.]
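
A many-to-one replication pass can be sketched as a fingerprint exchange: the source asks the hub which segments it lacks and sends only those, compressed. This is an illustrative model under stated assumptions, not the Data Domain replication protocol. The `Hub` class, the `replicate` function, and the use of SHA-256 and zlib are all hypothetical choices, and the fingerprint traffic itself is not counted.

```python
import hashlib
import zlib


def fingerprint(segment):
    return hashlib.sha256(segment).hexdigest()


class Hub:
    """Destination system: stores each segment once across all source sites."""

    def __init__(self):
        self.segments = {}

    def missing(self, fingerprints):
        return [fp for fp in fingerprints if fp not in self.segments]

    def receive(self, payloads):
        self.segments.update(payloads)


def replicate(segments, hub):
    """Send only the segments the hub lacks; return (logical bytes, WAN bytes)."""
    by_fp = {fingerprint(s): s for s in segments}
    needed = hub.missing(by_fp)
    payloads = {fp: zlib.compress(by_fp[fp]) for fp in needed}
    hub.receive(payloads)
    logical = sum(len(s) for s in segments)
    wan = sum(len(p) for p in payloads.values())   # fingerprint traffic not counted
    return logical, wan


hub = Hub()
site_a = [bytes([i]) * 4096 for i in range(10)]
site_b = site_a[:9] + [b"\xff" * 4096]   # shares 9 of 10 segments with site A
for name, site in [("site A", site_a), ("site B", site_b)]:
    logical, wan = replicate(site, hub)
    print(f"{name}: sent {wan} of {logical} bytes over the WAN")
```

Site B sends a single segment because the hub already holds the other nine from site A, which is the cross-site effect the slide describes.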

DD Boost Software
• Distributes parts of deduplication process to backup server or application clients
– Licensable software works across Data Domain portfolio
• Supports majority of backup software market
– EMC Avamar and NetWorker
– Symantec NetBackup and Backup Exec
• Speeds backups by up to 50 percent
• Process more backups with existing resources
– 20–40 percent less overall impact to backup server
– 80–99 percent less LAN bandwidth
• Enables Data Domain replication management from the backup application

Additional Data Domain Software Options
Data Domain Replicator
• Network-efficient and encrypted
• Transfers only compressed, deduplicated data over the WAN
• Consolidate up to 270 remote sites into a single system
Data Domain Virtual Tape Library
• Easily integrates with Fibre Channel
• Emulates multiple tape libraries
• Supports open systems and IBM i operating environments
Data Domain Encryption
• Inline encryption of data at rest
• Satisfies internal governance rules and compliance regulations
• Protects against theft or loss of a physical system
Data Domain Retention Lock
• File locking to satisfy IT governance and compliance policies
• Electronic data shredding

DD Archiver Overview
Cost-optimized long-term retention
• Data Domain system for backup and archive
– Active tier: short-term data protection; less than 90 days
– Archive tier: scalable long-term retention; multiple years
• High-throughput deduplication storage
– Up to 9.8 TB/hr
• Cost optimized for long-term retention
– Up to 570 TB usable, 28.5 PB logical capacity
– Low cost per gigabyte while maintaining high throughput
– Fault isolation of archive units for long-term recoverability
• Leverage existing Data Domain system advantages
– Supports DD Replicator and DD Retention Lock software options
– Data Domain Data Invulnerability Architecture to ensure data integrity

Industry’s Most Scalable Inline Deduplication Systems

System                       Speed (DD Boost)   Speed (other)   Logical capacity   Raw capacity   Usable capacity
DD140                        490 GB/hr          450 GB/hr       9–43 TB            1.5 TB         0.86 TB
DD610                        1.3 TB/hr          675 GB/hr       40–195 TB          Up to 6 TB     Up to 3.98 TB
DD630                        2.1 TB/hr          1.1 TB/hr       84–420 TB          Up to 12 TB    Up to 8.4 TB
DD670                        5.4 TB/hr          3.6 TB/hr       0.6–2.7 PB         Up to 76 TB    Up to 55.9 TB
DD860                        9.8 TB/hr          5.1 TB/hr       1.4–7.1 PB         Up to 192 TB   Up to 142 TB
DD890                        14.7 TB/hr         8.1 TB/hr       2.9–14.2 PB        Up to 384 TB   Up to 285 TB
Global Deduplication Array   26.3 TB/hr         10.7 TB/hr      5.7–28.5 PB        Up to 768 TB   Up to 570 TB
DD Archiver                  9.8 TB/hr          4.3 TB/hr       5.7–28.5 PB        Up to 768 TB   Up to 570 TB

Product families: DD140 Remote Office Appliance; DD600 Appliance Series; DD800 Appliance Series; Global Deduplication Array; DD Archiver
Software options: DD Boost, DD Virtual Tape Library, DD Replicator, DD Retention Lock, and DD Encryption
Deduplication Storage Evaluation Criteria

Methodology: Inline versus Post-Process Deduplication
POST-PROCESS: Deduplication After Storing
• 3x disk accesses to shared store
• The more processes, the more resource contention
– Copy to tape: Too slow to stream tape
– Recovery: Service level agreement predictability
– Replication: Poor time-to-disaster-recovery
– Deduplication: If interleaved with backup or restore
• More administration to fight these issues
INLINE: Deduplication Before Storing
• Other activities unimpeded
– Predictable
– Simpler
[Diagram: deduplication and store steps for the post-process and inline paths.]
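
The "3x disk accesses" point can be made concrete with a toy count of segment-level disk operations. This is a simplified model under stated assumptions: it counts one operation per segment read or written and ignores index lookups, metadata, and compression. The function names are hypothetical.

```python
def post_process_io(segments):
    """Land everything on disk, read it back to deduplicate, then write the unique set."""
    landed = len(segments)           # pass 1: write the raw backup
    reread = len(segments)           # pass 2: read it back for deduplication
    rewritten = len(set(segments))   # pass 3: write the deduplicated segments
    return landed + reread + rewritten


def inline_io(segments):
    """Deduplicate before storing, so only unique segments reach disk."""
    return len(set(segments))


backup = list("ABCDAEFG")   # the first Friday full from the technology overview slide
print("post-process:", post_process_io(backup), "segment I/Os")
print("inline:", inline_io(backup), "segment I/Os")
```

In this model the post-process path touches the shared store in three passes, while the inline path writes once, which is why other activities such as replication or copy to tape face less contention.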

Performance: CPU-Centric versus Spindle-Bound
[Chart: throughput in MB/s (axis labels 50 and 6,000) versus number of disk spindles (50, 100, 150, 200). Labels: Data Domain; most deduplication vendors; Fibre Channel; SATA.]

Data Domain Systems Trajectory
Data Domain SISL Scaling Architecture: CPU-centric
[Chart: throughput in GB/s (axis labels 0.04, 1.5, 3, 5) over time (2004, 2010, 2011, 2014 (est.), Future), starting from the DD200 (2004). Labels: single-controller, standard protocols; DD Boost; dual-controller Global Deduplication Array.]
Improvement since 2004:
• Throughput: ~175x
• Capacity: ~450x

Why Data Domain?
• Less disk to resource, less to manage
– CPU-centric deduplication
– Inline deduplication
• Simple, mature, and flexible
– Simple, mature appliance
– Any fabric, any software, backup or archive applications
• Resilience and disaster recovery
– Storage of last resort
– Fast time-to-DR readiness
– Cross-site global compression
• Data center or remote office

Why EMC Global Services?
Save money
• Significantly lower implementation and operating expenditures
• Fill internal resource gaps for less
• Protect investments in EMC solutions
Accelerate time to value
• Reduce deployment time
• Accelerate return on investment for new projects
• Ease the burden of compliance while protecting critical business information
Mitigate risk and get better results
• Configure the solution to meet your requirements
• Improve service levels; reduce management costs
• EMC best practices and unmatched product expertise = superior customer experience
• Reduce disruption while taking advantage of the features and benefits of the latest EMC products and solutions