Data DeDuplication for Dummies.pdf


    These materials are the copyright of John Wiley & Sons, Inc. and any dissemination, distribution, or unauthorized use is strictly prohibited.


    Data Deduplication For Dummies

    Quantum 2nd Special Edition

    by Mark R. Coppock and Steve Whitner


    Data Deduplication For Dummies, Quantum 2nd Special Edition

    Published by

    Wiley Publishing, Inc.
    111 River Street
    Hoboken, NJ 07030-5774

    www.wiley.com

    Copyright 2011 by Wiley Publishing, Inc., Indianapolis, Indiana

    Published by Wiley Publishing, Inc., Indianapolis, Indiana

    No part of this publication may be reproduced, stored in a retrieval system or transmitted in any form or by any means, electronic, mechanical, photocopying, recording, scanning or otherwise, except as permitted under Sections 107 or 108 of the 1976 United States Copyright Act, without the prior written permission of the Publisher. Requests to the Publisher for permission should be addressed to the Permissions Department, John Wiley & Sons, Inc., 111 River Street, Hoboken, NJ 07030, (201) 748-6011, fax (201) 748-6008, or online at http://www.wiley.com/go/permissions.

    Trademarks: Wiley, the Wiley Publishing logo, For Dummies, the Dummies Man logo, A Reference for the Rest of Us!, The Dummies Way, Dummies.com, Making Everything Easier, and related trade dress are trademarks or registered trademarks of John Wiley & Sons, Inc. and/or its affiliates in the United States and other countries, and may not be used without written permission. Quantum and the Quantum logo are trademarks of Quantum Corporation. StorNext is a registered trademark of Quantum Corporation. All other trademarks are the property of their respective owners. Wiley Publishing, Inc., is not associated with any product or vendor mentioned in this book.

    Figure 3-2 is from an IDC White Paper, sponsored by Quantum, Demonstrating the Business Value of Deduplication for Data Protection, November 2011.

    LIMIT OF LIABILITY/DISCLAIMER OF WARRANTY: THE PUBLISHER AND THE AUTHOR MAKE NO REPRESENTATIONS OR WARRANTIES WITH RESPECT TO THE ACCURACY OR COMPLETENESS OF THE CONTENTS OF THIS WORK AND SPECIFICALLY DISCLAIM ALL WARRANTIES, INCLUDING WITHOUT LIMITATION WARRANTIES OF FITNESS FOR A PARTICULAR PURPOSE. NO WARRANTY MAY BE CREATED OR EXTENDED BY SALES OR PROMOTIONAL MATERIALS. THE ADVICE AND STRATEGIES CONTAINED HEREIN MAY NOT BE SUITABLE FOR EVERY SITUATION. THIS WORK IS SOLD WITH THE UNDERSTANDING THAT THE PUBLISHER IS NOT ENGAGED IN RENDERING LEGAL, ACCOUNTING, OR OTHER PROFESSIONAL SERVICES. IF PROFESSIONAL ASSISTANCE IS REQUIRED, THE SERVICES OF A COMPETENT PROFESSIONAL PERSON SHOULD BE SOUGHT. NEITHER THE PUBLISHER NOR THE AUTHOR SHALL BE LIABLE FOR DAMAGES ARISING HEREFROM. THE FACT THAT AN ORGANIZATION OR WEBSITE IS REFERRED TO IN THIS WORK AS A CITATION AND/OR A POTENTIAL SOURCE OF FURTHER INFORMATION DOES NOT MEAN THAT THE AUTHOR OR THE PUBLISHER ENDORSES THE INFORMATION THE ORGANIZATION OR WEBSITE MAY PROVIDE OR RECOMMENDATIONS IT MAY MAKE. FURTHER, READERS SHOULD BE AWARE THAT INTERNET WEBSITES LISTED IN THIS WORK MAY HAVE CHANGED OR DISAPPEARED BETWEEN WHEN THIS WORK WAS WRITTEN AND WHEN IT IS READ.

    For general information on our other products and services, please contact our Business Development Department in the U.S. at 317-572-3205. For details on how to create a custom For Dummies book for your business or organization, contact [email protected]. For information about licensing the For Dummies brand for products or services, contact BrandedRights&[email protected].

    ISBN: 978-1-118-03204-6

    Manufactured in the United States of America

    10 9 8 7 6 5 4 3 2


    Contents

    Introduction
        How This Book Is Organized
        Icons Used in This Book

    Chapter 1: Data Deduplication: Why Less Is More
        Duplicate Data: Empty Calories for Storage and Backup Systems
        Data Deduplication: Putting Your Data on a Diet
        Why Data Deduplication Matters

    Chapter 2: Data Deduplication in Detail
        Making the Most of the Building Blocks of Data
            Fixed-length blocks versus variable-length data segments
            Effect of change in deduplicated storage pools
        Sharing a Common Data Deduplication Pool
        Data Deduplication Architectures

    Chapter 3: The Business Case for Data Deduplication
        Deduplication to the Rescue: Replication and Disaster Recovery Protection
        Reducing the Overall Cost of Storing Data
        Data Deduplication Also Works for Archiving
        Looking at the Quantum Data Deduplication Advantage

    Chapter 4: Ten Frequently Asked Data Deduplication Questions (And Their Answers)
        What Does the Term "Data Deduplication" Really Mean?
        How Is Data Deduplication Applied to Replication?
        What Applications Does Data Deduplication Support?
        Is There Any Way to Tell How Much Improvement Data Deduplication Will Give Me?
        What Are the Real Benefits of Data Deduplication?
        What Is Variable-Block-Length Data Deduplication?
        If the Data Is Divided into Blocks, Is It Safe?
        When Does Data Deduplication Occur during Backup?
        Does Data Deduplication Support Tape?
        What Do Data Deduplication Solutions Cost?

    Appendix: Quantum's Data Deduplication Product Line
        DXi4500
        DXi6500 Family
        DXi6700
        DXi8500


    Publisher's Acknowledgments

    We're proud of this book and of the people who worked on it. For details on how to create a custom For Dummies book for your business or organization, contact [email protected]. For details on licensing the For Dummies brand for products or services, contact BrandedRights&[email protected].

    Some of the people who helped bring this book to market include the following:

    Acquisitions, Editorial, and Media Development

    Project Editor: Linda Morris
    Editorial Managers: Jodi Jensen, Rev Mengle
    Acquisitions Editor: Kyle Looper
    Business Development Representative: Karen Hattan
    Custom Publishing Project Specialist: Michael Sullivan

    Composition Services

    Project Coordinator: Kristie Rees
    Layout and Graphics: Lavonne Roberts, Laura Westhuis
    Proofreaders: Jessica Kramer, Lindsay Littrell

    Publishing and Editorial for Technology Dummies

    Richard Swadley, Vice President and Executive Group Publisher
    Andy Cummings, Vice President and Publisher
    Mary Bednarek, Executive Director, Acquisitions
    Mary C. Corder, Editorial Director

    Publishing and Editorial for Consumer Dummies

    Diane Graves Steele, Vice President and Publisher, Consumer Dummies
    Ensley Eikenburg, Associate Publisher, Travel

    Composition Services

    Debbie Stailey, Director of Composition Services

    Business Development

    Lisa Coleman, Director, New Market and Brand Development


    Introduction

    Right now, duplicate data is stealing time and money from your organization. It could be a presentation sitting in hundreds of users' network folders or a group e-mail sitting in thousands of inboxes. This redundant data makes both storage and your backup process more costly, more time-consuming, and less efficient. Data deduplication, used on Quantum's DXi-Series disk backup and replication appliances, dramatically reduces this redundant data and the costs associated with it.

    Data Deduplication For Dummies, Quantum 2nd Special Edition, discusses the methods and rationale for reducing the amount of duplicate data maintained by your organization. This book is intended to provide you with the information you need to understand how data deduplication can make a meaningful impact on your organization's data management.

    How This Book Is Organized

    This book is arranged to guide you from the basics of data deduplication, through its details, and then to the business case for data deduplication.

    Chapter 1: Data Deduplication: Why Less Is More: Provides an overview of data deduplication, including why it's needed, the basics of how it works, and why it matters to your organization.

    Chapter 2: Data Deduplication in Detail: Gives a relatively technical description of how data deduplication functions, how it can be optimized, its various architectures, and what happens when it gets applied to replication.

    Chapter 3: The Business Case for Data Deduplication: Provides an overview of the business costs of duplicate data, how data deduplication can be effectively applied to your current data management process, and how it can aid in backup and recovery.


    Chapter 1

    Data Deduplication: Why Less Is More

    In This Chapter

    Understanding where duplicate data comes from

    Identifying duplicate data

    Using data deduplication to reduce storage needs

    Figuring out why data deduplication is needed

    Maybe you've heard the cliché "Information is the lifeblood of an organization." But many clichés have truth behind them, and this is one such case. The organization that best manages its information is likely the most competitive.

    Of course, the data that makes up an organization's information must also be well-managed and protected. As the amount and types of data an organization must manage increase exponentially, this task becomes harder and harder. Complicating matters is the simple fact that so much data is redundant.

    To operate most effectively, every organization needs to reduce its duplicate data, increase the efficiency of its storage and backup systems, and reduce the overall cost of storage. Data deduplication is a powerful technology for doing just that.

    Duplicate Data: Empty Calories for Storage and Backup Systems

    Allowing duplicate data in your storage and backup systems is like eating whipped cream straight out of the bowl: You get plenty of calories, but no nutrition. Take it to an extreme, and you end up overweight and undernourished. In the IT world, that means buying lots more storage than you really need.

    The tricky part is that it's not really the IT team that controls how much duplicate data you have. All of your users and systems generate duplicate data, and the larger your organization and the more careful you are about backup, the bigger the impact is.

    For example, say that a sales manager sends out a 10MB presentation via e-mail to 500 salespeople and each person stores the file. The presentation now takes up 5GB of your storage space. Okay, you can live with that, but look at the impact on your backup!

    Because yours is a prudent organization, each user's network share is backed up nightly. So day after day, week after week, you are adding 5GB of data each day to your backup, and most of the data in those files consists of the same blocks repeated over and over and over again. Multiply this by untold numbers of other sources of duplicate data, and the impact on your storage and backup systems becomes clear. Your storage needs skyrocket, and your backup costs explode.
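The arithmetic in this example can be checked with a few lines of Python. This is only an illustrative sketch: the 30-night retention window is an assumed figure (the text does not give one), and decimal units (1 GB = 1,000 MB) are used.

```python
# Figures taken from the example above.
presentation_mb = 10   # size of the presentation
recipients = 500       # salespeople who each keep a copy

# Primary storage consumed by the copies (decimal units: 1 GB = 1,000 MB).
stored_gb = presentation_mb * recipients / 1000
print(stored_gb)   # 5.0

# ASSUMPTION: 30 nightly backups are retained; the text gives no retention period.
nights_retained = 30
backup_gb = stored_gb * nights_retained
print(backup_gb)   # 150.0
```

Under that assumed retention, one 10MB file ends up occupying 150GB of backup capacity without deduplication.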

    Data Deduplication: Putting Your Data on a Diet

    If you want to lose weight, you either reduce your calories or increase your exercise. The same is sort of true for your data, except you can't make your storage and backup systems run laps to slim down.

    Instead, you need a way to identify duplicate data and then eliminate it. Data deduplication technology provides just such a solution. Systems like Quantum's DXi products that use block-based deduplication start by segmenting a dataset into variable-length blocks and then check for duplicates. When they find a block they've seen before, instead of storing it again, they store a pointer to the original. Reading the file is simple: the sequence of pointers makes sure all the blocks are accessed in the right order.
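The store-a-pointer idea can be sketched in Python as a toy block store. This is not Quantum's implementation: the `DedupStore` class, the use of SHA-256 as a block fingerprint, and the pre-segmented input are all assumptions made for illustration.

```python
import hashlib

class DedupStore:
    """Toy block store: each unique block is kept once; files are lists of pointers."""

    def __init__(self):
        self.blocks = {}   # fingerprint -> block bytes (stored once)
        self.files = {}    # file name -> ordered list of fingerprints (the pointers)

    def write(self, name, segments):
        pointers = []
        for seg in segments:
            fp = hashlib.sha256(seg).hexdigest()
            # Store the block only if this fingerprint has not been seen before.
            self.blocks.setdefault(fp, seg)
            pointers.append(fp)
        self.files[name] = pointers

    def read(self, name):
        # Follow the pointers in order to rebuild the original data.
        return b"".join(self.blocks[fp] for fp in self.files[name])

# ASSUMPTION: the data has already been segmented into blocks.
store = DedupStore()
store.write("a.ppt", [b"intro", b"chart", b"summary"])
store.write("b.ppt", [b"intro", b"chart", b"appendix"])
print(len(store.blocks))     # 4 unique blocks instead of 6
print(store.read("b.ppt"))   # b'introchartappendix'
```

Six blocks are written but only four are stored; reading a file back is just a walk along its pointer list.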


    Compared to other storage reduction methods that look for repeated whole files (single-instance storage is an example), data deduplication provides much more granularity. That means that in most cases, it dramatically reduces the amount of storage space needed.

    As an example, consider the sales deck that everybody saved. Imagine that everybody put their name on the title page. A single-instance system would identify all the files as unique and save all of them. A system with data deduplication, however, can tell the difference between unique and duplicate blocks inside files and between files, and it's designed to save only one copy of the redundant data segments. That means that you use much less storage.
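The sales-deck comparison can be made concrete with a small Python sketch. The deck layout (a personalised title page plus 20 identical slides), the three user names, and the segment boundaries falling on slide boundaries are all assumptions chosen for illustration.

```python
import hashlib

def fingerprint(data):
    return hashlib.sha256(data).hexdigest()

# ASSUMPTION: each deck is a personalised title page plus 20 identical slides,
# and segmentation happens to fall on slide boundaries.
slides = [f"slide {n} content".encode() for n in range(20)]
decks = {
    user: [f"Sales deck - {user}".encode()] + slides
    for user in ("ann", "bob", "cho")
}

# Single-instance storage compares whole files: every deck looks unique.
whole_file_ids = {fingerprint(b"".join(deck)) for deck in decks.values()}
print(len(whole_file_ids))   # 3

# Block-level deduplication compares segments: the slides are shared.
segment_ids = {fingerprint(seg) for deck in decks.values() for seg in deck}
print(len(segment_ids))      # 23 (3 title pages + 20 shared slides)
```

A single-instance system keeps all three files (63 segments' worth of data), while the block-level view needs only 23 unique segments.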

    Data deduplication isn't a stand-alone technology: it can work with single-instance storage and conventional compression. That means data deduplication can be integrated into existing storage and backup systems to decrease storage requirements without making drastic changes to an organization's infrastructure.

    A brief history of data reduction

    One of the earliest approaches to data reduction was data compression, which searches for repeated strings within a single file. Different types of compression technologies exist for different types of files, but all share a common limitation: Each reduces duplicate data only within specific parts of individual files.

    Next came single-instance storage, which reduces storage needs by recognizing when files are repeated. Single-instance storage is used in backup systems, for example, where a full backup is made first, and then incremental backups are made of only changed and new files. The effectiveness of single-instance storage is limited because it saves multiple copies of files that may have only minor differences.

    Data deduplication is the newest technique for reducing data. Because it recognizes differences at a variable-length block basis within files and between files, data deduplication is the most efficient data reduction technique yet developed and allows for the highest savings in storage costs.


    Data deduplication utilizes proven technology. Most data is already stored in non-contiguous blocks, even on a single-disk system, with pointers to where each file's blocks reside. In Windows systems that use the FAT file system, for example, the File Allocation Table (FAT) maps the pointers. Each time a file is accessed, the FAT is referenced to read blocks in the right sequence. Data deduplication references identical blocks of data with multiple pointers, but it uses the same basic principles for reading multi-block files that you are using today.

    Why Data Deduplication Matters

    Increasing the data you can put on a given disk makes sense for an IT organization for lots of reasons. The obvious one is that it reduces direct costs. Although disk costs have dropped dramatically over the last decade, the increase in the amount of data being stored has more than eaten up the savings.

    Just as important, however, is that data deduplication also reduces network bandwidth needs for transmitting data: when you store less data, you have to move less data, too. That opens up new protection and disaster recovery capabilities (replication of backup data, for example), which make management of data much easier.

    Finally, there are major impacts on indirect costs: the amount of space required for storage, cooling requirements, and power use. Management time is also reduced, often dramatically. Quantum DXi customers in a recent survey averaged a 63 percent reduction in the amount of time they had to spend managing their backups.


    Chapter 2: Data Deduplication in Detail

    dividing a data stream into fixed-length blocks, then changing any single block means that all the downstream blocks will look different the next time the data set is transmitted. Bottom line, you won't find very many common segments.

    So instead of fixed blocks, Quantum's deduplication technology divides the data stream into variable-length data segments using a system that can find the same block boundaries in different locations and contexts. This block-creation process lets the boundaries "float" within the data stream so that changes in one part of the dataset have little or no impact on the blocks in other parts of the dataset. Duplicate data segments can then be found globally at different locations inside a file, inside different files, inside files created by different applications, and inside files created at different times. Figure 2-1 shows fixed-block data deduplication.

    A B C D

    E F G H

    Figure 2-1: Fixed-length block data in data deduplication.

    The upper line shows the original blocks; the lower shows the blocks after making a single change to Block A (an insertion). The shaded sequence is identical in both lines, but all of the blocks have changed and no duplication is detected: there are eight unique blocks.

    Data deduplication utilizes variable-length blocks. In Figure 2-2, Block A changes when the new data is added (it is now E), but none of the other blocks are affected. Blocks B, C, and D are all identical to the same blocks in the first line. In all, we have only five unique blocks.


    E B C D

    A B C D

    Figure 2-2: Variable-length block data in data deduplication.
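The difference between Figures 2-1 and 2-2 can be reproduced with a minimal Python sketch. It is a generic content-defined chunking illustration, not Quantum's algorithm: the 16-byte window, the SHA-256 boundary test, the 64-byte average segment size, and the helper names are all assumptions. A production system would normally use a rolling hash rather than rehashing every window.

```python
import hashlib
import random

def fixed_chunks(data, size=64):
    """Cut the stream every `size` bytes, regardless of content."""
    return [data[i:i + size] for i in range(0, len(data), size)]

def variable_chunks(data, window=16, mask=0x3F):
    """Cut wherever a hash of the preceding `window` bytes ends in six zero bits.

    Boundaries depend only on nearby content, so they move with the data.
    """
    chunks, start = [], 0
    for end in range(window, len(data) + 1):
        digest = hashlib.sha256(data[end - window:end]).digest()
        if digest[-1] & mask == 0:
            chunks.append(data[start:end])
            start = end
    if start < len(data):
        chunks.append(data[start:])   # trailing segment without a boundary
    return chunks

def shared(a, b):
    """Number of distinct chunks that appear in both chunk lists."""
    return len(set(a) & set(b))

rng = random.Random(0)
original = bytes(rng.randrange(256) for _ in range(4096))
modified = b"INSERTED" + original   # a single insertion at the front

fixed_shared = shared(fixed_chunks(original), fixed_chunks(modified))
var_original = variable_chunks(original)
var_shared = shared(var_original, variable_chunks(modified))
print(fixed_shared, "of", len(fixed_chunks(original)), "fixed blocks reused")
print(var_shared, "of", len(var_original), "variable segments reused")
```

With fixed blocks, the 8-byte insertion shifts every later block, so nothing lines up. With content-defined boundaries, only the segment containing the insertion changes and the rest are found again.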

    Effect of change in deduplicated storage pools

    When a dataset is processed for the first time by a data deduplication system, the number of duplicate data segments varies depending on the nature of the data (both file type and content). The gain can range from negligible to 50% or more in storage efficiency.

    But when multiple similar datasets (like a sequence of backup images from the same volume) are written to a common deduplication pool, the benefit is very significant because each new write only increases the size of the total pool by the number of new data segments. In typical business data sets, it's common to see block-level differences between two backups of only 1% or 2%, although higher change rates are also frequently seen.

    The number of new data segments in each new backup depends a little on the data type, but mostly on the rate of change between backups. And total storage requirement also depends to a very great extent on your retention policies: the number of backup jobs and the length of time they are held on disk. The relationship between the amount of data sent to the deduplication system and the disk capacity actually used to store it is referred to as the deduplication ratio.
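How the pool grows with change rate and retention can be modelled roughly in Python. This is a simplified sketch under stated assumptions: the first backup is stored whole, each later backup adds only its changed fraction, the volume does not grow, and compression is ignored. The 1,000 GB volume and 30 retained backups are made-up inputs.

```python
def pool_growth(first_backup_gb, change_rate, backups):
    """Return (data sent, data stored) in GB after a series of full backups.

    ASSUMPTION: the first backup is stored whole and every later backup adds
    only `change_rate` of its size in new segments; compression is ignored.
    """
    sent = first_backup_gb * backups
    stored = first_backup_gb + first_backup_gb * change_rate * (backups - 1)
    return sent, stored

# Hypothetical inputs: 1,000 GB volume, 30 retained backups, three change rates.
for rate in (0.01, 0.02, 0.10):
    sent, stored = pool_growth(1000, rate, 30)
    print(f"change {rate:.0%}: sent {sent} GB, stored {stored:.0f} GB")

sent_2pct, stored_2pct = pool_growth(1000, 0.02, 30)
```

At a 2% change rate the model stores about 1,580 GB for 30,000 GB sent. Raising the change rate or lengthening retention changes the stored figure directly, which is the dependence described above.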


    Figure 2-3 shows the formula used to derive the data deduplication ratio, and Figure 2-4 shows the ratio for four different backup datasets with different change rates (compression also figures in, so the figure also shows different compression effects). These charts assume full backups, but deduplication also works when incremental backups are included. As it turns out, though, the total amount of data stored in the deduplication appliance may well be the same for either method because the storage pool only stores new blocks under either system. The deduplication ratio differs, though, because the amount of data sent to the system is much greater in a daily full model. So the storage advantage is greater for full backups even if the amount of data stored is the same.

    Data deduplication ratio = (Total data before reduction) / (Total data after reduction)

    Figure 2-3: Deduplication ratio formula.
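The formula in Figure 2-3, and the point that daily fulls and incrementals can store the same data yet report different ratios, can be worked through in Python. The numbers are assumptions for illustration (1,000 GB volume, 2% daily change, 30 days retained, no compression), and the incremental case is idealised so that it sends exactly the changed segments.

```python
def dedup_ratio(before_gb, after_gb):
    """Deduplication ratio = total data before reduction / total data after."""
    return before_gb / after_gb

# ASSUMPTIONS: 1,000 GB volume, 2% daily change, 30 days retained, no compression.
volume_gb, change, days = 1000, 0.02, 30
daily_new = volume_gb * change                  # 20 GB of new segments per day
stored = volume_gb + daily_new * (days - 1)     # same pool under either method

sent_daily_full = volume_gb * days              # a full backup every day
sent_incremental = volume_gb + daily_new * (days - 1)   # one full, then idealised incrementals

ratio_full = dedup_ratio(sent_daily_full, stored)
ratio_incremental = dedup_ratio(sent_incremental, stored)
print(round(ratio_full, 1))          # 19.0
print(round(ratio_incremental, 1))   # 1.0
```

The disk used is identical in both cases; only the numerator changes. Real incremental backups usually send whole changed files rather than only changed segments, so their ratio would land somewhere above this idealised 1.0.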

    It makes sense that data deduplication has the most powerful effect when it is used for backup data sets with low or modest change rates, but even for data sets with high rates of change, the advantage can be significant.

    To help you select the right deduplication appliance, Quantum uses a sizing calculator that models the growth of backup datasets based on the amount of data to be protected, the backup methodology, type of data, overall compressibility, rates of growth and change, and the length of time the data is to be retained. The sizing calculator helps you understand where data deduplication has the most advantage and where more conventional disk or tape backup systems provide more appropriate functionality.


    Chapter 3

    The Business Case for Data Deduplication

    In This Chapter

    Looking at the business value of deduplication

    Finding out why applying the technology to replication and disaster recovery is key

    Identifying the cost of storing duplicate data

    Looking at the Quantum data deduplication advantage

    As with all IT investments, data deduplication must make business sense to merit adoption. At one level, the value is pretty easy to establish. Adding disk to your backup strategy can provide faster backup and restore performance, as well as give you RAID levels of fault tolerance. But with conventional storage technology, the amount of disk people need for backup just costs too much. Data deduplication solves that problem for many users by letting them reduce the amount of disk they need to hold their backup data by 90 percent or more, which translates into immediate savings.

    Conventional disk backup has a second limitation that some users think is even more important: disaster recovery (DR) protection. Can data deduplication help there? Absolutely! The key is using the technology to power remote replication, and the outcome provides another compelling set of business advantages.


    What about tape? Do you still need it? Disk-based deduplication and replication can reduce the amount of tape you use, but most IT departments combine the technologies, using tape for longer-term retention. This approach makes sense for most users. If you want to keep data for six months or three years or seven years, tape provides the right economics and portability, and the new encryption capabilities that tape drives offer now make securing the data that goes off site on tape easy.

    The best solution providers will help you get the right balance, and at least one of them (Quantum) lets you manage the disk and tape systems from a single management console, and it supports all your backup systems with the same service team.

    The asynchronous replication method employed by Quantum in its DXi-Series disk backup and replication solutions can give users extra bandwidth leverage. Before any blocks are replicated to a target, the source system sends a list of blocks it wants to replicate. The target checks this list of candidate blocks against the blocks it already has, and then it tells the source what it needs to send. So if the same blocks exist in two different offices, they have to be replicated to the target only one time.
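The list-then-send exchange described above can be sketched in Python. This is a generic illustration, not the DXi replication protocol: the `Target` class, the `replicate` function, and the SHA-256 fingerprints are hypothetical names and choices, and the "WAN" here is just a function call.

```python
import hashlib

def fingerprint(block):
    return hashlib.sha256(block).hexdigest()

class Target:
    """Hypothetical replication target holding a pool of unique blocks."""

    def __init__(self):
        self.blocks = {}   # fingerprint -> block

    def missing(self, candidates):
        """Step 1: report which candidate fingerprints are not stored yet."""
        return [fp for fp in candidates if fp not in self.blocks]

    def receive(self, blocks):
        for block in blocks:
            self.blocks[fingerprint(block)] = block

def replicate(source_blocks, target):
    """Send only the blocks the target asks for; return how many were sent."""
    by_fp = {fingerprint(b): b for b in source_blocks}
    wanted = target.missing(list(by_fp))             # source offers its block list
    target.receive([by_fp[fp] for fp in wanted])     # Step 2: ship only what is missing
    return len(wanted)

target = Target()
first_office = replicate([b"A", b"B", b"D"], target)
second_office = replicate([b"A", b"B", b"C", b"D"], target)
print(first_office)    # 3: the target starts empty
print(second_office)   # 1: only block C crosses the WAN
```

The second source shares three blocks with the first, so only the one new block is transmitted. That sharing is where the bandwidth leverage comes from.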

    Figure 3-1 shows how the deduplication process works on replication over a WAN.

    [Figure 3-1 diagram: a source holding blocks A, B, C, D and a target holding blocks A, B, D, connected over a WAN. Step 1: Source sends a list of elements to replicate to the target ("A, B, C, D?"). Target returns list of blocks not already stored there (C). Step 2: Only the missing data blocks are replicated and moved over the WAN.]

    Figure 3-1: Verifying data segments prior to transmission.

    Because many organizations use public data exchanges to supply WAN services between distributed sites, and because data transmitted between sites can take multiple paths from source to target, deduplication appliances should offer encryption capabilities to ensure the security of data transmissions.


    In the case of DXi-Series appliances, all replicated data (both metadata and actual blocks of data) can be encrypted at the source using SHA-AES 256-bit encryption and decrypted at the target appliance.

    Reducing the Overall Cost of Storing Data

    Storing redundant backup data brings with it a number of costs, from hard costs such as storage hardware to operational costs such as the labor to manage removable backup media and off-site storage and retrieval fees. Data deduplication offers a number of opportunities for organizations to improve the effectiveness of their backup and to reduce overall data protection costs.

    These include the opportunity to reduce hardware acquisition costs, but even more important for many IT organizations is the combination of all the costs that go into backup. They include ongoing service costs, costs of removable media, the time spent managing backup at different locations, and the potential lost opportunity or liability costs if critical data becomes unavailable.

    The situation is also made more complex by the fact that in the backup world, there are several kinds of technology and different situations often call for different combinations of them. If data is changing rapidly, for example, or only needs to be retained for a few days, the best option may be conventional disk backup. If it needs to be retained for longer periods (six months, a year, or more), traditional tape-based systems may make more sense. For many organizations, the need is likely to be different for different kinds of data.

    Combining disk-based backup, deduplication, replication, and tape in an optimal way can provide very significant savings when users look at their total data-protection costs. A white paper published in November 2011 by industry analyst firm IDC, titled Demonstrating the Business Value of Deduplication for Data Protection and sponsored by Quantum, studied organizations that had deployed Quantum DXi deduplication systems. The findings? The study found that over three years the companies saved $4.75 for every $1 invested. The systems paid for themselves in an average time of 7 months. Where were the savings? In reduced media usage, lower power and cooling, savings on license and service costs, and in increased productivity. The key was data deduplication, replication, and combining it with traditional tape in an optimal way. (See Figure 3-2.)

    [Figure 3-2 chart: Average Annual Benefits (per 100 users), in $/year/100 users. Categories: Storage Environment Cost Savings, IT Staff Productivity Optimization, End User Productivity Enhancement. Values shown: $47,316, which is the sum of the three components $22,670, $15,515, and $9,131. Source: IDC White Paper.]

    Figure 3-2: A recent IDC study found significant savings from combining disk-based backup, deduplication, replication, and tape.

    The key to finding the best answer is looking clearly at all the alternatives and finding the best way to combine them. A supplier like Quantum that can provide and support all the different options is likely to give users a wider range of solutions than a company that offers only one kind of technology, and such suppliers have teams of people that can help IT departments look at the alternatives in an objective way.

    You can get an idea of the kinds of savings that deduplication can provide for your organization by using an online ROI estimating tool developed by IDC, available at www.quantum.com.
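As a rough illustration of how the two headline figures from the study relate to each other, the sketch below computes a return ratio and a payback period. It assumes that benefits arrive at a constant rate over the three years, which the study summary does not state, and the function names are invented for this example.

```python
def return_per_dollar(total_benefit: float, investment: float) -> float:
    """Benefit returned for each dollar invested."""
    return total_benefit / investment


def payback_months(investment: float, total_benefit: float, period_months: int) -> float:
    """Months needed to recover the investment, ASSUMING an even benefit rate."""
    monthly_benefit = total_benefit / period_months
    return investment / monthly_benefit


# Figures quoted from the IDC study: $4.75 saved per $1 invested over three years.
investment = 1.00
benefit = 4.75
ratio = return_per_dollar(benefit, investment)
months = payback_months(investment, benefit, period_months=36)
print(f"${ratio:.2f} returned per $1; payback in about {months:.1f} months")
```

Under the even-rate assumption the payback comes out at roughly seven and a half months, which is in line with the 7-month average the study reports.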


    Quantum deduplication products cover a broad range of sizes, from compact units for small businesses and remote offices, to midrange appliances, to enterprise systems that can hold 6.4 petabytes of backup data. All systems include deduplication and replication functionality in their base price, and the larger systems include software for creating tapes directly and software that provides the option of hybrid-mode operation.

    The DXi-Series works with all leading backup software, including Symantec's OpenStorage API, to provide end-to-end support that spans multiple sites and integrates with tape backup systems to make integrating deduplication technology into existing backup architecture easy for users. DXi-Series appliances are part of a comprehensive set of backup solutions from Quantum, the leading global specialist in backup, recovery, and archive. Whether the solution is disk with deduplication and replication, conventional disk, tape, or a combination of technologies, Quantum offers advanced technology, proven products, centralized management, and expert professional services offerings for all your backup and archive systems.

    The results that Quantum DXi customers report show the kind of direct business benefits that adding deduplication technology can have on IT departments. The same IDC report mentioned earlier in this chapter found that:

    Backups on average were more than twice as fast as before (52 percent reduction in time required).

    Failed backup jobs were reduced by 91 percent.

    Time to restore files was reduced by 95 percent.

    Overall sys admin time for backup was reduced by 61 percent.

    And the productivity gains were not limited to IT personnel. The companies in the study, on average, realized a gain of nearly 30 hours per year for each end user because backups and restores were faster, and the negative impact of backup on server operations was reduced.

    Overall, systems paid for themselves in an average of 7 months through a combination of increased productivity and reduced direct costs, including savings in the purchase, transport, storage and recall of removable media.
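The time reductions reported above can be turned into "times as fast" factors with one line of arithmetic. The helper below is only an illustrative sketch; its name is made up for this example.

```python
def speedup_from_reduction(percent_reduction: float) -> float:
    """Convert an 'X percent less time' figure into an 'N times as fast' factor."""
    if not 0 <= percent_reduction < 100:
        raise ValueError("reduction must be at least 0 and below 100")
    return 1.0 / (1.0 - percent_reduction / 100.0)


print(round(speedup_from_reduction(52), 2))  # backup time reduction from the study
print(round(speedup_from_reduction(95), 2))  # restore time reduction from the study
```

A 52 percent reduction in time corresponds to a factor just above 2, which is why the study describes backups as "more than twice as fast."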


    instead of storing the block again. Because the pointer takes up less space than the block, you save space. In backup, where the same blocks show up again and again, users typically reduce disk needs by 90 percent or more.
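The pointer idea can be sketched in a few lines. This is a minimal illustration, not a description of any product: the fixed 4-byte block size, the use of SHA-256 fingerprints as pointers, and the names `dedup_store` and `restore` are all assumptions made for the example.

```python
import hashlib

BLOCK_SIZE = 4  # tiny, illustrative block size; real systems use far larger blocks


def dedup_store(data: bytes, store: dict[bytes, bytes]) -> list[bytes]:
    """Split data into blocks, keep each unique block once, return the pointers."""
    pointers = []
    for i in range(0, len(data), BLOCK_SIZE):
        block = data[i:i + BLOCK_SIZE]
        digest = hashlib.sha256(block).digest()  # fingerprint acts as the pointer
        store.setdefault(digest, block)          # store the block only if it is new
        pointers.append(digest)
    return pointers


def restore(pointers: list[bytes], store: dict[bytes, bytes]) -> bytes:
    """Rebuild the original data by following the pointers in order."""
    return b"".join(store[p] for p in pointers)


store: dict[bytes, bytes] = {}
monday = dedup_store(b"AAAABBBBCCCCDDDD", store)
tuesday = dedup_store(b"AAAABBBBXXXXDDDD", store)  # one block changed overnight
print(len(store), "unique blocks stored for 8 blocks written")
```

In this toy the 32-byte fingerprint is larger than the 4-byte block, so nothing is saved. With realistic block sizes measured in kilobytes, the pointer is much smaller than the block it replaces.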

    How Is Data Deduplication Applied to Replication?

    Replication is the process of sending duplicate data from a source to a target. Typically, a relatively high performance network is required to replicate large amounts of backup data. But with deduplication, the source system (the one sending data) looks for duplicate blocks in the replication stream. Blocks already transmitted to the target system don't need to be transmitted again. The system simply sends a pointer, which is much smaller than the block of data and requires much less bandwidth.
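The bandwidth effect described above can be shown with a small simulation. It is a sketch under stated assumptions: fingerprints are SHA-256 digests, and the source is assumed to know which fingerprints the target already holds (here it simply looks in the target's dictionary, where a real system would need a protocol for that). The function names are hypothetical.

```python
import hashlib


def fingerprint(block: bytes) -> bytes:
    """Short, fixed-size identifier for a block (used as the pointer)."""
    return hashlib.sha256(block).digest()


def replicate(blocks: list[bytes], target_store: dict[bytes, bytes]) -> tuple[int, int]:
    """Send blocks to the target; return (bytes sent with dedup, bytes sent without)."""
    sent_dedup = 0
    sent_raw = 0
    for block in blocks:
        fp = fingerprint(block)
        sent_raw += len(block)
        sent_dedup += len(fp)          # the pointer always crosses the wire
        if fp not in target_store:     # block unknown to the target: send it once
            target_store[fp] = block
            sent_dedup += len(block)
    return sent_dedup, sent_raw


target: dict[bytes, bytes] = {}
block_a = b"a" * 4096
block_b = b"b" * 4096
first = replicate([block_a, block_b, block_a], target)
second = replicate([block_a, block_b, block_a], target)  # nothing new the second time
print(first, second)
```

On the second run only the three 32-byte pointers are sent, compared with 12,288 bytes for the undeduplicated stream.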

    What Applications Does Data Deduplication Support?

    When used for backup, data deduplication supports all applications and all qualified backup packages. Certain file types (some rich media files, for example) don't see much advantage the first time they are sent through deduplication because the applications that wrote the files already eliminated redundancy. But if those files are backed up multiple times or backed up after small changes are made, deduplication can create very powerful capacity advantages.

    Is There Any Way to Tell How Much Improvement Data Deduplication Will Give Me?

    Four primary variables affect how much improvement you will realize from data deduplication:


    How much your data changes (that is, how many new blocks get introduced)

    How well your data compresses using conventional compression techniques

    How your backup methodology is designed (that is, full versus incremental or differential)

    How long you plan to retain the backup data

    Quantum offers sizing calculators to estimate the effect that data deduplication will have on your business. Pre-sales systems engineers can walk you through the process and show you what kind of benefit you will see.
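To see how the four variables interact, here is a back-of-the-envelope model. It is not Quantum's sizing calculator. It assumes every retained backup is a full backup of the same size, that a fixed fraction of blocks is new in each backup after the first, and that unique blocks compress by a constant factor. The function name and the sample inputs are invented for illustration.

```python
def estimated_dedup_ratio(retained_fulls: int, change_rate: float, compression: float) -> float:
    """Toy estimate: logical data protected divided by physical disk consumed.

    retained_fulls: number of full backups kept (retention and methodology)
    change_rate:    fraction of blocks that are new in each later backup
    compression:    conventional compression factor on unique blocks (2.0 means 2:1)
    """
    logical = retained_fulls  # measured in units of one full backup
    physical = (1 + (retained_fulls - 1) * change_rate) / compression
    return logical / physical


for fulls in (5, 30, 90):
    ratio = estimated_dedup_ratio(fulls, change_rate=0.02, compression=2.0)
    print(f"{fulls} retained fulls: about {ratio:.1f}:1")
```

Even in this simplified form, the model shows the general pattern: longer retention and lower change rates raise the ratio, while a single backup gains nothing beyond ordinary compression.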

    What Are the Real Benefits of Data Deduplication?

    There are two main benefits of data deduplication. First, data deduplication technology lets you keep more backup data on disk than with any conventional disk backup system, which means that you can restore more data faster. Second, it makes it practical to use standard WANs and replication for disaster recovery (DR) protection, which means that users can provide DR protection while reducing the amount of removable media (that's tape) handling that they do.

    What Is Variable-Block-Length Data Deduplication?

    It's easiest to think of the alternative to variable-length, which is fixed-length. If you divided a stream of data into fixed-length segments, every time something changed at one point, all the blocks downstream would also change. The system of variable-length blocks that Quantum uses allows some of the segments to stretch or shrink, while leaving downstream blocks unchanged. This increases the ability of the system to find duplicate data segments, so it saves significantly more space.
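The difference between the two approaches shows up clearly in a small experiment. The sketch below is a generic content-defined chunker, not Quantum's algorithm: it cuts wherever a CRC-32 checksum of the last few bytes hits a chosen value, and the window size, divisor, and function names are illustrative assumptions. Real systems typically also enforce minimum and maximum segment sizes and use faster rolling hashes.

```python
import random
import zlib

WINDOW = 8     # bytes examined when deciding whether to cut (illustrative)
DIVISOR = 16   # a cut happens on about 1 window in 16, so chunks average ~16 bytes


def fixed_chunks(data: bytes, size: int = 16) -> list[bytes]:
    """Cut at fixed offsets, regardless of content."""
    return [data[i:i + size] for i in range(0, len(data), size)]


def variable_chunks(data: bytes) -> list[bytes]:
    """Cut wherever the checksum of the trailing WINDOW bytes hits a chosen value."""
    chunks, start = [], 0
    for end in range(WINDOW, len(data) + 1):
        if zlib.crc32(data[end - WINDOW:end]) % DIVISOR == 0:
            chunks.append(data[start:end])
            start = end
    if start < len(data):
        chunks.append(data[start:])  # whatever is left after the last cut
    return chunks


rng = random.Random(0)
original = bytes(rng.randrange(256) for _ in range(2000))
edited = b"!" + original  # insert one byte at the very front

for name, chunker in (("fixed", fixed_chunks), ("variable", variable_chunks)):
    before, after = set(chunker(original)), set(chunker(edited))
    print(name, "chunks reused:", len(before & after), "of", len(before))
```

Because each cut point depends only on nearby content, the one-byte insertion disturbs only the first variable-length chunk. With fixed offsets, every block boundary shifts and the old blocks no longer match.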


    If the Data Is Divided into Blocks, Is It Safe?

    The technology for using pointers to reference a sequence of data segments has been standard in the industry for decades: You use it every day, and it is safe. Whenever a large file is written to disk, it is stored in blocks on different disk sectors in an order determined by space availability. When you read a file, you are really reading pointers in the file's metadata that reference the various sectors in the right order. Block-based data deduplication applies a similar kind of technology, but it allows a single block to be referenced by multiple sets of metadata.
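One common way to keep shared blocks safe is reference counting: a block is freed only when no set of metadata points to it any more. The sketch below illustrates that general technique. The source does not say how any particular product tracks references, and all names and data here are hypothetical.

```python
from collections import Counter

blocks: dict[str, bytes] = {"b1": b"header", "b2": b"payload", "b3": b"footer"}
refcount: Counter = Counter()
files: dict[str, list[str]] = {}


def add_file(name: str, block_ids: list[str]) -> None:
    """Record a file as an ordered list of block pointers and count each reference."""
    files[name] = block_ids
    refcount.update(block_ids)


def delete_file(name: str) -> None:
    """Drop a file's metadata; free a block only when nothing points to it any more."""
    for block_id in files.pop(name):
        refcount[block_id] -= 1
        if refcount[block_id] == 0:
            del blocks[block_id], refcount[block_id]


def read_file(name: str) -> bytes:
    """Follow the file's pointers in order to reassemble its contents."""
    return b"".join(blocks[b] for b in files[name])


add_file("monday.bak", ["b1", "b2", "b3"])
add_file("tuesday.bak", ["b1", "b2"])  # shares b1 and b2 with Monday's backup
delete_file("monday.bak")
print(read_file("tuesday.bak"), sorted(blocks))
```

Deleting Monday's backup removes only the block that nothing else references, so Tuesday's backup still reads back intact.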

    When Does Data Deduplication Occur during Backup?

    There are really three choices. You can send all your backup data to a backup target and perform deduplication there (usually called target-based deduplication), you can perform the deduplication on each protected host, or you can use a central media server to carry out the deduplication. All three systems are available and have advantages.

    If deduplication is carried out in the backup application on the media server, you don't have to buy a special-purpose target deduplication device, but support is limited to one application and all the overhead of the deduplication is added to the server's other duties, and deduplication systems that provide good reduction require significant processing. So users deploying server-based deduplication report slower backup, limited scalability, and requirements to upgrade their disk storage and buy more, heavier-duty servers.

    If you use a target deduplication appliance, you send all the data to the device and deduplicate it there. You have to buy


    What Do Data Deduplication Solutions Cost?

    Costs can vary a lot, but seeing list prices in the range of 30 to 75 cents per GB of stored, deduplicated data is common. A good rule-of-thumb rate for deduplication is 20:1, meaning that you can store 20 times more data than on conventional disk. Using that figure, a system that could retain 44TB of backup data would have a list price of $12,500, or 28 cents a GB. So even at the manufacturer's suggested list price (and discounts are normally available), deduplication appliance costs are a lot lower than if you protected the same data using conventional disk. Even more important, a recent IDC study (a summary of which is available from www.quantum.com) concluded that companies saved $4.75 for every $1 invested over a three-year deployment, and that the deduplication systems paid for themselves in savings in an average of 7 months.
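The per-gigabyte figure quoted above can be checked with simple arithmetic. The sketch assumes decimal units (1 TB = 1,000 GB) and reads the 44TB as backup data before deduplication. The function names are made up for this example.

```python
def cost_per_gb(list_price_usd: float, capacity_tb: float) -> float:
    """Dollars per gigabyte, using decimal units (1 TB = 1,000 GB)."""
    return list_price_usd / (capacity_tb * 1000)


def raw_disk_needed_tb(protected_tb: float, dedup_ratio: float) -> float:
    """Physical disk implied by a given deduplication ratio."""
    return protected_tb / dedup_ratio


print(f"{cost_per_gb(12_500, 44) * 100:.0f} cents per GB of retained backup data")
print(f"{raw_disk_needed_tb(44, 20):.1f} TB of physical disk at 20:1")
```

At the 20:1 rule of thumb, retaining 44TB of backup data implies only about 2.2TB of physical disk, which is where the low cost per retained gigabyte comes from.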


    Appendix

    Quantum's Data Deduplication Product Line

    In This Appendix

    Reviewing the Quantum DXi-Series disk backup and remote replication appliances

    Identifying the features and benefits of the DXi-Series

    Quantum Corp. is the leading global storage company specializing in backup, recovery, and archive. Combining focused expertise, customer-driven innovation, and platform independence, Quantum provides a comprehensive range of disk, tape, and software solutions supported by a world-class sales and service organization. As a long-standing and trusted partner, the company works closely with a broad network of resellers, original equipment manufacturers (OEMs), and other suppliers to meet customers' evolving data protection needs.

    Quantum's DXi-Series disk backup appliances leverage patented data deduplication technology to reduce the disk needed for backup by 90 percent or more, make remote replication a practical and cost-effective DR technique, and reduce network bandwidth needs by distributing data reduction between servers and appliances. Figure A-1 shows how DXi-Series replication uses existing WANs for DR protection, linking backup data across sites and reducing or eliminating media handling.


    [Figure A-1 diagram: Quantum's Replication Technology. Labels: Remote office A, Remote office B, and Remote office C; DXi4000, DXi6700, and DXi4000 appliances; DXi8500 located at central data center; Scalar i500 tape library. Users replicate data over existing WANs to provide automated DR protection and centralized media management. Quantum replication features cross-site deduplication prior to data transmission for additional bandwidth savings.]

    Figure A-1: DXi-Series replication.

    The DXi Series spans the widest range of backup capacity points in the industry. Some of the features and benefits of Quantum's DXi Series include:

    Patented data deduplication technology that reduces disk requirements by 90 percent or more

    A broad solution set of turnkey appliances for small and medium business, distributed and midrange sites, and scalable systems for the enterprise

    High backup performance for each class of appliances, providing optimal protection, even when there are tight backup windows

    Software (DXi Accent) that distributes deduplication between backup servers and appliances to increase backup speeds in bandwidth-constrained environments and enable remote backup


    Software licenses that are included in the base price to maximize value, streamline deployment, and give users leading price-performance across the entire product line

    Quantum's data deduplication also dramatically reduces the bandwidth needed to replicate backup data between sites for automated disaster recovery protection.

    All models share a common software layer, including deduplication and remote replication, allowing IT departments to connect all their sites in a comprehensive data protection strategy that boosts backup performance, reduces or eliminates media handling, and centralizes disaster recovery operations. Support includes Symantec OpenStorage API (OST) for both disk and tape on DXi4000, DXi6700 and DXi8500 models.

    The following sections offer more details about the individual DXi systems.

    DXi4000 Series

    The DXi4000 backup appliances provide an affordable, easy alternative with the industry's first capacity-on-demand deduplication. With up to twice the performance of competitors and as little as half the cost, DXi4000 deduplication appliances keep backup and restore performance high while delivering industry-leading value for fast return on investment. Designed for small to medium businesses or branch offices, DXi4000 appliances support all leading backup software, including those designed specifically for virtual servers.

    DXi6700 Series

    The DXi6700 Series provides deduplication without compromise, combining the broadest scalability and highest performance with leading value and unique extensibility supporting the broadest range of IT environments. The DXi6700 models provide maximum flexibility and value for maximum investment protection in evolving backup environments, providing simultaneous NAS, VTL and OST interfaces. Finally, the DXi6700 Series has integrated support for vmPRO software, providing faster, easier protection of virtual servers and optimized deduplication rates.


    DXi8500 Series

    The enterprise-class DXi8500 appliances support high performance backup and anchor a multi-site, multi-tier data protection strategy. Replication, VTL, OST, and direct tape creation are included in the DXi8500's base price, and it offers full support for vmPRO software for faster, easier protection of virtual servers and optimized deduplication rates. The DXi8500's direct path-to-tape feature gives users a tool for integrating the creation of removable media into the disk backup process under full control of the backup application while reducing loads on backup servers. The DXi8500 provides faster backups, streamlined restores, automated DR protection, and integrated tape creation to simplify backup and reduce costs.
