Technical Report

Storage Efficiency Every Day: How to Achieve and Manage Best-in-Class Storage Use

Dr. Adolf Hohl, Georg Mey, NetApp, with support from the NetApp Field Centers for Innovation

October 2010 | RA-0007

ABSTRACT
In the face of exponential data growth, efficient management of data is crucial. NetApp provides a set of technologies to do more with less. These technologies allow for thin-provisioned storage: the ability to consolidate much more data on NetApp® storage controllers than would fit in the physically attached disks. This document explains how to achieve best-in-class storage use and how to manage thin-provisioned storage to enable storage efficiency in daily life while meeting service level agreements.



TABLE OF CONTENTS

1 EXECUTIVE SUMMARY
2 INTRODUCTION
2.1 TERMINOLOGY
2.2 GOAL OF THIS DOCUMENT
2.3 AUDIENCE
2.4 SCENARIO
2.5 NETAPP TECHNOLOGIES FOR STORAGE EFFICIENCY AND FLEXIBILITY
3 PROVISIONING
3.1 PROVISIONING FROM SCRATCH: FULL FAT TO ZERO FAT PROVISIONING
3.2 PROVISIONING FROM TEMPLATES: VOLUME- AND DEDUPE-CENTRIC LAYOUTS
3.3 SETTLED/NOMAD PROVISIONING FOR NETAPP DATA MOTION
4 OPERATION
4.1 PHASES AND TRANSITIONS
4.2 MONITORING
4.3 NOTIFICATION
4.4 MITIGATE STORAGE USE
5 REAL-LIFE SETTINGS
5.1 SAMPLE SETTING 1: REAL-LIFE SETTING
5.2 SAMPLE SETTING 2: SETTLED/NOMAD
6 STORAGE EFFICIENCY COOKBOOK
7 REFERENCES
8 ACKNOWLEDGMENTS

LIST OF TABLES

Table 1) NetApp technologies for storage efficiency and flexibility
Table 2) Full fat provisioning
Table 3) Zero fat provisioning
Table 4) Full fat provisioning
Table 5) Low fat provisioning
Table 6) Zero fat provisioning
Table 7) Comparison of provisioning methods
Table 8) Mitigation alternatives to control use within aggregates
Table 9) Mitigation activities for resource tightness within volumes
Table 10) Phase transitions with the settled/nomad provisioning pattern and the online migration mitigation alternative


LIST OF FIGURES

Figure 1) Terminology in the context of the storage objects of volumes and aggregates
Figure 2) Storage consolidation and growing utilization using thin provisioning
Figure 3) Controlled and slowed data growth within the operational sweet spot corridor in the aggregate
Figure 4) Mitigate to prevent uncontrolled utilization
Figure 5) Sample service levels ordered by service disruption and recovery time
Figure 6) Questions regarding storage efficiency from an operational point of view
Figure 7) Provisioning model for NAS storage from scratch; technically, only two out of four combinations are possible
Figure 8) Provisioning model for SAN storage from scratch
Figure 9) Configuring a full/zero fat provisioning policy using Provisioning Manager for NAS; Provisioning Manager deviates from zero/full fat by first growing volumes with autosize and then allowing Snapshot autodelete
Figure 10) Configuring a full/low/zero fat provisioning policy using Provisioning Manager for SAN storage; Provisioning Manager deviates by not turning on autosize for zero fat
Figure 11) Full/low/zero fat provisioning policies for datasets and storage services
Figure 12) Volume-centric storage provisioning; application instances are aligned horizontally with their volumes
Figure 13) Dedupe-centric storage provisioning; application instances are aligned horizontally, volumes are aligned vertically
Figure 14) Settled/nomad provisioning into an aggregate; in case of aggregate tightness, a nomad is migrated to a separate aggregate
Figure 15) Alignment by technical impact (sorted by negative impact in descending order)
Figure 16) Alignment by business impact (sorted by negative impact in descending order)
Figure 17) Operations Manager screen to configure thresholds on operational metrics
Figure 18) Trending of data growth and days-to-full prediction in Operations Manager
Figure 19) Storage efficiency dashboard in Operations Manager
Figure 20) Configuring an alarm based on the threshold "aggregate almost full"
Figure 21) Storage to enable organic data growth between planned downtime windows
Figure 22) Phase transitions depending on the metrics aggregate capacity used and aggregate committed space
Figure 23) Narrower corridors due to the ability to perform mitigation alternatives in hours instead of months
Figure 24) Visualization of phase transitions depending on the metric aggregate capacity used
Figure 25) Typical picture of aggregate capacity metrics while turning to zero fat configurations and dedupe


1 EXECUTIVE SUMMARY

This document provides consolidated best practices to achieve and manage best-in-class storage use. We introduce intervals and metrics that trigger changes in behavior in order to operate NetApp storage in a corridor of high utilization for as long as possible. Starting with provisioning models focused on high consolidation and operational agility, we describe the operational phases and their transitions. A list of mitigation alternatives describes the available options to control use in the face of data growth.

Finally, this document presents real-life settings where high data consolidation is achieved by using NetApp storage efficiency technologies.


2 INTRODUCTION

Exponential data growth generates a serious challenge for IT managers. Gartner predicts that within the period from 2008 to 2013, enterprises will purchase and install 20 times more terabytes of storage than they did in 2008 (www.gartner.com/technology/media-products/newsletters/netapp/issue24/gartner3.html). Until recently, continuous improvements in cost for performance and storage space made it easy and affordable to solve storage concerns by adding more disks to existing storage systems. However, IT executives are discovering that there are limits to that solution: floor space, weight loads, rack space, network drops, power connections, cooling infrastructure, and even power itself are finite resources. Hitting any one of these limits significantly jeopardizes the ability of IT to meet business demands.

NetApp's solution to rapid resource consumption is to remove storage controllers and disks from the resource equation by using storage more efficiently. Key benefits of this strategy are:

• Less management involvement
• Reduced complexity, support, and service costs
• Improved performance and network efficiency

NetApp storage efficiency technologies are key to achieving data consolidation and managing future data growth. They allow for storing and managing several times more data on NetApp storage controllers than would fit on their physically attached disks, and they allow the deferral of IT investments to the future.

In this document, we describe techniques and guidelines you can use to find the operational sweet spot for NetApp storage efficiency technologies in your environment. By adhering to the best practices outlined, you can increase storage consolidation and agility as well as decrease operational risk.

The document is organized as follows:

• Chapter 3 describes storage provisioning.
• Chapter 4 describes the monitoring process and supporting tools for daily operation.
• Chapter 5 describes concrete operational setups used in daily life.
• Chapter 6 concludes with a step-by-step cookbook to provision and manage storage efficiently and to adapt individual thresholds.

2.1 TERMINOLOGY

We use the following terminology to describe resource use on the level of exposing storage to applications and on the level of physical resource allocation within the aggregates in the storage controllers. Also refer to the NetApp Operations Manager Efficiency Dashboard Installation and User Guide for further explanation of this terminology.

• Logical storage refers to storage that is visible at the application layer. Logical storage does not necessarily require the allocation of usable capacity.
• Usable capacity refers to storage that is usable for the applications, provided by NetApp storage controllers.
• Used capacity is a value that represents the amount of physical capacity that holds application or user data. In Operations Manager¹ terminology, this is represented by "capacity used."
• Storage utilization refers to the ratio of used capacity to usable capacity, without accounting for efficiency returns.

¹ NetApp Operations Manager is a central console that delivers comprehensive monitoring and management for NetApp storage, with alerts, reports, performance, and configuration tools.


• Commitment rate is an Operations Manager term that refers to the percentage of aggregate space committed to volumes.
• Deduplication rate is an Operations Manager term that measures the efficiency of the deduplication functionality. This rate is measured at the volume and the aggregate level, as a percentage.

For the aggregate, we define different operational windows, each characterized by an interval of storage utilization. We define one interval as the operational sweet spot corridor (green), where the aggregate should be operated for optimal utilization and service availability. We define a tolerance interval (yellow), where actions are taken to get back into the operational sweet spot window. Finally, we define an interval as a no-go area (red), where we do not intend to operate the aggregate. This area might act as a last buffer of time, or it can be considered an area where operational staff has less experience.
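The windows above can be expressed as a small sketch that maps an aggregate's utilization to its operational window. The threshold values are illustrative assumptions, not NetApp recommendations; adapt them to the corridor you define for your own environment.

```python
def classify_utilization(used_pct):
    """Map an aggregate utilization percentage to an operational window.
    Thresholds are illustrative; tune them per environment."""
    if used_pct < 50:
        return "below corridor"   # room to provision more data
    elif used_pct <= 70:
        return "sweet spot"       # green: operate the aggregate here
    elif used_pct <= 85:
        return "tolerance"        # yellow: act to get back into the corridor
    else:
        return "no-go"            # red: do not operate the aggregate here

print(classify_utilization(65))  # sweet spot
```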

Figure 1 explains the terminology in the context of storage objects on a NetApp storage controller. The aggregate is a physically limited storage object. Aggregates are treated as fairly static containers and thus need proper size management.

Figure 1) Terminology in the context of the storage objects of volumes and aggregates. (The figure shows volumes with LUNs/NAS data growing within the usable capacity of the aggregate; the committed logical storage, the used capacity, and the operational sweet spot corridor are marked.)

In practice, commitment rates far above 100% are common in customer environments. This document describes how to manage this.
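A commitment rate above 100% simply means more logical storage has been committed to volumes than the aggregate physically provides. A minimal sketch, with illustrative numbers:

```python
def commitment_rate(volume_sizes_gb, aggregate_usable_gb):
    """Percentage of aggregate space committed to volumes.
    With thin provisioning this can exceed 100%."""
    return 100.0 * sum(volume_sizes_gb) / aggregate_usable_gb

# Three thin-provisioned 4 TB volumes on a 10 TB aggregate:
rate = commitment_rate([4000, 4000, 4000], 10000)
print(f"{rate:.0f}%")  # 120%
```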

2.2 GOAL OF THIS DOCUMENT

The goal of this document is to achieve best-in-class storage efficiency and costs by consolidating the highest possible amount of application data while meeting the required service-level agreements. The idea is to enable thin provisioning while controlling the use of physical resources in the NetApp shared storage infrastructure within a desired corridor. NetApp storage efficiency technologies can save a significant amount of the IT budget; on the other hand, running at an uncontrolled use level can reduce flexibility and cause headaches in managing data growth and service level fulfillment.

The difference in managing thin-provisioned storage compared to traditional storage is that, due to the dense consolidation of application data, accumulated application data growth rates might vary in a broader corridor than they would traditionally.


To control the level of physical resources, we outline methods for increasing storage utilization by provisioning storage with NetApp storage efficiency technologies. We also outline how to react to organic data growth and how to level the storage use within a corridor of high efficiency by using NetApp technologies for data center flexibility.

To summarize, this document introduces three phases to manage the storage on NetApp storage controllers: provisioning, organic growth, and mitigation of storage tightness. These phases play a vital role for aggregates, which are the coarsest storage abstraction of a NetApp storage controller.

• Provisioning phase: In this phase, storage is provisioned from the NetApp shared storage infrastructure, which increases the utilization of aggregates. The goal is to operate the aggregates with a high level of data consolidation in an efficient utilization corridor. Figure 2 visualizes this corridor.
• Organic growth phase: In this phase, no further storage is provisioned, to slow down the growth of aggregate utilization. The goal is to operate in a corridor of high utilization but safely reach the next planned downtime or administration window of the served applications. Thus, the organic growth phase is sized depending on the growth rate of capacity used and the length of the time frame to the planned downtime windows. Figure 3 visualizes slowed growth.
• Mitigation of storage tightness phase: This phase prevents an uncontrolled level of utilization and provides mitigation activities to lower this level. Several mitigation alternatives are presented to mitigate storage tightness and to shift the aggregate utilization back into the desired operational corridor. Figure 4 visualizes this mitigation.
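The three phases above can be sketched as a simple state machine driven by aggregate utilization. The thresholds and the transition logic are illustrative assumptions; in practice, the transitions are indicated by Operations Manager metrics, as described in chapter 4.

```python
PROVISIONING, ORGANIC_GROWTH, MITIGATION = "provisioning", "organic growth", "mitigation"

def next_phase(current, used_pct, sweet_spot_high=70, tolerance_high=85):
    """Advance the phase based on aggregate utilization (illustrative thresholds)."""
    if current == PROVISIONING and used_pct >= sweet_spot_high:
        return ORGANIC_GROWTH   # stop provisioning, let data grow organically
    if current == ORGANIC_GROWTH and used_pct >= tolerance_high:
        return MITIGATION       # act to lower utilization
    if current == MITIGATION and used_pct < sweet_spot_high:
        return PROVISIONING     # back in the corridor; provisioning may resume
    return current

phase = PROVISIONING
for used in (40, 72, 80, 88, 60):   # sample utilization readings over time
    phase = next_phase(phase, used)
print(phase)  # provisioning
```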

Figure 2) Storage consolidation and growing utilization using thin provisioning. (The figure shows data growth filling the aggregate capacity up into the operational sweet spot corridor.)

Figure 3) Controlled and slowed data growth within the operational sweet spot corridor in the aggregate.


Figure 4) Mitigate to prevent uncontrolled utilization.

This document addresses best practices and tools to manage the NetApp storage infrastructure and to support decision making during the transitions between phases.

2.3 AUDIENCE

This document addresses two audiences:

• Decision makers: It provides decision makers with an understanding of how to align storage efficiency best practices and processes within their existing operations organization.
• Operational teams: It allows operational teams to understand monitoring and management of the storage infrastructure while mastering data growth. It allows the operational teams to implement a basic setting and to position their usage goals. We refer to existing NetApp best practices to increase the level of data consolidation and to achieve overcommitment for major applications.


2.4 SCENARIO

As a scenario, we consider a service provider who delivers a set of IT services. This service provider might serve internal or external customers at different service levels. The service levels provided are characterized by unplanned downtime, as exemplified by Figure 5. This characterization is useful for aligning service data with physical resources. In our example, the highest level of service availability is delivered for Platinum services. It is further assumed that the provided services have different lifetimes and dates of creation.

The service provider's major goal is to deliver the services within the specified service level and to achieve maximum data consolidation through NetApp storage efficiency technologies. This directly translates into cost savings related to capital investments, floor space, cooling, maintenance, and operational expenses. However, storing data in a consolidated way using storage efficiency technologies needs to take into account the aggregated data growth rates of the applications.

Predicting the data growth rates depends on several parameters, which are usually outside the control and knowledge of the service provider. These parameters include usage characteristics, number of users, and functionality used. To compensate for the deficiencies in precisely predicting the data growth over a specific time frame, we propose a scheme that the service provider can use to achieve operational flexibility and adaptability to handle unpredictable growth rates.
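Even a rough growth estimate is useful for sizing the organic growth phase against the next planned downtime window. A minimal linear extrapolation, in the spirit of the days-to-full trending that Operations Manager provides (see chapter 4); real growth is rarely linear, and the numbers are illustrative:

```python
def days_to_full(used_gb, usable_gb, growth_gb_per_day):
    """Linear days-to-full estimate for an aggregate."""
    if growth_gb_per_day <= 0:
        return float("inf")   # no growth: never fills up under this model
    return (usable_gb - used_gb) / growth_gb_per_day

# 7 TB used of 10 TB usable, growing 50 GB per day:
print(days_to_full(7000, 10000, 50))  # 60.0
```

If the next planned downtime window is further away than the estimate, the aggregate is a candidate for mitigation before that window arrives.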

Figure 5) Sample service levels ordered by service disruption and recovery time. (The tiers range from Best Effort services for dev/test, cold/fill-up, and dynamic/short-term data, through Bronze, Silver, and Gold production services, up to Platinum for production with premium customers; tolerated disruption and recovery time decrease from Best Effort to Platinum.)

In this document, the focus is on operational aspects of storage efficiency technologies to achieve data center consolidation and agility. Thus, we take a seat next to the operational staff of our sample service provider to understand their questions regarding the technologies that make up NetApp storage efficiency.

We address the questions posed by the operational staff, such as: How do we set it up? How do we run this? How do we integrate the necessary procedures into our daily life?


A set of questions pertaining to the lifetime of a service instance and its storage arises. It starts with provisioning storage in a NetApp shared storage infrastructure, continues with the detection and monitoring of situations endangering the level of a service and the necessary response procedures, and aims at promoting a continuous and smooth delivery of services.

The questions are structured around a cycle that starts at provisioning storage and finishes at deprovisioning storage. Figure 6 shows important questions regarding storage efficiency from an operational point of view.

• Provisioning deals with the provisioning of storage. In this document, provisioning models are shown to achieve a high level of storage consolidation while preserving operational flexibility. For individual applications, NetApp provides a rich library of technical reports on how best to provision.
• Monitoring deals with defining the goal of the monitoring process and which metrics to use to decide when to stop (for example, the provisioning of storage). Relevant metrics provided by NetApp Operations Manager are described.
• Notifying deals with how to notify the people in charge of performing certain actions. The notification mechanisms within NetApp Operations Manager are described to deliver information in case of certain events.
• Mitigation deals with mechanisms to prevent uncontrolled utilization affecting operational flexibility and service fulfillment.
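The monitor and notify steps of this cycle boil down to threshold-based rules, like the alarms configured in Operations Manager (for example, on "aggregate almost full"). A minimal sketch; the threshold value and the recipient are illustrative assumptions:

```python
def check_alarms(metrics, threshold_pct=85):
    """Return notification strings for aggregates at or above the threshold.
    metrics: mapping of aggregate name -> utilization percentage."""
    alerts = []
    for aggregate, used_pct in metrics.items():
        if used_pct >= threshold_pct:
            alerts.append(
                f"NOTIFY storage-ops: {aggregate} at {used_pct}% "
                f"(threshold {threshold_pct}%)"
            )
    return alerts

print(check_alarms({"aggr1": 82, "aggr2": 91}))
```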

Figure 6) Questions regarding storage efficiency from an operational point of view. (The figure arranges the questions around the provision, monitor, notification, and mitigation steps of the cycle: how and where to provision for storage efficiency, which SLA and defaults apply; what to monitor and with which tools; who is in charge of reacting and how to notify them; and which mitigation options are available, with their SLA implications and timing.)

Before discussing the details of this cycle, it is important to understand the NetApp technologies that achieve storage efficiency and flexibility, and to understand their relevance in the provisioning and operational phases.


2.5 NETAPP TECHNOLOGIES FOR STORAGE EFFICIENCY AND FLEXIBILITY

The NetApp technologies listed in Table 1 are characterized by where they bring the most significant advantage and value. For example, FlexClone® technology provides significant time and space advantages while provisioning, but the space advantage might be reduced over time. In contrast, deduplication technology can achieve space savings over the entire storage lifetime.

Table 1) NetApp technologies for storage efficiency and flexibility.

NetApp Technology | Benefit | During Provisioning | During Operation
FlexClone | Instantly creates thin-provisioned and space-efficient writable clones | X |
FlexVol® | Implements thin provisioning and consumes only the needed space rather than the requested space | X | X
Deduplication | Increases data consolidation by detecting and optimizing repeating patterns in primary and secondary storage | X | X
NetApp Data Motion | Provides flexibility for management and optimal load/capacity rebalancing in growing cloud environments without downtime | | X
Aggregate Extensibility in Data ONTAP® | Data ONTAP is the foundation for all features listed in this table and provides flexibility in handling physical resources; it allows extending physical aggregates during operation | X | X

Furthermore, NetApp RAID-DP®, SATA, and NetApp Flash Cache (formerly PAM II) are technologies that help to reduce the total cost of storage tremendously. It is assumed that these technologies are deployed according to the requirements of the use case.

NETAPP SHARED STORAGE INFRASTRUCTURE

To implement the practices outlined in this document, some prerequisites must be met. We assume a NetApp shared storage infrastructure implemented using large aggregates. This acts as a utility for delivering storage in a flexible manner for applications with different needs. It scales with the demands and serves a variety of different service levels at the same time. NetApp Operations Manager monitors the NetApp shared storage infrastructure. This software acts as a central management station and consolidates information about the current status of all NetApp storage controllers. Based on this information, Operations Manager indicates the necessity to change the phases and behavior in the data center.

The NetApp shared storage infrastructure provides different ways for clients to consume its resources. It can provide a traditional view, where storage resources are located at a specific controller. Using NetApp Provisioning Manager, the infrastructure can also provide a service-oriented resource view that abstracts resource consumption and management from the physical controllers. The abstractions of a storage service catalog, resource pools, and datasets provide easy manageability in the face of massive scale. If multi-tenancy is not required, then this is the abstraction of choice.

Supported by the NetApp technologies MultiStore® (vFiler™) and NetApp Data Motion, storage can be provided in a utility-like fashion, independent of physical hardware. This makes high operational flexibility in the data center possible and allows building virtualized environments for multiple tenants with competing interests.


3 PROVISIONING

Some features, such as data deduplication, can be turned on or off at any time. However, to achieve maximum consolidation and storage efficiency, more strategies must be considered during the data-provisioning phase.

In addition, provisioning should take the flexibility of storage (for example, the ability to migrate data) into account. This allows administrators to easily move data off aggregates approaching capacity, without downtime. This is also an important aspect when planning to deliver services 24x7.

Thus, the design goals for storage layouts are storage efficiency and operational flexibility. In the following sections, we discuss three orthogonal dimensions of provisioning storage. Two of them focus on achieving data consolidation; the third focuses on achieving operational flexibility. All dimensions can be combined independently. Note that the achievable level of consolidation depends on the applications and their data.

TR-3827, If You Are Doing This, Then Your Storage Could Be Underutilized, provides further understanding of storage efficiency and operational flexibility.

3.1 PROVISIONING FROM SCRATCH: FULL FAT TO ZERO FAT PROVISIONING

This section deals with the way data is provisioned and the consequences for storage efficiency. We recommend applying the so-called zero fat configurations.

We consider the storage setup for a single application instance. The presented configurations can be applied while provisioning storage from scratch and to already provisioned storage. When the technical dimensions of storage provisioning are categorized into primary data space and its Snapshot™ copy space, there are four theoretical combinations for both network-attached storage (NAS) and storage area network (SAN) environments. In practical applications, only two variants are relevant to NAS and three variants are relevant to SAN storage:

• Full fat
• Low fat
• Zero fat

According to NetApp best practices, we do not mix block and file data within a single volume, which allows us to consider NAS and SAN environments separately.

NAS

For NAS, two options are recommended: full fat and zero fat.

• Full fat: The primary data and Snapshot copy space are preallocated.
• Zero fat: Primary data and its Snapshot copy space are allocated on demand. This variant achieves the best ratio of storage efficiency when provisioning applications from scratch.


Figure 7) Provisioning model for NAS storage from scratch. Technically, only two out of four combinations are possible.

                                 | Primary Data (Files & Directories) Space Allocation
Snapshot Copy Space Allocation   | Fat             | Thin
Fat                              | Full Fat Option | No Option
Thin                             | No Option       | Zero Fat Option

Note: Full fat is characterized slightly differently in NAS and SAN due to their technical properties.

FULL FAT PROVISIONING

Full fat provisioning for NAS is the traditional (default) way to implement NFS/CIFS shares. Volumes in a full fat configuration are characterized as follows:

• Volumes are created with a space guarantee.
• The size of the volume follows the formula X + Δ, where X is the size of the primary data (the sum of all user data, files and directories, within the volume) and Δ is the amount of space needed to hold Snapshot data.
• Because the space used for Snapshot copies might grow unexpectedly, the autosize function can be used to make space available when a certain volume threshold is reached. This would also happen when the space reserved for user data gets low.
• Space reservation for Snapshot copies is used to hide the capacity used for Snapshot copies from the consumers (NAS clients).
• For volumes with deduplication enabled, volume autogrow is a mandatory option.
• Normally, using autodelete is not recommended in NAS environments. Keeping a certain number of Snapshot copies for file versioning and restores might be part of the SLAs defined for file services.

Note: Deleting Snapshot copies may be a reasonable approach when no other option for freeing up space is available, but this will be a specific and individual decision.

Table 2) Full fat provisioning.

Option | Recommended Value | Notes

Volume Options
guarantee | volume |
fractional_reserve | 100 | Leave at default; mostly relevant for SAN environments. The default value up to Data ONTAP 7.3.3 is 100; for later releases, 0 is the default.
autosize | on | Turn autosize on. There is no artificially limited volume that needs to be monitored. Autosize makes sense to allow growth of user data beyond the guaranteed space limit.
autosize options | -m X -i Y | The business model drives the maximum value for the autosize configuration, because it offers additional disk space for the consumer under its specific conditions. A reasonable resizing increment depends on various factors, such as the data growth rate in the particular volume, the volume size itself, and so on.

Volume Snapshot Options
reserve | yes | The value depends on the number of Snapshot copies and the change rate within the volume.
schedule | switched on | Automatic Snapshot technology schedules.
autodelete | off | Deleting Snapshot copies is not recommended in most NAS environments.
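As a sketch, the settings in Table 2 might be applied with Data ONTAP 7-Mode commands along the following lines. The volume name, aggregate, sizes, and Snapshot schedule are hypothetical examples, not recommendations; size the volume and the autosize maximum/increment per your own X + Δ calculation and business model.

```
vol create vol_nas_full -s volume aggr1 500g   # space guarantee: volume
vol autosize vol_nas_full -m 750g -i 25g on    # autosize on; max and increment are examples
snap reserve vol_nas_full 20                   # hide Snapshot capacity from NAS clients
snap sched vol_nas_full 0 2 6@8,12,16,20       # automatic Snapshot schedule (example)
snap autodelete vol_nas_full off               # keep copies for file versioning/restores
```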

ZERO FAT PROVISIONING

The zero fat method is the most efficient way to provision NAS volumes:

• Volumes are created without a space guarantee.
• The size of the volume still follows the formula X + Δ, where X is the size of the primary data (the sum of all user data, files and directories, within the volume) and Δ is the amount of space needed to hold Snapshot data. Sizing the volume defines a container with a virtual size for the consumers; NAS users are familiar with fixed-size file shares.
• Space used for Snapshot copies can grow unexpectedly. You can use the autosize function to make space available when a certain volume threshold is reached. You can also use the autosize function when the space reserved for user data gets low.
• Space reserved for Snapshot copies is used to hide from the consumers (NAS clients) the capacity taken up by Snapshot copies.
• For volumes with deduplication enabled, volume autogrow is a mandatory option.
• Using autodelete is normally not recommended in NAS environments. Keeping a certain amount of space for Snapshot copies for file versioning and restores is part of the SLAs defined for file services.

Note: Deleting Snapshot copies may be a reasonable approach when no other option for freeing up space is available, but this will be a specific and individual decision.

Table 3) Zero fat provisioning.

Option | Recommended Value | Notes

Volume Options
guarantee | none |
fractional_reserve | 100 | Leave at default; mostly relevant for SAN environments. The default value up to Data ONTAP 7.3.3 is 100; for later releases, 0 is the default.
autosize | on | Turn autosize on. There is no artificially limited volume that needs to be monitored. Autosize makes sense to allow growth of user data beyond the guaranteed space limit.
autosize options | -m X -i Y | The business model drives the maximum value for the autosize configuration, because it offers additional disk space for the consumer under its specific conditions. A reasonable resizing increment depends on various factors, such as the data growth rate in the particular volume, the volume size itself, and so on.
try_first | - | Autodelete is not recommended in most environments.

Volume Snapshot Options
reserve | yes/no | The value depends on the number of Snapshot copies and the change rate within the volume. Displaying only the committed usable space using an SLA is the preferred way to provision NAS storage; however, there might be situations in which the Snapshot reserve area is omitted (no).
schedule | switched on | Automatic Snapshot technology schedules.
autodelete | off | Deleting Snapshot copies is not recommended in most NAS environments.
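A corresponding sketch for the zero fat settings in Table 3, again as hypothetical Data ONTAP 7-Mode commands; the only structural difference from the full fat example is the missing space guarantee. The `try_first` value shown (`volume_grow`, so the volume grows before any Snapshot copies are deleted) is one possible setting, since Table 3 leaves the value open.

```
vol create vol_nas_zero -s none aggr1 500g     # no space guarantee: thin provisioned
vol autosize vol_nas_zero -m 750g -i 25g on    # autosize on; max and increment are examples
vol options vol_nas_zero try_first volume_grow # grow before deleting Snapshot copies (assumption)
snap reserve vol_nas_zero 20                   # or 0 if the reserve area is omitted
snap sched vol_nas_zero 0 2 6@8,12,16,20       # automatic Snapshot schedule (example)
snap autodelete vol_nas_zero off
```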

SAN

For SAN, we consider three options:

• Full fat: Both the primary data and its Snapshot copy space are preallocated.
• Low fat: The primary data is preallocated; the Snapshot copy space is allocated on demand.
• Zero fat: Primary data and its Snapshot copy space are allocated on demand. This variant achieves the best ratio of storage efficiency when provisioning applications from scratch.

Figure 8) Provisioning model for SAN storage from scratch.

                                 | Primary Data (LUN) Space Allocation
Snapshot Copy Space Allocation   | Fat             | Thin
Fat                              | Full Fat Option | No Option
Thin                             | Low Fat Option  | Zero Fat Option

FULL FAT PROVISIONING

This method can be treated as the historical way of provisioning block storage with Data ONTAP:

• Volumes are created with space guarantee.
• A fractional (overwrite) reserve is used to guarantee that the primary data can be overwritten completely with Snapshot copies in place. If this space is not available, Snapshot copy creation will fail.
• The size of the volume follows the formula 2X + Δ, where X is the size of the primary data (= sum of all LUN capacities within the volume) and Δ is the amount of space needed to hold Snapshot copy data.

The enhancements to the volume autosize capabilities (such as volume size-dependent thresholds) and the robustness of the Snapshot copy autodelete implementation have made full fat provisioning more or less obsolete. As of today, the default settings for creating volumes/LUNs in Data ONTAP still correspond to these settings. See Provisioning from Scratch: Full Fat to Zero Fat Provisioning for a discussion of using tools such as Provisioning Manager.

Table 4) Full fat provisioning.

Option | Recommended Value | Notes
Volume Options
guarantee | volume |
fractional_reserve | 100 | Even though technically possible, a fractional reserve below 100 incorporates a potential risk of running out of Snapshot copy overwrite space. This situation should be avoided.
autosize | off | Autosize could be used as an option to create free space needed for Snapshot copy creation.
Volume Snapshot Options
reserve | 0 |
schedule | switched off |
autodelete | off |
LUN Options
reservation | enable |
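On a Data ONTAP 7-Mode controller, the Table 4 settings translate into commands roughly like the following. This is an illustrative sketch: the volume name, aggregate, LUN type, and sizes (an 800GB LUN, 2X + Δ = 1700GB volume) are placeholder assumptions, and the exact syntax should be verified against the release in use.

```shell
# Full fat: everything preallocated up front (2X + delta)
vol create vol_fullfat -s volume aggr1 1700g
vol options vol_fullfat fractional_reserve 100
vol autosize vol_fullfat off
snap reserve vol_fullfat 0
snap sched vol_fullfat 0 0 0            # no scheduled Snapshot copies
snap autodelete vol_fullfat off
lun create -s 800g -t linux /vol/vol_fullfat/lun0
lun set reservation /vol/vol_fullfat/lun0 enable
```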

LOW FAT PROVISIONING

With low fat provisioning, we use a more space-efficient way to provision volumes:

• Volumes are created with space guarantee.
• LUNs are created with space guarantee as well. This setup does not benefit from unused blocks within a LUN. (During the lifetime of a LUN, the amount of free, unused blocks typically decreases. Without space reclamation techniques, allocated blocks on the storage system stay allocated.)
• The size of the volume follows the formula X + Δ, where X is the size of the primary data (= sum of all LUN capacities within the volume) and Δ is the amount of space needed to hold Snapshot copy data.
• Because space used for Snapshot copies might grow unexpectedly, the autosize and autodelete policies are used to make space available when a preset volume threshold is reached.

Table 5) Low fat provisioning.

Option | Recommended Value | Notes
Volume Options
guarantee | volume |
fractional_reserve | 0 | Snapshot copy space is controlled by the autodelete and autosize options.
autosize | on | Turn autosize on.
autosize options | -m X -i Y | The business model drives the maximum value for the autosize configuration because it offers additional disk space for the consumer under its specific conditions. A reasonable resizing increment depends on various factors, such as the data growth rate in the particular volume, the volume size itself, and so on.
try_first | volume_grow | Increasing the size of the volume does not destroy any data or information. There is no reason not to increase the size of the volume; it can be reverted afterward if the volume free space increases again. There might be configurations where automatic volume growth is not desired.
Volume Snapshot Options
reserve | 0 | For NAS volumes, setting a Snapshot copy reserve area and configuring Snapshot copy schedules is a common setup. For SAN volumes, this needs to be switched off according to NetApp best practices (see the Fibre Channel and iSCSI Configuration Guide).
schedule | switched off |
autodelete | on | There might be Snapshot copies that are needed to fulfill certain SLAs, such as backup SLAs. Setting this policy needs to be negotiated against the business requirements. In the worst-case scenario, deleting Snapshot copies is not an option.
autodelete options | volume, oldest_first | These options define the precedence in which Snapshot copies become candidates for deletion; oldest_first is the current default.
LUN Options
reservation | enable | Reserves space for the LUN during creation.
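Sketched as Data ONTAP 7-Mode commands, the Table 5 settings might look like the following. Names and sizes (an 800GB LUN, X + Δ = 900GB volume, 1.5TB autosize maximum) are illustrative placeholders; verify the syntax against your release.

```shell
# Low fat: primary data preallocated, Snapshot copy space on demand (X + delta)
vol create vol_lowfat -s volume aggr1 900g
vol options vol_lowfat fractional_reserve 0
vol autosize vol_lowfat -m 1500g -i 50g on
vol options vol_lowfat try_first volume_grow
snap reserve vol_lowfat 0
snap sched vol_lowfat 0 0 0
snap autodelete vol_lowfat trigger volume
snap autodelete vol_lowfat delete_order oldest_first
snap autodelete vol_lowfat on
lun create -s 800g -t linux /vol/vol_lowfat/lun0
lun set reservation /vol/vol_lowfat/lun0 enable
```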

ZERO FAT PROVISIONING

Full and low fat provisioning use fully allocated volumes and LUNs; by default, space allocation happens within the boundaries of the LUN and the volume. Zero fat follows a 100% "allocate on demand" concept:

• Volumes are created without space guarantee.
• LUNs are created without space guarantee.
• The size of the volume follows the formula X − N + Δ, where X is the size of the primary data (= sum of all LUN capacities within the volume), Δ is the amount of space needed to hold Snapshot copy data, and N is the amount of unused blocks within a given LUN.

Table 6) Zero fat provisioning.

Option | Recommended Value | Notes
Volume Options
guarantee | none | No space reservation for the volume at all.
fractional_reserve | 0 | With Data ONTAP 7.3.3, fractional_reserve can be modified even for volumes without a space guarantee of type volume. Prior to Data ONTAP 7.3.3, the value was fixed at 100.
autosize | on | Turn autosize on.
autosize options | -m X -i Y | The business model drives the maximum value for the autosize configuration because it offers additional disk space for the consumer under specific conditions. A reasonable resizing increment depends on various factors, such as the data growth rate in the particular volume, the volume size itself, and so on.
try_first | volume_grow |
Volume Snapshot Options
reserve | 0 | For NAS volumes, setting a Snapshot copy reserve area and configuring Snapshot copy schedules is a common setup. For SAN volumes, this needs to be switched off according to NetApp best practices (see the Fibre Channel and iSCSI Configuration Guide).
schedule | switched off |
autodelete | off | Deleting Snapshot copies might be an option when the volume can no longer be resized because the maximum configured size has been reached or when the aggregate's free space becomes low.
LUN Options
reservation | disable | No preallocation of blocks for the LUN.
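A corresponding Data ONTAP 7-Mode command sketch for the Table 6 settings follows. All names and sizes are illustrative placeholders; verify the syntax against your release.

```shell
# Zero fat: everything allocated on demand
vol create vol_zerofat -s none aggr1 900g
vol options vol_zerofat fractional_reserve 0
vol autosize vol_zerofat -m 1500g -i 50g on
vol options vol_zerofat try_first volume_grow
snap reserve vol_zerofat 0
snap sched vol_zerofat 0 0 0
snap autodelete vol_zerofat off
# -o noreserve creates the LUN without space reservation
lun create -s 800g -t linux -o noreserve /vol/vol_zerofat/lun0
```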

SUMMARY OF PROVISIONING METHODS

There are good reasons for using any of the provisioning methods described; however, full fat provisioning for SAN environments should be avoided wherever possible because of its storage efficiency ratio. Even with a 100% block usage ratio on primary data, zero fat provisioning has many advantages and is the preferred method:

• The aggregate's free space is a global pool that can serve space to volumes. This gives more flexibility than volumes with their own dedicated free space.
• For SAN volumes, the block consumption can be easily monitored.
• Deduplication savings go directly into the global pool of free space, which is the aggregate or the resource pool to which it belongs.
• Monitoring is needed only on the aggregate level. Volumes grow on demand.

Table 7) Comparison of provisioning methods.

Characteristics | Full Fat | Low Fat | Zero Fat
Space consumption | 2X + Δ | X + Δ | X − N + Δ (2)
Space efficient | No | Partially, for Snapshot copies | Yes
Monitoring | Optional | Required on volume and aggregate level | Required on aggregate level
Notification/mitigation process required | No | Optional in most cases | Yes
Pool benefiting from dedupe savings | Volume fractional reserve area | Volume free space area | Aggregate free space area
Risk of an out-of-space condition on primary data | No | No, as long as autodelete is able to delete any Snapshot copies | Yes, when monitoring and notification processes are missing
Typical use cases | Small installations; no or few storage management skills (no monitoring infrastructure) | Large database environments | Shared storage infrastructure; test/dev environments; storage pools for virtualized servers

(2) N is the traditional thin provisioning impact = the amount of blocks logically allocated but not used.
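The space consumption formulas in Table 7 can be compared numerically. A minimal sketch (function names and all figures are illustrative, in GB):

```python
# X = sum of LUN capacities, delta = expected Snapshot copy space,
# n = blocks logically allocated but unused inside the LUNs (thin provisioning impact)

def full_fat_size(x, delta):
    # primary data plus a full overwrite reserve, plus Snapshot copy space
    return 2 * x + delta

def low_fat_size(x, delta):
    # primary data preallocated, Snapshot copy space on demand
    return x + delta

def zero_fat_size(x, delta, n):
    # only used blocks plus Snapshot copy data consume space
    return x - n + delta

x, delta, n = 1000, 200, 400   # 1TB of LUNs, 200GB Snapshot delta, 40% unused
print(full_fat_size(x, delta))     # 2200
print(low_fat_size(x, delta))      # 1200
print(zero_fat_size(x, delta, n))  # 800
```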

FULL/LOW/ZERO FAT PROVISIONING WITH PROVISIONING MANAGER

NetApp Provisioning Manager focuses on fast and efficient provisioning of storage resources in the NetApp storage infrastructure. It speeds up provisioning workflows and boosts capacity usage by using policy-based automation for the entire NetApp NAS and SAN infrastructure. These processes are:

• Faster than manually provisioning storage
• Easier to maintain than scripts
• Instrumental in minimizing the risk of data loss resulting from misconfigured storage

Provisioning Manager applies user-defined policies to consistently select the appropriate resources for each provisioning activity. This frees administrators from the headache of searching for available space to provision and allows more time for strategic issues. The use of a centralized management console allows administrators to monitor the status of their provisioned storage resources. Provisioning Manager can help improve business agility and capacity use, shrink provisioning time, and improve administrator productivity. Provisioning Manager's thin provisioning and deduplication capabilities provide a high level of storage efficiency from your NetApp storage investment.

A GUI allows you to implement the full/low/zero fat provisioning models within Provisioning Manager. See Figure 9 and Figure 10 for configuring storage efficiency in a provisioning policy for NAS and SAN. Whenever storage is provisioned using this provisioning policy, the settings apply automatically. For more information, refer to TR-3710, Operations Manager, Provisioning Manager, and Protection Manager Best Practices Guide.

Provisioning Manager encapsulates technical details when provisioning storage and supports an easy integration with existing management tools and orchestration frameworks. Policies and their use in so-called datasets and storage services allow you to exploit NetApp storage efficiency technologies without exposing a high level of technical detail to a higher level of management software.

Note: Provisioning Manager up to version 4.0 does not allow you to specify autosize and autodelete individually; the policy template determines whether these features are used and which options are selected. In order to implement the provisioning methods outlined, a customized "provisioning script" needs to be provided to set the autosize and autodelete parameters according to the recommendations for the full/low/zero fat methods. Postprovisioning scripts are standard with Provisioning Manager. Use caution when Provisioning Manager runs conformance checks; these revert individual settings.

Figure 9) Configuring a full/zero fat provisioning policy using Provisioning Manager for NAS. Select checkboxes as outlined. Provisioning Manager deviates from zero/full fat by first growing volumes with autosize and then allowing Snapshot copy autodelete.

Figure 10) Configuring a full/low/zero fat provisioning policy using Provisioning Manager for SAN storage. Select checkboxes as outlined. Provisioning Manager deviates by not turning on autosize for zero fat.

FULL/LOW/ZERO FAT PROVISIONING FOR STORAGE SERVICES

Storage services are an easy abstraction for provisioning storage in a utility-like fashion. A storage service describes all characteristic attributes for storage needed in a certain scenario. A storage service catalog lists the available templates and allows you to provision storage with these attributes on demand. Technically, storage services (or datasets) consist of one or more resource pools, a protection policy, and a provisioning policy. Full/low/zero fat provisioning for storage services is configured in the configuration wizard of the provisioning policy. Figure 11 shows the provisioning policies closest to full/low/zero fat configurations.

Figure 11) Full/low/zero fat provisioning policies for datasets and storage services.

Because this wizard is able to configure the deduplication feature, two policies are configured for the zero fat configuration: one with deduplication and one without.


HOW SHOULD A VOLUME BE SIZED?

Because physical allocation of data within a zero fat provisioned volume is done on demand, theoretically the volume size can be set to a very high value that can easily hold all application data and Snapshot copies. Because the unallocated space in the volume is not exclusively reserved for the volume itself, all other applications can benefit from the shared pool of unallocated storage. However, NetApp recommends that you size the volume to the expected size of its containing objects and use the autogrow option to let it grow on demand. The important advantage is that the commitment rate acts as a metric for data consolidation.

Note: The commitment rate reflects the amount of logical data consolidation. This metric is suitable for deciding when data should be left for organic growth.

Additionally, the volume size limits when using deduplication should be taken into account, because the maximum sizes depend on the storage controllers.
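As an illustration of the commitment rate as a consolidation metric, a minimal sketch (the function name and all figures are hypothetical):

```python
def commitment_rate(provisioned_volume_sizes_gb, aggregate_size_gb):
    """Ratio of logically provisioned volume space to physical aggregate capacity.
    Values above 1.0 (100%) indicate thin-provisioning overcommitment."""
    return sum(provisioned_volume_sizes_gb) / aggregate_size_gb

# Three zero fat volumes of 2TB each on a 4TB aggregate -> 150% committed
print(commitment_rate([2000, 2000, 2000], 4000))  # 1.5
```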

APPLICATION RECOMMENDATIONS

Thin provisioning is most effective when applications use data that is committed to them step by step. When applications preformat data, the immediate effect of thin provisioning is lost, and only deduplication may reclaim sharable blocks. Because thin provisioning has no performance penalty, the general recommendation is to provision with the zero fat configuration.

For SAN-attached storage, NetApp recommends that you use file systems supporting space reclamation technologies such as the SCSI UNMAP and SCSI WRITE SAME commands. These pass the information through the storage stack that a particular block is no longer used, allowing unused space to be reclaimed. On Windows platforms, this can be configured in NetApp SnapDrive.

For Oracle database best practices, refer to WP-7084, Storage Efficiency in an Oracle Environment.

3.2 PROVISIONING FROM TEMPLATES: VOLUME AND DEDUPE-CENTRIC LAYOUTS

This section deals with provisioning storage for similar applications from a golden template. A valid use case is a hosting provider who offers and serves predefined application services in mass quantities. Instead of provisioning each application from scratch, the data of the application instance is provisioned by creating a copy of a preconfigured template or golden copy that is customized using a postprocessing procedure.

When applications are provisioned this way, NetApp cloning technologies generate virtual copies of the template data instantly and with efficient use of space. This achieves a high degree of data consolidation and cost savings.

The potential of NetApp cloning technologies also plays a central role in development and test environments as well as software maintenance scenarios. Testing and updates can be performed very easily because these cloning capabilities work instantly and with almost no overhead for performance, CPU, and memory.

There are two ways to align application data to a NetApp shared storage infrastructure:

• Volume-centric storage layout
• Dedupe-centric storage layout

Depending on the data lifetime, suitability for deduplication, consistency, and tool constraints, one way of aligning application data is more appropriate than the other.

In both variants, the storage of the application template can be provisioned as either full, low, or zero fat. The cloning procedure inherits the attributes of the parent volume. To create space-efficient clones, the space guarantee must be set to none.


CONSEQUENCES FOR MONITORING

When using one of the following layouts, very high data consolidation can be achieved. Because this effect depends on the usage characteristics of the corresponding applications, monitoring the aggregate is key. In case a low fat volume acts as a template that is cloned preserving the original space guarantees, monitoring is necessary for the cloned volumes as well.

VOLUME-CENTRIC STORAGE LAYOUT

In a volume-centric storage layout, an application instance is organized into one or a few volumes to benefit from the Data ONTAP volume-centric management and maintenance operations, such as instant cloning and volume-consistent Snapshot copies.

In addition to the convenient ways to manage volumes, volume-centric storage layouts have storage efficiency advantages in two dimensions:

• High instant storage efficiency savings: High instant savings when cloning data of an application instance with FlexClone; savings might deteriorate over time.
• Long-term storage efficiency savings: Medium long-term savings when deduplicating application data.

A volume-centric layout makes it easy to provision storage for another instance of an application by cloning a consistent volume representing the template of the intended application and attaching it to an instance where it is processed. This approach works for both NAS and SAN.

Figure 12 shows the data alignment of an application instance and its volume. An application instance organizes its data in one or more dedicated volumes. Note that the entire construct is created within one aggregate.

Because deduplication is performed at the volume level, long-term savings depend on the block-sharing rate within one instance of an application. Volume-centric layouts are preferred in the following cases:

• Simplicity of data management using volumes
• Individual control over the SLA of each application instance
• Application instances with a short duration
• No consideration of deduplication
• Management tools that require volume-centric layouts


Figure 12) Volume-centric storage provisioning. Application instances are aligned horizontally with their volumes. Each FlexVol volume (template, instance 1, …, instance n) holds the LUNs/qtrees of one application instance; deduplication block sharing works within each FlexVol volume, and FlexClone block sharing links the cloned volumes to the template.

Impact on commitment and storage utilization: The impact of using FlexClone to clone a volume-centric storage layout to implement storage template-based provisioning can be visualized schematically. At clone creation, Data ONTAP creates metadata for the new instance of the data and allocates space for storing changes to the cloned copy or new data on request. Thus, the overcommitment of the aggregate containing the cloned data increases when creating the clone. However, this does not affect the space used in the aggregate. When data in the clone is changed and new data is added by the application, the aggregate use will grow.
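The clone behavior described above can be sketched with a small accounting model (a simplified illustration; the class, names, and figures are hypothetical, and real aggregate accounting is more involved):

```python
class Aggregate:
    """Toy model: tracks committed (provisioned) vs. physically used space."""

    def __init__(self, size_gb):
        self.size_gb = size_gb
        self.committed_gb = 0   # sum of provisioned volume sizes
        self.used_gb = 0        # physically allocated blocks

    def provision_volume(self, size_gb, used_gb):
        self.committed_gb += size_gb
        self.used_gb += used_gb

    def clone_volume(self, size_gb):
        # FlexClone creation: metadata only -- commitment rises, used space does not
        self.committed_gb += size_gb

    def write_new_data(self, gb):
        # changed or new blocks in a clone are allocated on demand
        self.used_gb += gb

aggr = Aggregate(10000)
aggr.provision_volume(2000, used_gb=2000)   # template volume, fully populated
aggr.clone_volume(2000)                     # instant clone of the template
assert aggr.committed_gb == 4000 and aggr.used_gb == 2000
aggr.write_new_data(50)                     # application changes data in the clone
assert aggr.used_gb == 2050
```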

Best Practice

A volume-centric layout implicitly implements a consistency group. It is preferable to align in it all application data that should be recovered at a certain point in time. Cloning can achieve significant savings when a FlexClone volume is created to provision data for a new service instance.

Client-side data realignment, such as disk defragmentation or database tablespace reorganization, has a counterproductive effect on the FlexClone savings. This realignment also has a temporarily counterproductive effect on the deduplication savings and requires the deduplication process to be executed again. If possible, the following actions on client data should be avoided:

• Reorganizing data, for example, database reorganization of tablespaces or defragmentation of virtual disks provisioned through cloning
• Preformatting data


DEDUPE-CENTRIC STORAGE LAYOUT

In a dedupe-centric storage layout, the goal is to achieve high storage efficiency returns from the deduplication feature. In contrast to the volume-centric storage layout, data of different application instances is grouped to achieve storage efficiency returns across a set of application instances. Figure 13 shows a sample dedupe-centric storage layout. Data of application instances is organized horizontally; the individual data of each application is grouped vertically in a volume to implement deduplication.

This layout makes sense in virtualization scenarios, where the images of the guest machines can be grouped easily. Grouping partitions containing boot images and commonly used programs is very effective because they share much of the same data. To implement template-based provisioning with such a layout, cloning of template data must be performed with the file/LUN FlexClone operation. File/LUN FlexClone allows storage objects to be cloned within a volume, providing finer granularity.

This storage layout provides the following storage efficiency advantages from a short- and long-term perspective:

• Very high long-term storage efficiency savings: Long-term savings are achieved due to the deduplication-centric storage layout and deduplication returns.
• Short-term storage efficiency savings: Instant savings are provided when cloning an application instance through a file/LUN FlexClone operation, for example, of template application data.

In contrast to the volume-centric storage layout, application instances are bundled together in a matrix style because of their participation in a volume. This implies that the applications share major operational tasks and are managed as a bundle. From an SLA perspective, a diversification of service levels within the application instances cannot be implemented as easily as with a volume-centric layout. Achieving application-consistent Snapshot copies requires the iterative application of file/LUN FlexClone functionality to all storage objects of the instance. This is slightly more difficult than cloning with a volume FlexClone operation.

TR-3505, NetApp Deduplication for FAS and V-Series Deployment and Implementation Guide, provides a deeper understanding of NetApp deduplication and its deployment.

Figure 13 illustrates dedupe-centric storage provisioning. Volumes are shared among several application instances to achieve cross-instance dedupe returns. Note that this construct is created within an aggregate; volumes can be assigned to different aggregates.

Figure 13) Dedupe-centric storage provisioning. Application instances (template, instance 1, instance 2, …) are aligned horizontally; volumes are aligned vertically. Each FlexVol volume groups one type of LUN/qtree across all instances, and deduplication block sharing works within each FlexVol volume.

Impact on commitment and aggregate usage: When creating the FlexVol volumes for this layout, their individual sizes contribute to the commitment rate. The aggregate use grows with the provisioning and the object use within the FlexVol volumes. Provisioning a new instance in this layout through a file/LUN FlexClone operation has no effect on the overcommitment rate; it affects the deduplication savings of the volume itself. Thus, NetApp recommends using the zero fat configuration for the volume and having autogrow enabled.

Best Practice

This layout is very attractive for applications using multiple but similar storage objects among service instances (for example, virtual disks in virtual machine hypervisors). They usually use similar operating systems and applications in dedicated virtual disks. Thus, grouping these storage objects leads to a very high degree of consolidation due to deduplication.

Quickly changing data, such as pages and swap files, should not be considered for inclusion in deduplicated volumes on primary storage. Deduplication savings are limited due to their high change rate and do not justify running the deduplication process. NetApp recommends that this type of data not be placed in the same volume as data that dedupes well.

We further recommend not performing client data realignments, such as Windows disk defragmentation or database tablespace reorganizations. Because of the way that NetApp storage controllers work, fragmented client data is served with no performance penalty.


3.3 SETTLED/NOMAD PROVISIONING FOR NETAPP DATA MOTION

Settled/nomad provisioning is a provisioning pattern that helps increase the utilization of NetApp shared storage. When the online migratability features of storage are exploited, response times to mitigate data growth scenarios are independent of application-specific planned downtime windows. Storage is classified into potential migration candidates and can be migrated away from a tight aggregate at one storage controller to another while its accessibility is assured. Thus, it is an elegant technology to relax the use of an aggregate outside the planned downtime windows of the served applications. NetApp MultiStore technology implements this feature using the vFiler abstraction, which NetApp recommends you consider in the provisioning process. Secure multi-tenancy environments implemented using MultiStore technology harmonize well with this provisioning approach.

The settled/nomad provisioning pattern is a perfect metaphor for reacting to data growth in an aggregate. Figure 14 illustrates the concept of settled/nomad provisioning in the aggregates of the storage controllers and the migration of a nomad out of its aggregate. The settled part describes data that does not move during its lifetime. It might use vFiler units to simplify operation and hardware maintenance of the storage controller, but there is no direct need. The nomad parts are considered moving parts and thus must make use of vFiler units. The ratio between the sizes of the settled and nomad parts depends on the growth rate and lifetime of the data in the settled part. Assuming that the aggregate size is constant over this period, the aggregate is filled with settled and nomad data. Over the data lifetime, more and more nomads are migrated away; at the end of the lifetime, only the settled data is left.

It is irrelevant whether the data growth happens in the settled or nomad part: when a nomad is migrated away, the resource situation on the aggregate is relaxed. It is preferable to provision several nomads of different sizes. This allows you to:

• React to different growth scenarios of the data
• Quickly migrate smaller nomads when time or an inter-storage-controller network is considered to be a limited resource
• Operate the aggregate in its operational sweet spot corridor over a long time frame; by slicing the migratable entities in the right way, you can be sure that the aggregate operates in a predefined use interval
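The idea of slicing nomads so that migration can bring the aggregate back into its sweet spot corridor can be sketched as follows (a greedy selection; the function name, corridor thresholds, and nomad sizes are hypothetical):

```python
def nomads_to_migrate(aggregate_used_pct, target_pct, aggregate_size_gb, nomads_gb):
    """Pick a set of nomads whose migration brings the aggregate back into its
    sweet spot corridor. Greedy: smallest nomads first, so the cheapest
    migrations (least network and controller load) are preferred."""
    excess_gb = (aggregate_used_pct - target_pct) / 100.0 * aggregate_size_gb
    selected = []
    for nomad in sorted(nomads_gb):
        if excess_gb <= 0:
            break
        selected.append(nomad)
        excess_gb -= nomad
    return selected

# 10TB aggregate at 92% use, target 85%: 700GB must be migrated away
print(nomads_to_migrate(92, 85, 10000, [500, 300, 1200]))  # [300, 500]
```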

Figure 14) Settled/nomad provisioning into an aggregate. The aggregate contains a settled part and several nomads; in case of aggregate tightness, a nomad is migrated to a separate aggregate.

To summarize, the settled/nomad provisioning pattern is an elegant method to adjust the block use of an aggregate. The use of an aggregate can be controlled and kept in a desired corridor.


SLA-BASED ASSESSMENT FOR SETTLED/NOMAD

The goals of an SLA-based assessment are to optimize SLA fulfillment and to avoid or minimize penalty costs. The accessibility of the applications, described by their individual service levels, is used for an assessment into settled and nomad instances.

We use the introduced SLA metric of service disruption and map it to the stickiness of the settled/nomad instances. The vFiler entities allow online migration of NFS and iSCSI-attached nomad instances without any changes at the client side. Fibre Channel-attached storage cannot be migrated online at the time of writing. Refer to TR-3881 for an understanding of DataMotion in an Oracle database and Microsoft Exchange environment.

Alignment by technical impact: For data belonging to applications with SLAs that fit perfectly into what is provided, a direct assignment can be made. For example, application instances with the lowest acceptable service disruption should be the last candidates to be migrated (settled); applications with the highest acceptable service disruptions should be considered as nomads. However, there might be data of application instances that will likely be migrated during the application lifetime. You must take into account the business impact of migrating these instances.

Figure 15) Alignment by technical impact (instances sorted by negative impact in descending order). Instances with high negative impact or outside the SLA (for example, all FC-attached storage) are settled; instances with medium and low negative impact inside the SLA are nomads.

Alignment by business impact: An assessment of penalty costs is made for the data of the remaining applications. For vFiler migration, a very short negative impact on the performance of the service level must be taken into account during the migration. Thus, application data with the highest negative impact is considered to be the stickiest.

Figure 16) Alignment by business impact (instances sorted by negative impact in descending order). Instances with the highest penalty costs ($$) are settled or semi-settled; instances with lower penalty costs ($) are nomads.

PERFORMANCE AND THROUGHPUT IMPACT OF MIGRATION

Migration of a nomad might be triggered by heavy storage consumption in an aggregate. It might also be triggered by performance limitations of the corresponding storage controller. Because the migration process consumes additional resources on the network and the participating storage controllers, this consumption must be taken into account to avoid further intensifying the situation. Refer to TR-3881 for a quantitative evaluation of DataMotion.

NetApp recommends that the use of storage controllers in a high-availability configuration be adjusted in such a way that the remaining controller can master the load in the case of a failover. Doing so should leave enough resources to perform migrations.

Migrating vFiler entities relies mainly on SnapMirror and MultiStore technology and thus requires these licenses on all participating storage controllers. TR-3814, NetApp Data Motion, provides a thorough presentation of migration using the NetApp Data Motion solution. It focuses on implementing and triggering the migration of vFiler entities using NetApp Provisioning Manager. Furthermore, vFiler units can be managed manually to allow for handy offline migration with a very short interruption of storage accessibility.

ENABLE SETTLED/NOMAD FOR ALREADY-PROVISIONED STORAGE

While NetApp recommends that you consider the settled/nomad setting initially and take the sizing and lifetime of storage into account, it is also possible to implement it in a planned downtime window. If NFS-attached storage should be migrated, existing volumes can be adopted by a vFiler entity. Because the vFiler entity has its own IP address, the clients attaching the storage need to remount it.
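On Data ONTAP 7-Mode with MultiStore, adopting an existing volume into a vFiler unit might look like the following sketch. The vFiler name, IP address, and volume paths are placeholders; verify the syntax against your release and the MultiStore documentation.

```shell
# Create a vFiler unit owning its root volume and the existing NFS data volume
vfiler create vf_nomad1 -i 10.1.1.50 /vol/vfroot_nomad1 /vol/data_nomad1

# Clients must then remount the export from the vFiler unit's own IP address:
#   mount 10.1.1.50:/vol/data_nomad1 /mnt/data
```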

SETTLED/NOMAD-LIKE SETTING WITH SHORT/LONG-TERM DATA PAIRING

In the previous section, the settled/nomad pattern was described as a way to mitigate organic data growth. The same effect can be achieved when storage is identified as belonging to instances that are going to be deprovisioned at their end of life. Taking into account the expected lifetime of provisioned storage allows you to plan deprovisioning situations in advance. This relaxes the dependence on aggregates outside planned downtime windows, without the technical requirements of a settled/nomad setting.

ONLINE MIGRATION IN VIRTUALIZED ENVIRONMENTS

Online migration features in a virtualization hypervisor provide a further alternative for implementing a responsive scheme that reacts to data growth. For example, VMware® Storage VMotion™ is capable of transferring a virtual machine, including its storage, when it is attached using a datastore. Storage of virtual machines served by a NetApp datastore can be migrated to another NetApp-served datastore by migrating each virtual machine. In such cases, a nomad can simply be implemented by a NAS/SAN-attached datastore.

In contrast to a data transfer based on SnapMirror directly between NetApp storage controllers, the migration traffic flows through the hypervisor. This might have consequences for the execution of the virtual machines. Also, the NetApp storage efficiency savings cannot be preserved during the transfer; deduplication savings are regained by executing the deduplication process on the destination storage controller.


4 OPERATION

This section focuses on the operation and management of overcommitted storage. The goal of management is to fulfill the SLAs of the stored data while achieving a high level of efficiency. It addresses how to detect situations that need manual assistance, how to raise the awareness of the operational staff, and how to resolve situations that arise.

We first consider situations that put SLA fulfillment at risk. Then we focus on actions that can be taken to avoid further aggravation by making the situation evident and presenting mitigation alternatives.

Consider the operational process as a loop that monitors and evaluates the current situation and triggers the transition of a storage resource among phases:

• Provision storage.
• Leave room for organic growth. It might be desirable to still allow extending the storage of previously provisioned applications.
• Reduce storage use with mitigation alternatives such as deletion, data motion, and so on.

These transitions must occur within a specified time frame to optimize operational flexibility and to prevent endangering the SLAs. The point is to detect, ahead of time, situations that would violate the SLAs.

SITUATIONS PUTTING SLA FULFILLMENT AT RISK

Over time, more and more data is stored and processed by the provided applications, and NetApp storage efficiency technologies compensate for this growth. To prevent running out of physical resources, usage must be managed within safe boundaries, ensuring the operations team has enough time to react with the appropriate mitigation strategy.

The following list summarizes situations that are critical for service delivery:

• Running out of time. Some mitigation alternatives must be triggered in advance, and time might pass before their effect becomes evident. The time available determines which mitigation alternatives can still be considered at a given point.

• Running out of mitigation alternatives. Several mitigation alternatives exist to control usage. However, some are one-time activities and some must be performed within a certain time frame. Depending on the situation, not all alternatives might be available.

• Running too tight on storage. Over time, applications use more and more of the blocks committed to them, forcing Data ONTAP to allocate from the pool of free blocks. Assuming continued data growth, the size of the free block pool directly translates into the time available to react.

• Running out of storage completely. This must be prevented because it has a high impact on the availability of the service. Furthermore, data integrity can be at risk. Consider the following scenarios:
- The application wants to write to committed storage but fails (NAS/SAN). To the application this looks like a storage failure and implies service disruption. Data integrity can be at risk.
- The application wants to allocate new storage but fails (NAS). The application is confronted with a "No space left on device" exception. Verify the application's behavior on this exception; most applications can deal with it, and data integrity is not at risk.

Two cases need to be differentiated when mitigation is necessary to resolve tight storage. Storage for an object such as a LUN or a share can be tight because of:
- insufficient free space within the volume in which the storage object is contained
- insufficient free space within the aggregate in which the storage object and its volume are contained


The following sections focus on how to detect that a change is necessary and that a storage resource should be transitioned to another phase. We describe the different phases, how monitoring can support their detection, and how this information can be made known to operational groups.

4.1 PHASES AND TRANSITIONS

This section outlines the phases of a storage resource. Starting with an empty aggregate, storage is provisioned up to certain thresholds. After that, storage is left for organic growth. After certain thresholds are exceeded, further inspection or activities must be performed to mitigate storage tightness.

• Provisioning storage. While certain thresholds are within a defined range, storage is provisioned to the aggregates. Monitoring should support the decision to transition to the next phase.
• Leave storage for organic growth. When certain thresholds are exceeded, provisioned storage is left for organic growth. Depending on the environment, the storage of existing applications might still be extended, and a second threshold might signal that extensions are no longer possible. Monitoring should support the decision to transition to the next or prior phase.
• Mitigate storage use. When certain thresholds are exceeded, this phase must make sure that committed storage can still be delivered to store application data. The effect of a mitigation activity should be to put the storage resource back into the preferred operational corridor. Monitoring should support the decision to transition back to the organic growth phase.

4.2 MONITORING

NetApp Operations Manager delivers comprehensive monitoring and management for NetApp shared storage. It provides alerts, reports, performance monitoring, and configuration tools to keep the NetApp storage infrastructure in line with business requirements for maximum availability and efficiency. NetApp Operations Manager provides a single human interface and an application programming interface (API) for integration with third-party management and orchestration software.

Operations Manager monitors the NetApp shared storage infrastructure and can raise awareness of certain situations. Events can be set up to trigger an action when operational parameters are within a certain range and indicate a relevant situation. When an event triggers, an alarm notification can be sent by e-mail, pager, Simple Network Management Protocol (SNMP), or customized scripts. To raise awareness of a certain situation, the event must be characterized using the metrics provided by Operations Manager. To communicate the event, an alarm must be set.

THRESHOLDS

Operations Manager monitors relevant parameters that indicate the presence of specified situations. Thresholds can be set to trigger actions, for example, to notify the operational team that an alarm situation exists. Thresholds can be set to notify in advance. Operations Manager also performs trending on operational parameters to express the urgency of a certain situation, which supports deciding how to react.

Within your Operations Manager instance, the thresholds can be verified and set by navigating to the Default Thresholds page, following Setup → Options → Default Thresholds, or through the link http://opsmgrserver:port/dfm/edit/options. Figure 17 shows a sample configuration page.


Figure 17) Operations Manager screen to configure thresholds on operational metrics

For aggregates, Operations Manager provides the set of thresholds described in the following list. They represent absolute limits. Operations Manager alarms can be used to notify operational staff and raise awareness of a specific situation.

Monitoring the aggregates is very important. They are the physical containers of preallocated and growable storage objects that host application data. If an aggregate of a storage controller runs at uncontrolled usage, there can be direct consequences for the applications for which it provides data.

The concrete settings for these thresholds depend on the time necessary to relax aggregate block usage. When no mitigation alternatives can or should be taken over the lifetime of the data, mitigation actions must be performed in scheduled downtime windows. Thus, threshold settings and actions tend to be more conservative to avoid SLA-endangering situations.

• Aggregate full threshold. This threshold on the metric aggregate block use allows triggering an alarm that notifies a person in charge.
• Aggregate nearly full threshold. The counterpart of the aggregate full threshold, providing an earlier notification.
• Aggregate overcommitted threshold. This threshold on the metric of committed storage allows triggering an alarm that notifies a person in charge. The metric refers to the amount of storage committed to applications; it represents the level of consolidation as well as the width and increase of the block-use corridor.
• Aggregate nearly overcommitted threshold. The counterpart of the aggregate overcommitted threshold, providing an earlier notification.

Operations Manager provides thresholds that can be used to alert operational staff when volumes are in a certain state:

• Volume full threshold. This event notifies a person in charge that the preset threshold on the volume-use metric has been reached.
• Volume almost full threshold. The counterpart of the volume full threshold, providing an earlier notification.
• Volume autosized. This event notifies a person in charge when a volume was extended using the autogrow functionality.


TRENDING

Operations Manager 4.0 supports a variety of trending features for certain storage objects. This is an important feature for all storage objects with a fixed size. It allows you to estimate the time frame within which a certain situation must be mitigated. The trend is calculated as a linear regression over up to 90 days in the past. For aggregates, Operations Manager calculates a trend on the daily growth rate. In your Operations Manager instance, use the link http://opsmgrserver:port/dfm/report/view/aggregates-growth-rates for trending of aggregate growth rates and the estimated remaining time until the storage object is full. Each aggregate can be drilled down into, and you can select trending based on an interval of one day, one week, one month, three months, or one year. To see the effect of recent data activities, set the interval of the trend calculation to enclose this activity. Investigate whether growth rates calculated over different intervals deviate significantly.

Figure 18) Trending of data growth and days-to-full prediction in Operations Manager

Note: The calculation basis of time to full is the usable aggregate capacity. This value is not calculated based on the aggregate full threshold setting.
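The days-to-full estimate described above can be approximated as a least-squares linear trend over a window of daily used-capacity samples. The following is an illustrative sketch, not the Operations Manager implementation; the sample history and capacity figures are hypothetical.

```python
def days_to_full(daily_used_gb, usable_capacity_gb):
    """Estimate days until an aggregate is full from daily used-capacity
    samples, using a least-squares linear trend. Per the note above, the
    basis is the usable aggregate capacity, not the full-threshold value."""
    n = len(daily_used_gb)
    xs = range(n)
    mean_x = sum(xs) / n
    mean_y = sum(daily_used_gb) / n
    # Least-squares slope = daily growth rate in GB/day
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, daily_used_gb))
    var = sum((x - mean_x) ** 2 for x in xs)
    rate = cov / var
    if rate <= 0:
        return None  # flat or shrinking: no projected fill date
    return (usable_capacity_gb - daily_used_gb[-1]) / rate

# Hypothetical aggregate growing ~10 GB/day toward 16,000 GB usable capacity
history = [12000 + 10 * d for d in range(30)]
remaining_days = days_to_full(history, 16000)
```

Comparing estimates computed over different window lengths is a quick way to spot the deviation between intervals that the text recommends investigating.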

Trending on the volume level is analogous to trending on the aggregate level. In your Operations Manager instance, access the link http://opsmgrserver:port/dfm/report/view/volumes-growth-rates for trending of volume growth rates. NetApp recommends ordering the view by growth rate descending or time to full ascending in order to focus on the most relevant candidates.

On the volume level, you can set an alarm to fire when volume growth is outside the usual boundary:

• Abnormal volume growth. This event notifies when the growth rate of a volume exceeds a preset limit. It is helpful for signaling unusual storage-consumption behavior and pointing the operational staff to the right storage object.


INDIVIDUAL THRESHOLDS PER AGGREGATE OR VOLUME

For each aggregate or volume, the general default settings can be overridden with more specific ones. To do so, select the aggregate or volume of choice, for example, by using the links already provided in this technical report. When a concrete aggregate is selected, it can be configured using the Edit Settings link and dialog. When a concrete volume is selected, its configuration can be adapted using the Edit Quota Settings link and dialog.

MONITORING STORAGE EFFICIENCY RETURNS

NetApp Operations Manager provides a dashboard to visualize storage efficiency returns in the NetApp shared storage infrastructure. The report lists important parameters broken down by utilization capacity, unused reserve capacity, storage efficiency, and efficiency return breakdown, allowing you to judge the effectiveness of the NetApp storage efficiency technologies. Figure 19 provides a sample screenshot of the storage efficiency dashboard in NetApp Operations Manager. Consult the NetApp Operations Manager Efficiency Dashboard Installation and User Guide for further information on the information provided by this dashboard.

Figure 19) Storage efficiency dashboard in Operations Manager


4.3 NOTIFICATION

Operational staff must be notified when situations occur that require a phase transition, especially situations with negative consequences for SLA fulfillment. Operations Manager provides alarms for notification. Alarms are bound to the metrics and thresholds explained in section 4.2 and notify operational staff, storage administrators, or storage capacity planners. Alarms are the instrument used to keep the management effort of the NetApp storage infrastructure low.

After being notified, the responsible person can evaluate the situation and decide which actions to take. Furthermore, the trends on operational parameters provided by Operations Manager simplify the decision-making process.

Depending on the organizational structure, the responsibilities to operate, plan, and administer the storage infrastructure can be separated into different groups, persons, or roles. Thus, we characterize the mitigation activities by required skill set and time to act. This allows easy alignment to a given organizational structure.

Operations Manager supports different notification methods, which can be used in combination; for example, a notification can be sent by both e-mail and SNMP.

NOTIFY BY E-MAIL

An alarm can be sent to multiple destinations by e-mail. Repeated notifications can be sent while the situation remains unresolved. To set an alarm, access the alarm configuration page by following Setup → Alarms from the default Operations Manager dashboard. Clicking Advanced Version accesses an advanced version of this page; the direct link is http://opsmgrserver:port/dfm/edit/alarms-advanced. Figure 20 shows how to configure an alarm. Adjust the threshold as described in section 4.2.

NetApp recommends using distribution lists or aliases with meaningful names rather than the addresses of individual persons. If you follow this recommendation, changing responsibilities and roles does not require corresponding changes in Operations Manager.

NOTIFY BY SNMP

Operations Manager supports the notification of alarms using SNMP, a widely used standard supported by most orchestration frameworks and ticketing systems. Using SNMP, Operations Manager can be integrated into existing ticketing systems. Figure 20 shows setting up an alarm that fires based on the aggregate almost full threshold. The SNMP trap host is configured using a hostname or IP address and the port on which the SNMP agent is listening. The alarm can then be saved and tested.


Figure 20) Configuring an alarm based on the threshold aggregate almost full

Note: The SNMP event must be routed to the responsible groups or persons in the ticketing system. Thus, the mapping between the detected situation and the responsible operational group must be implemented there.

NOTIFY BY SCRIPT

Operations Manager supports notifications in highly customized integration scenarios. A user-defined adapter can be executed, which delivers the information to the infrastructure or system of choice. A script can be used to implement such an adapter and act as the glue between Operations Manager and the customer infrastructure. To set an alarm on the event aggregate almost full that starts a script, instrument Operations Manager on the command line:

dfm alarm create -s script_to_execute -h aggregate-almost-full
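A minimal sketch of such an adapter script follows. The environment variable names are assumptions for illustration only; consult the Operations Manager documentation for the exact interface that alarm scripts receive. In production the adapter would read `os.environ` and hand the result to the ticketing system of choice.

```python
import os

def format_ticket(env):
    """Turn the event context passed by Operations Manager into a
    one-line ticket summary. The DFM_* variable names below are
    hypothetical placeholders, not a documented interface."""
    event = env.get("DFM_EVENT_NAME", "unknown-event")
    source = env.get("DFM_SOURCE_NAME", "unknown-object")
    severity = env.get("DFM_EVENT_SEVERITY", "unknown")
    return "[%s] %s on %s" % (severity.upper(), event, source)

# Fabricated environment standing in for os.environ during a real alarm
line = format_ticket({"DFM_EVENT_NAME": "aggregate-almost-full",
                      "DFM_SOURCE_NAME": "aggr01",
                      "DFM_EVENT_SEVERITY": "warning"})
```

Keeping the adapter this thin makes it easy to swap the delivery mechanism (REST call, mail, message queue) without touching the alarm configuration.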


4.4 MITIGATE STORAGE USE

Uncontrolled storage use can limit operational flexibility and might put SLA fulfillment at risk. This section focuses on mitigation activities that preserve flexibility by controlling use within its defined corridor; the effect of a mitigation activity should be to return usage to that corridor.

Storage tightness might occur in aggregates or volumes, depending on their configuration. When all volumes in an aggregate are thin provisioned with the zero fat configuration, they use the aggregate's shared pool of free blocks to deal with data growth. To solve this situation, a mitigation activity on the aggregate level is necessary.

When storage objects in a fixed-size volume cannot grow to their committed space, a mitigation activity on the volume level is necessary to solve upcoming volume tightness.

MITIGATION ACTIVITIES FOR AGGREGATES

Aggregates are the coarsest storage objects within a NetApp storage controller. Finer-grained storage objects such as FlexVol volumes and their content are usually thin provisioned using the zero fat configuration. They can grow on demand; however, because they live within an aggregate of physically limited size, the growth of the storage object itself is also limited. As described in the following list, providing usable space in the aggregate automatically allows the contained storage objects to grow.

1. Increase the aggregate. You can add drives to aggregates during operation, and you can repeat this mitigation activity. The maximum aggregate size depends on the Data ONTAP version, the type of aggregate, and the type of storage controller. 64-bit aggregates, supported with Data ONTAP 8, have very high limits. Additional drives can be used immediately; however, their procurement time needs to be taken into account. Rebalancing data between existing and new drives results in uniformly distributed use of the drives.

2. Decrease the aggregate Snapshot copy reserve. This reserve is needed in MetroCluster and SyncMirror® configurations. In other configurations, you can decrease this reserve or set it to zero.

3. Shrink preallocated volumes. Volumes with preallocated space reserve available aggregate free space. Where possible, these volumes can be shrunk, returning the freed space to the aggregate so that others can make use of it.

4. Enable deduplication and shrink the volume.
5. If available, migrate a nomad online to a different storage controller. Doing this on the NetApp storage controller level requires storage provisioning based on vFiler units and MultiStore and SnapMirror licenses. Adequate free space on the aggregates of the target storage controller is required. This mitigation activity is not limited in its applicability.

6. A volume can be migrated from one aggregate to another within the same or a different storage controller. SnapMirror replicates the data while it is still being served. To switch over to the replicated data, the client needs to detach from the source and reattach to the replica. After completion, the replica is considered the new source. This operation has an impact on client downtime. Typically, inter-data center bandwidth allows synchronizing the source and the replica within a few minutes.

7. If none of the listed activities can be used, the application must be stopped to achieve a consistent state, and the data then migrated offline.

The mitigation activities for aggregate tightness are summarized in Table 8. Note that Provisioning Manager can perform mitigation alternatives 3 to 6 online for secondary storage.


Table 8) Mitigation alternatives to control use within aggregates.

No. | Mitigation Activity | Repeatability | SLA Impact | Preparation Time | Time to Show Effect
1 | Increase aggregate capacity by adding disks | Repeatable (Data ONTAP 7.x: low limits; Data ONTAP 8: high limits) | None | HW procurement | Immediate (plus rebalancing)
2 | Decrease the aggregate's Snapshot copy reserve, if possible | One time | None | None | Immediate
3 | Shrink other volumes in the aggregate if they have enough free space | One time | Low | None | Immediate
4 | Run deduplication and shrink volumes | Repeatable | Low | Time to execute dedupe | Immediate
5 | Migrate nomads (online) | Repeatable | Low | None | Minutes (vFiler migration time)
6 | Migrate volumes to a different aggregate (offline) | Repeatable | Med-high | Next planned downtime window | Minutes (volume switch-over time)
7 | Prevent application data loss and stop the application, then migrate (offline) | Repeatable | Low-high | Coordinate with app owner | Minutes (migration time)

MITIGATION ACTIVITIES FOR VOLUME TIGHTNESS

Mitigation activities for volume tightness are relevant for volumes that are at risk because their storage objects cannot grow to the committed size. When it is not possible to enable growth for storage objects contained in volumes, you need to perform an aggregate-level mitigation activity.

Note: Some of these mitigation alternatives depend on and affect used capacity in the aggregate.

Table 9) Mitigation activities for resource tightness within volumes.

No. | Mitigation Activity | Repeatability | SLA Impact | Preparation Time | Time to Show Effect
1 | Reduce the volume's Snapshot copy reserve (if configured and not used) | One time | Low | None | Immediate
2 | Increase the volume if there is free space in the aggregate (see Table 8) | One time | Low | None | Immediate
3 | Delete Snapshot copies that are not needed or were skipped by the AutoDelete function | Limited | Low | None | Immediate
4 | Activate FAS deduplication for the volume (requires proper space guarantees) | One time | Low (possible performance impact) | Wait for schedule | Hours
5 | If the volume contains more than a single LUN, migrate those objects to another volume or aggregate | Repeatable | High | Next planned downtime window | Minutes (volume migration time)
6 | Stop the application and migrate the data | Repeatable | High | Coordinate with app owner | Minutes (migration time)


5 REAL-LIFE SETTINGS

This section summarizes two different operational settings. The first does not make use of online data migration or the settled/nomad provisioning pattern; the second implements the settled/nomad provisioning pattern to maintain the flexibility for online data migrations.

The concrete threshold settings and approaches might be very customer and application specific. To exploit NetApp storage efficiency features in your own data center, NetApp recommends that you start conservatively. After you are familiar with the process, work toward the customer-specific optimum.

5.1 SAMPLE SETTING 1: REAL-LIFE SETTING

This section describes a real-life setting that a customer started with. It makes use of a limited set of mitigation alternatives. This is especially beneficial when the installed storage capacity should remain constant over a long time frame, or when physical systems are already fully equipped. A settled/nomad setting is not considered. Thus, the thresholds that signal a phase transition are set lower and more conservatively for this customer. Because online data migration and aggregate extension are not available as mitigation alternatives, sufficient available space is required to safely reach the next planned downtime window, as shown in Figure 21. In practice, refer to the aggregate days-to-full trend value to get an idea of the available days to full based on past data growth.

• All storage is provisioned using the zero fat option with growable FlexVol volumes. Only aggregate monitoring is used.
• Aggregate extension is not a mitigation alternative.
• Online migration is not a mitigation alternative.

Figure 21) Storage to enable organic data growth between planned downtime windows. (The figure plots data growth over months, with planned downtime windows marked on the time axis.)

Note: Several months might pass between the planned downtime windows in which major mitigation alternatives can be performed.

The primary concern is preventing the critical situation in which aggregates reach a utilization level too high to enable organic growth during the period between agreed planned downtime windows. To prevent this situation, sufficient space must be reserved to enable data growth. Second, the level of data consolidation is monitored to manage accumulated growth rates safely.

Provisioning new data is stopped when one or both thresholds on the first and second metrics are reached. The operational teams are notified using an alarm on the Operations Manager event aggregate nearly full threshold (configured to fire when the metric exceeds 50%) and the event aggregate nearly overcommitted threshold (configured to fire when the metric exceeds 110%). These alarms stop the responsible entities from provisioning new storage; the aggregate is left for organic growth.

An assessment of the storage situation might then be performed. Depending on experience and knowledge of the application growth rates seen in the past, the thresholds may be adapted. After the upper threshold of the operational sweet spot corridor is crossed, an alarm based on the aggregate full threshold (set initially to 65%) is sent to the storage administrators to decide on migrating data in the next planned downtime window. In the meantime, organic growth can take place in the yellow-marked area shown in Figure 22. The metrics used are:

• First metric: aggregate capacity used
• Second metric: aggregate space committed

Because all storage is provisioned using the zero fat option, no artificially limited storage container exists. Thus, there is no need to consider a volume-based metric. Figure 22 shows the behavior depending on the metrics aggregate capacity used and aggregate space committed.

Figure 22) Transition of phases depending on the metrics aggregate capacity used and aggregate space committed. (The figure shows the operational sweet spot corridor: new storage is provisioned while aggregate capacity used is in the 0-50% band and aggregate space committed is in the 0-110% band; beyond those bands, capacity is assessed and thresholds are adapted; above 65% capacity used or 120% space committed, mitigation takes place.)
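The decision logic of this setting can be sketched as a small function. The thresholds are the initial values named in the text (50% and 65% for aggregate capacity used, 110% for aggregate space committed) plus the 120% committed-space band shown in Figure 22; all of them would be tuned per environment.

```python
def next_action(capacity_used_pct, space_committed_pct):
    """Phase decision for sample setting 1, using the two metrics
    aggregate capacity used and aggregate space committed.
    Threshold values are the initial, conservative defaults."""
    if capacity_used_pct > 65 or space_committed_pct > 120:
        return "mitigate in next planned downtime window"
    if capacity_used_pct > 50 or space_committed_pct > 110:
        return "stop provisioning; assess capacity and adapt thresholds"
    return "provision new storage"
```

Encoding the corridor this way makes the alarm configuration auditable: each branch corresponds to one Operations Manager threshold described above.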


5.2 SAMPLE SETTING 2: SETTLED/NOMAD

This section describes a setting that takes the settled/nomad provisioning pattern into account and allows migrating nomad data flexibly and in a timely manner thanks to vFiler technology. This setting requires storage space at alternative locations to which nomads can be migrated. It is seen more often in larger environments with an emphasis on NFS-attached storage. It allows operating the NetApp storage infrastructure at very high use and within narrower operational sweet spot corridors. Figure 23 visualizes the effect of a mitigation alternative that can be performed online.

Figure 23) Narrower corridors due to the ability to perform mitigation alternatives in hours instead of months. (The figure shows settled and nomad units in an aggregate over a time axis of hours: once the need to act is detected, migrating a nomad shows its mitigating effect within hours.)

In this sample setting, as in sample setting 1, the critical situation to prevent is aggregates becoming too full. However, thanks to the flexibility gained with online data migration, no further metric, such as storage overcommitment, needs to be taken into account.

• All storage is provisioned using the zero fat option with growable FlexVol volumes. Only aggregate monitoring is used.
• Storage is provisioned using the settled/nomad pattern, with the ability to perform online migration.
• Days-to-full aggregate trending was more than 200 days on average. Note that this value depends on the individual situation and is calculated against 100% capacity.

The sole metric in this setting is aggregate capacity used. Table 10 contains the thresholds describing the phase transitions.

Table 10) Phase transitions with the settled/nomad provisioning pattern and the online migration mitigation alternative.

Detection Threshold | Notify | Mitigation
> 70% | Storage operations | Stop provisioning of new storage
> 85% | Storage operations | Stop extending provisioned storage
> 90% | Storage operations | Relax the resource situation and migrate a nomad
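The transitions in Table 10 can be expressed as a small decision function over the single metric aggregate capacity used; a minimal sketch:

```python
def settled_nomad_action(aggregate_used_pct):
    """Phase transitions of Table 10 (settled/nomad pattern with
    online migration). Single metric: aggregate capacity used."""
    if aggregate_used_pct > 90:
        return "relax the resource situation: migrate a nomad online"
    if aggregate_used_pct > 85:
        return "stop extending provisioned storage"
    if aggregate_used_pct > 70:
        return "stop provisioning new storage"
    return "provision new storage"
```

Because a nomad migration completes in hours, the bands between the thresholds can be far narrower than in sample setting 1.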


Figure 24) Visualization of phase transitions depending on the metric aggregate capacity used. (At 0-70% capacity used, new storage is provisioned; at 70-85%, already provisioned storage may still be extended; above 90%, utilization is relaxed by migrating a nomad with NetApp Data Motion.)

You can achieve very high data consolidation in this setting using NetApp storage controllers. The amount of logical data served exceeds the physically usable capacity by a multiple.


6 STORAGE EFFICIENCY COOKBOOK

To increase consolidation, we propose the following steps to exploit the advantages of NetApp storage efficiency technologies.

Figure 25) Typical picture of aggregate capacity metrics while turning to zero fat configurations and dedupe. (The figure plots committed capacity and capacity used over elapsed time, showing the overall trend and the last 3-month trend across the 1-month and 3-month marks.)

As a general rule, we don't introduce artificially limited container types. They increase the monitoring effort and might prevent pooling unused space. For an existing landscape, proceed as follows:

1. Install and configure Operations Manager, the earlier the better. From day one, Operations Manager collects data; the more information it collects, the better the predictions and trending. The diagrams provided by Operations Manager give a good idea of growth rates and their steadiness. Make sure all NetApp storage controllers are monitored. Wait for one month. Define which mitigation alternatives your operational team is comfortable with. Check the boxes accompanying the provided list and identify the time your team needs to perform the actions. If you can perform online migrations for nomads, define the time to negotiate and approve the migration. For all other data, define the time to the next planned downtime window.

2. Change all volumes to the zero fat configuration with the autogrow feature set to on. Because there is no artificial space limitation for an autogrow volume, monitoring is restricted to aggregate monitoring. When using deduplication, set the volume to autogrow. During this period, the capacity used diminishes, as shown in Figure 25; usually each change in the volume configuration can be detected. So far only metadata has changed, and unused space in the volumes is now available in a common shared pool. The aggregated free space is available to the same applications storing the data. We recommend monitoring for three months to understand the growth rate of your environment.

3. Derive the growth trend of the aggregates. Note that the overall trend might still be negative. Use Operations Manager to help determine the trend. Make sure that it excludes the time frame when the volume configuration was changed to zero fat, and that it includes relevant operations of your applications, such as month-end and year-end closing of business applications or regular software maintenance updates (for example, in virtualized environments).

44 Storage Efficiency Every Day How to Achieve and Manage Best-in-Class Storage Use

4. Work backward to determine the thresholds of the phases:

a. Define the aggregate use at a level where your operational team is comfortable. At first, do not exceed 80%. Add an attention area (yellow), depending on the mitigation alternatives and their time to show effect.

b. Determine the maximum distance between the planned downtimes or the time to perform the intended mitigation alternatives.

c. Determine the growth rate. Operations Manager provides help in determining the trend of data growth.

d. Determine the minimum space required to comfortably allow organic growth in the period between agreed planned downtimes of the services provided. Operations Manager helps you to understand the growth rate of the past.
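The "work backward" arithmetic of steps a through d can be sketched as follows. All figures, including the 80% comfort cap, are illustrative assumptions for one environment, not NetApp-mandated values:

```python
# Sketch: derive aggregate-utilization thresholds by working backward from
# growth rate and mitigation lead time (steps a-d above).

def derive_thresholds(aggregate_gb, growth_gb_per_day,
                      mitigation_days, downtime_interval_days):
    """Return (yellow_pct, red_pct) utilization thresholds."""
    # d) space needed for organic growth until the next planned downtime
    organic_growth_gb = growth_gb_per_day * downtime_interval_days
    # red: above this level, growth would outrun the next downtime window
    red_pct = 100.0 * (aggregate_gb - organic_growth_gb) / aggregate_gb
    # b/c) attention area: enough headroom for mitigation to take effect
    mitigation_headroom_gb = growth_gb_per_day * mitigation_days
    yellow_pct = 100.0 * (aggregate_gb - organic_growth_gb
                          - mitigation_headroom_gb) / aggregate_gb
    # a) never start the attention area above the comfort level (80%)
    return min(yellow_pct, 80.0), red_pct

yellow, red = derive_thresholds(aggregate_gb=10000, growth_gb_per_day=20,
                                mitigation_days=14, downtime_interval_days=60)
print(f"yellow at {yellow:.0f}%, red at {red:.0f}%")  # yellow at 80%, red at 88%
```

With a slower growth rate or shorter mitigation lead time, the computed yellow threshold rises until the 80% comfort cap takes over.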

To provision storage, follow these steps:

1. Create big aggregates to enable shared storage in your data center. We recommend sizing in such a way that the aggregate can be extended once for eventual aggregate mitigation. Very few situations exist where a silo-centric approach with dedicated aggregates for applications makes sense. Free space and performance in an aggregate can be shared. A few big aggregates reduce the monitoring effort. Also, build aggregates in a limited number of standardized configurations and sizes.

2. Create volumes in zero fat configuration with the autogrow feature set to on. Because there is no artificial space limitation for the autogrow volume, monitoring is restricted to aggregate monitoring. When using deduplication, set the volume to autogrow. Whenever possible, use Provisioning Manager for convenience and for repeating configurations.

a. Classify your data and provision for flexibility. Give NFS a preference and make use of vFiler entities.

b. Turn on deduplication. Even in situations where deduplication rates are expected to be low, there is sometimes a big surprise. If you prefer to try deduplication on the storage controller, then create a clone of the intended volume and deduplicate it to estimate the effect. Use Performance Advisor to identify a repeating time frame of low activity to schedule the deduplication job. Also, use deduplication scheduled by change rate. Mind the maximum volume sizes, which depend on the storage controller.

c. Initially, size volumes to the expected size of the data you are going to store. Thus, the aggregate overcommitment metric in Operations Manager represents the data consolidation more precisely.

d. Trim existing volumes provisioned in full or low fat to zero fat configuration. Use the following commands on the console of the storage controller to configure zero fat without Snapshot autodelete for NAS environments:

vol options <volume> guarantee none
vol options <volume> try_first volume_grow
vol autosize <volume> -m <maximum size> -i <increment size> on
snap autodelete <volume> off

Use the following command sequence to configure zero fat with Snapshot autodelete for NAS environments:

vol options <volume> guarantee none
vol options <volume> try_first volume_grow
vol autosize <volume> -m <maximum size> -i <increment size> on
snap autodelete <volume> trigger volume
snap autodelete <volume> delete_order oldest_first
snap autodelete <volume> on


Use the following commands to configure zero fat without Snapshot autodelete for SAN environments:

vol options <volume> guarantee none
vol options <volume> try_first volume_grow
vol autosize <volume> -m <maximum size> -i <increment size> on
snap reserve -V <volume> 0
snap autodelete <volume> off
lun set reservation <lun> disable

Use the following command sequence to configure zero fat for SAN environments with autodelete set to on:

vol options <volume> guarantee none
vol options <volume> try_first volume_grow
vol autosize <volume> -m <maximum size> -i <increment size> on
snap reserve -V <volume> 0
snap autodelete <volume> trigger volume
snap autodelete <volume> delete_order oldest_first
snap autodelete <volume> on
lun set reservation <lun> disable

e. Identify storage of inactive data. Storage keeping inactive data is most often perfectly suited to act as a nomad candidate that can be migrated.

f. Identify storage that is close to deprovisioning. Deprovisioning storage relaxes use and can act as a mitigation alternative.

g. Turn already provisioned volumes into zero fat configuration.

3. Let Operations Manager monitor the landscape. Use the aggregate daily growth rates and the days-to-full trending reported by Operations Manager to adapt the thresholds. Remember that days-to-full trending reports against 100% capacity used of the aggregate.
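As a minimal illustration of the days-to-full idea used in step 3, the sketch below fits a linear trend over daily capacity-used samples. This is illustrative only, not Operations Manager's actual algorithm; note how retargeting the prediction at your red threshold instead of 100% shortens the reported horizon:

```python
# Sketch: linear "days to full" trending over daily capacity-used samples (GB).

def days_to_full(samples_gb, aggregate_gb, target_pct=100.0):
    """Least-squares daily growth rate -> days until target_pct is reached."""
    n = len(samples_gb)
    xs = range(n)
    mean_x = sum(xs) / n
    mean_y = sum(samples_gb) / n
    slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, samples_gb))
             / sum((x - mean_x) ** 2 for x in xs))          # GB per day
    remaining = aggregate_gb * target_pct / 100.0 - samples_gb[-1]
    return remaining / slope if slope > 0 else float("inf")

usage = [7000, 7020, 7045, 7060, 7080]          # five daily samples, +20 GB/day
print(days_to_full(usage, aggregate_gb=10000))                  # against 100%
print(days_to_full(usage, aggregate_gb=10000, target_pct=88))   # against red
```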


7 REFERENCES

• TR-3505, "NetApp Deduplication for FAS and V-Series Deployment and Implementation Guide": www.netapp.com/us/library/technical-reports/tr-3505.html

• TR-3563, "NetApp Thin Provisioning: Improving Storage Utilization and Reducing TCO": www.netapp.com/us/library/technical-reports/tr-3563.html

• TR-3710, "Operations Manager, Provisioning Manager, and Protection Manager Best Practices Guide": www.netapp.com/us/library/technical-reports/tr-3710.html

• TR-3786, "A Thorough Introduction to 64-Bit Aggregates": www.netapp.com/us/library/technical-reports/tr-3786.html

• TR-3814, "NetApp Data Motion": www.netapp.com/us/library/technical-reports/tr-3814.html

• TR-3827, "If You're Doing This, Then Your Storage Could Be Underutilized": www.netapp.com/us/library/technical-reports/tr-3827.html

• TR-3881, "DataMotion for Volumes for Enterprise Applications": www.netapp.com/us/library/technical-reports/tr-3881.html

• NetApp Operations Manager Efficiency Dashboard Installation and User Guide: now.netapp.com/NOW/download/tools/omsed_plugin/InstallUserGuide.pdf


8 ACKNOWLEDGMENTS

This report was developed in concert with the Field Centers for Innovation and covers field best practices and product group expertise. It would not have been possible without the input of many experts. Significant contributions were made by Matthew Agoni, Carlos Alvarez, Jeff Berks, Manfred Buchmann, Hans Deuerlein, Erik Dybwad, Niels Reker, Oliver Dziuba, Larry Freeman, Gary Garcia, Pretoom Goswami, Naveen Harsani, George John, Nigel Maddock, Andreas Martinovsky, Holger Niermann, Cesar Orosco, Christian Ott, Shiva Raja, Michael Reusch, Maurice Skubski, John Tyrrell, Oliver Walsdorf, and Allen Wang.

NetApp provides no representations or warranties regarding the accuracy, reliability, or serviceability of any information or recommendations provided in this publication, or with respect to any results that may be obtained by the use of the information or observance of any recommendations provided herein. The information in this document is distributed AS IS, and the use of this information or the implementation of any recommendations or techniques herein is a customer's responsibility and depends on the customer's ability to evaluate and integrate them into the customer's operational environment. This document and the information contained herein may be used solely in connection with the NetApp products discussed in this document.

© Copyright 2010 NetApp, Inc. All rights reserved. No portions of this document may be reproduced without prior written consent of NetApp, Inc. Specifications are subject to change without notice. NetApp, the NetApp logo, Go further, faster, Data ONTAP, FlexClone, FlexVol, MultiStore, RAID-DP, SnapDrive, SnapMirror, Snapshot, SyncMirror, and vFiler are trademarks or registered trademarks of NetApp, Inc. in the United States and/or other countries. Windows is a registered trademark of Microsoft Corporation. Oracle is a registered trademark of Oracle Corporation. VMware is a registered trademark and VMotion is a trademark of VMware, Inc. All other brands or products are trademarks or registered trademarks of their respective holders and should be treated as such. RA-0007-1010



TABLE OF CONTENTS

1 EXECUTIVE SUMMARY 4

2 INTRODUCTION 5

2.1 TERMINOLOGY 5

2.2 GOAL OF THIS DOCUMENT 6

2.3 AUDIENCE 8

2.4 SCENARIO 9

2.5 NETAPP TECHNOLOGIES FOR STORAGE EFFICIENCY AND FLEXIBILITY 11

3 PROVISIONING 12

3.1 PROVISIONING FROM SCRATCH: FULL FAT TO ZERO FAT PROVISIONING 12

3.2 PROVISIONING FROM TEMPLATES: VOLUME- AND DEDUPE-CENTRIC LAYOUTS 22

3.3 SETTLED/NOMAD PROVISIONING FOR NETAPP DATA MOTION 27

4 OPERATION 30

4.1 PHASES AND TRANSITIONS 31

4.2 MONITORING 31

4.3 NOTIFICATION 35

4.4 MITIGATE STORAGE USE 37

5 REAL-LIFE SETTINGS 39

5.1 SAMPLE SETTING 1: REAL-LIFE SETTING 39

5.2 SAMPLE SETTING 2: SETTLED/NOMAD 41

6 STORAGE EFFICIENCY COOKBOOK 43

7 REFERENCES 46

8 ACKNOWLEDGMENTS 47

LIST OF TABLES

Table 1) NetApp technologies for storage efficiency and flexibility. 11

Table 2) Full fat provisioning. 13

Table 3) Zero fat provisioning. 14

Table 4) Full fat provisioning. 16

Table 5) Low fat provisioning. 16

Table 6) Zero fat provisioning. 17

Table 7) Comparison of provisioning methods. 18

Table 8) Mitigation alternatives to control use within aggregates. 38

Table 9) Mitigation activities for resource tightness within volumes. 38

Table 10) Phase transitions with settled/nomad provisioning pattern and online migration mitigation alternative. 41


LIST OF FIGURES

Figure 1) Terminology in the context of the storage objects of volumes and aggregates. 6

Figure 2) Storage consolidation and growing utilization using thin provisioning. 7

Figure 3) Controlled and slowed data growth within the operational sweet spot corridor in the aggregate. 7

Figure 4) Mitigate to prevent uncontrolled utilization. 8

Figure 5) Sample service levels ordered by service disruption and recovery time. 9

Figure 6) Questions regarding storage efficiency from an operational point of view. 10

Figure 7) Provisioning model for NAS storage from scratch. Technically, only two out of four combinations are possible. 13

Figure 8) Provisioning model for SAN storage from scratch. 15

Figure 9) Configuring full/zero fat provisioning policy using Provisioning Manager for NAS. Select checkboxes as outlined. Provisioning Manager deviates from zero/full fat by first growing volumes with autosize and then allowing Snapshot autodelete. 20

Figure 10) Configuring full/low/zero fat provisioning policy using Provisioning Manager for SAN storage. Select checkboxes as outlined. Provisioning Manager deviates by not turning on autosize for zero fat. 21

Figure 11) Full/low/zero fat provisioning policies for datasets and storage services. 21

Figure 12) Volume-centric storage provisioning. Application instances are aligned horizontally with their volumes. 24

Figure 13) Dedupe-centric storage provisioning. Application instances are aligned horizontally; volumes are aligned vertically. 26

Figure 14) Settled/nomad provisioning into an aggregate. In case of aggregate tightness, a nomad is migrated to a separate aggregate. 27

Figure 15) Alignment by technical impact (sorted by negative impact in descending order). 28

Figure 16) Alignment by business impact (sorted by negative impact in descending order). 28

Figure 17) Operations Manager screen to configure thresholds on operational metrics. 32

Figure 18) Trending of data growth and days-to-full prediction in Operations Manager. 33

Figure 19) Storage efficiency dashboard in Operations Manager. 34

Figure 20) Configuring an alarm based on the threshold "aggregate almost full". 36

Figure 21) Storage to enable organic data growth between planned downtime windows. 39

Figure 22) Transition of changes depending on the metrics aggregate capacity used and aggregate committed space. 40

Figure 23) Narrower corridors due to the ability to perform mitigation alternatives in hours instead of months. 41

Figure 24) Visualization of phase transitions depending on the metric aggregate capacity used. 42

Figure 25) Typical picture of aggregate capacity metrics while turning to zero fat configurations and dedupe. 43


1 EXECUTIVE SUMMARY

This document provides consolidated best practices to achieve and manage best-in-class storage use. We introduce intervals and metrics that trigger changes in behavior in order to operate NetApp storage in a corridor of high utilization for as long as possible. Starting with provisioning models focused on high consolidation and operational agility, we describe the operational phases and their transitions. A list of mitigation alternatives describes the options available to control use in the face of data growth.

Finally, this document presents real-life settings where high data consolidation is achieved by using NetApp storage efficiency technologies.


2 INTRODUCTION

Exponential data growth generates a serious challenge for IT managers. Gartner predicts that between 2008 and 2013, enterprises will purchase and install 20 times more terabytes of storage than they did in 2008 (www.gartner.com/technology/media-products/newsletters/netapp/issue24/gartner3.html). Until recently, continuous improvements in cost for performance and storage space made it easy and affordable to solve storage concerns by adding more disks to existing storage systems. However, IT executives are discovering that there are limits to that solution: floor space, weight loads, rack space, network drops, power connections, cooling infrastructure, and even power itself are finite resources. Hitting any one of these limits significantly jeopardizes the ability of IT to meet business demands.

NetApp's solution to rapid resource consumption is to remove storage controllers and disks from the resource equation by using storage more efficiently. Key benefits of this strategy are:

• Less management involvement
• Reduced complexity, support, and service costs
• Improved performance and network efficiency

NetApp storage efficiency technologies are key to achieving data consolidation and managing future data growth: they allow for storing and managing several times more data on NetApp storage controllers than would fit on their physically attached disks, and they allow the deferral of IT investments to the future.

In this document, we describe techniques and guidelines you can use to find the operational sweet spot for NetApp storage efficiency technologies in your environment. By adhering to the best practices outlined, you can increase storage consolidation and agility as well as decrease operational risk.

The document is organized as follows:

• Chapter 3 describes storage provisioning.
• Chapter 4 describes the monitoring process and supporting tools for daily operation.
• Chapter 5 describes concrete operational setups used in daily life.
• Chapter 6 concludes with a step-by-step cookbook to provision and manage storage efficiently and to adapt individual thresholds.

2.1 TERMINOLOGY

We use the following terminology to describe resource use, both at the level of exposing storage to applications and at the level of physical resource allocation within the aggregates in the storage controllers. Also refer to the NetApp Operations Manager Efficiency Dashboard Installation and User Guide for further explanation of this terminology.

• Logical storage refers to storage that is visible at the application layer. Logical storage does not necessarily require the allocation of usable capacity.

• Usable capacity refers to storage that is usable for the applications, as provided by NetApp storage controllers.

• Used capacity is a value that represents the amount of physical capacity that holds application or user data. In Operations Manager¹ terminology, this is represented by capacity used.

• Storage utilization refers to the ratio of used capacity to usable capacity, without accounting for efficiency returns.

¹ NetApp Operations Manager is a central console that delivers comprehensive monitoring and management for NetApp storage, with alerts, reports, and performance and configuration tools.


• Commitment rate is an Operations Manager term that refers to the percentage of aggregate space committed to volumes.

• Deduplication rate is an Operations Manager term that measures the efficiency of the deduplication functionality. This rate is measured at the volume and the aggregate level, as a percentage.

For the aggregate, we define different operational windows, each characterized by an interval of storage utilization. We define the operational sweet spot corridor (green) as the interval where the aggregate should be operated for optimal utilization and service availability. We define a tolerance interval (yellow) where actions are taken to get back into the operational sweet spot window. We define a no-go area (red) where we do not intend to operate the aggregate. This area might act as a last buffer of time or can be considered an area where operational staff has less experience.
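The three operational windows amount to a simple classification of aggregate utilization. The sketch below uses placeholder thresholds; the actual values are derived per environment in the cookbook of chapter 6:

```python
# Minimal sketch of the green/yellow/red operational windows.
# The 80%/90% thresholds are illustrative placeholders, not recommendations.

def corridor(utilization_pct, yellow=80.0, red=90.0):
    if utilization_pct < yellow:
        return "green"    # operational sweet spot corridor
    if utilization_pct < red:
        return "yellow"   # tolerance interval: take actions to get back
    return "red"          # no-go area

print(corridor(72), corridor(85), corridor(95))  # green yellow red
```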

Figure 1 explains the terminology in the context of storage objects on a NetApp storage controller. The aggregate is a physically limited storage object. Aggregates are treated as fairly static containers and thus need proper size management.

Figure 1) Terminology in the context of the storage objects of volumes and aggregates.

[Figure: shows the usable capacity of the aggregate, the committed logical storage of its volumes with LUNs/NAS, used capacity, data growth, and the operational sweet spot corridor.]

In practice, commitment rates far above 100% are common in customer environments. This document describes how to manage this.
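A worked example of these metrics, with illustrative numbers, shows how a commitment rate above 100% coexists with moderate storage utilization:

```python
# Worked example of the terminology above (illustrative numbers):
# an aggregate with 10 TB usable capacity hosting thin-provisioned volumes.

usable_capacity_tb = 10.0                 # physical space in the aggregate
volume_sizes_tb = [4.0, 5.0, 6.0]         # logical sizes of the volumes
used_capacity_tb = 6.5                    # physical space holding data

commitment_rate = 100.0 * sum(volume_sizes_tb) / usable_capacity_tb
storage_utilization = 100.0 * used_capacity_tb / usable_capacity_tb

print(f"commitment rate:     {commitment_rate:.0f}%")   # 150% = overcommitted
print(f"storage utilization: {storage_utilization:.0f}%")  # 65%
```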

2.2 GOAL OF THIS DOCUMENT

The goal of this document is to achieve best-in-class storage efficiency and costs by consolidating the highest possible amount of application data while meeting the required service-level agreements. The idea is to enable thin provisioning while controlling the use of physical resources in the NetApp shared storage infrastructure within a desired corridor. NetApp storage efficiency technologies can save a significant amount of the IT budget. On the other hand, running at an uncontrolled use level can reduce flexibility and cause headaches in managing data growth and service-level fulfillment.

The difference in managing thin-provisioned storage compared to traditional storage is that, due to dense consolidation of application data, accumulated application data growth rates might vary in a broader corridor than they would traditionally.


To control the level of physical resource use, we outline methods for increasing storage utilization by provisioning storage with NetApp storage efficiency technologies. We also outline how to react to organic data growth and how to keep storage use within a corridor of high efficiency by using NetApp technologies for data center flexibility.

To summarize, this document introduces three phases to manage the storage on NetApp storage controllers: provisioning, organic growth, and mitigation of storage tightness. These phases play a vital role for aggregates, which are the coarsest storage abstraction of a NetApp storage controller.

• Provisioning phase: In this phase, storage is provisioned by the NetApp shared storage infrastructure, which increases the utilization of aggregates. The goal is to operate the aggregates with a high level of data consolidation in an efficient utilization corridor. Figure 2 visualizes this corridor.

• Organic growth phase: In this phase, no further storage is provisioned, to slow down the growth of aggregate utilization. The goal is to operate in a corridor of high utilization but safely reach the next planned downtime or administration window of the served applications. Thus, the organic growth phase is sized depending on the growth rate of capacity used and the length of the time frame to the planned downtime windows. Figure 3 visualizes slowed growth.

• Mitigation of storage tightness phase: This phase prevents an uncontrolled level of utilization and provides mitigation activities to lower this level. Several mitigation alternatives are presented to mitigate storage tightness and to shift the aggregate utilization back into the desired operational corridor. Figure 4 visualizes this mitigation.

Figure 2) Storage consolidation and growing utilization using thin provisioning.

[Figure: aggregate capacity with data growth rising within the operational sweet spot corridor.]

Figure 3) Controlled and slowed data growth within the operational sweet spot corridor in the aggregate.

[Figure: aggregate capacity with slowed data growth.]


Figure 4) Mitigate to prevent uncontrolled utilization.

[Figure: aggregate capacity with a mitigation step preventing uncontrolled utilization.]

This document addresses best practices and tools to manage the NetApp storage infrastructure and to support decision making during the transitions between phases.

2.3 AUDIENCE

This document addresses two audiences:

• Decision makers: It provides decision makers with an understanding of how to align storage efficiency best practices and processes within their existing operations organization.

• Operational teams: It allows operational teams to understand monitoring and management of the storage infrastructure while mastering data growth. It allows the operational teams to implement a basic setting and to position their usage goals. We refer to existing NetApp best practices to increase the level of data consolidation and to achieve overcommitment for major applications.


2.4 SCENARIO

As a scenario, we consider a service provider who delivers a set of IT services. This service provider might serve internal or external customers at different service levels. The service levels provided are characterized by unplanned downtime, as exemplified by Figure 5. This characterization is useful for aligning service data with physical resources. In our example, the highest level of service availability is delivered for Platinum services. It is further assumed that the provided services have different lifetimes and dates of creation.

The service provider's major goal is to deliver the services within the specified service level and to achieve maximum data consolidation through NetApp storage efficiency technologies. This directly translates into cost savings related to capital investments, floor space, cooling, maintenance, and operational expenses. However, storing data in a consolidated way using storage efficiency technologies needs to take into account the aggregated data growth rates of the applications.

Predicting data growth rates depends on several parameters, which are usually outside the control and knowledge of the service provider. These parameters include usage characteristics, number of users, and functionality used. To compensate for the deficiencies in precisely predicting data growth over a specific time frame, we propose a scheme that the service provider can use to achieve operational flexibility and adaptability to handle unpredictable growth rates.

Figure 5) Sample service levels ordered by service disruption and recovery time.

[Figure: service levels ordered from best effort to Platinum. Best-effort services: dev/test, cold/fill-up data, dynamic/short-term data. Bronze and Gold: production. Silver: production, low budget. Platinum: production, premium customers. Service disruption and recovery time range from lowest (Platinum) through low to best effort.]

In this document, the focus is on operational aspects of storage efficiency technologies to achieve data center consolidation and agility. Thus, we take a seat next to the operational staff of our sample service provider to understand their questions regarding the technologies that make up NetApp storage efficiency.

We address the questions posed by the operational staff, such as: How do we set it up? How do we run this? How do we integrate the necessary procedures into our daily life?


A set of questions pertaining to the lifetime of a service instance and its storage arises. It starts with provisioning storage in a NetApp shared storage infrastructure, continues with the detection and monitoring of situations endangering the level of a service and the necessary response procedures, and ends with promoting a continuous and smooth delivery of services.

The questions are structured around a cycle that starts at provisioning storage and finishes at deprovisioning storage. Figure 6 shows important questions regarding storage efficiency from an operational point of view.

• Provisioning deals with the provisioning of storage. In this document, provisioning models are shown to achieve a high level of storage consolidation while preserving operational flexibility. For individual applications, NetApp provides a rich library of technical reports on how best to provision.

• Monitoring deals with defining the goal of the monitoring process and which metrics to use to decide when to stop (for example, the provisioning of storage). Relevant metrics provided by NetApp Operations Manager are described.

• Notifying deals with how to notify the people in charge of performing certain actions. The notification mechanisms within NetApp Operations Manager are described, to deliver information in case of certain events.

• Mitigation deals with mechanisms to prevent uncontrolled utilization affecting operational flexibility and service fulfillment.

Figure 6) Questions regarding storage efficiency from an operational point of view.

[Figure: a cycle of Provision, Monitor, Notification, and Mitigate. Provision: how to provision best for storage efficiency (provisioning models; NetApp Data Motion awareness; from scratch or template/clone); where to provision to; which SLA; what are the defaults. Monitor: tools; what to monitor; what is critical; when to stop provisioning or extending; when to relax tightness; how to detect. Notification: who is in charge to react; how to notify. Mitigate: available options; implications on SLAs; when to act.]

Before discussing the details of this cycle, it is important to understand the NetApp technologies that achieve storage efficiency and flexibility, and to understand their relevance in the provisioning and operational phases.


2.5 NETAPP TECHNOLOGIES FOR STORAGE EFFICIENCY AND FLEXIBILITY

The NetApp technologies provided in Table 1 are characterized by how they bring the most significant advantage and value. For example, FlexClone® technology provides significant time and space advantages during provisioning, but the space advantage might be reduced over time. In contrast, deduplication technology can achieve space savings over the entire storage lifetime.

Table 1) NetApp technologies for storage efficiency and flexibility.

NetApp Technology | Benefit | During Provisioning | During Operation
FlexClone | Instantly creates thin-provisioned and space-efficient writable clones | X |
FlexVol® | Implements thin provisioning and consumes only the needed space rather than the requested space | X | X
Deduplication | Increases data consolidation by detecting and optimizing repeating patterns in primary and secondary storage | X | X
NetApp Data Motion | Provides flexibility for management and optimal load/capacity rebalancing in growing cloud environments without downtime | | X
Aggregate extensibility in Data ONTAP® | Data ONTAP is the foundation for all features listed in this table and provides flexibility in handling physical resources; it allows extending physical aggregates during operation | X | X

Furthermore, NetApp RAID-DP®, SATA, and NetApp Flash Cache (formerly PAM II) are technologies that help to reduce the total cost of storage tremendously. It is assumed that these technologies are deployed according to the requirements of the use case.

NETAPP SHARED STORAGE INFRASTRUCTURE

To implement the practices outlined in this document, some prerequisites must be met. We assume a NetApp shared storage infrastructure implemented using large aggregates. This acts as a utility for delivering storage in a flexible manner for applications with different needs. It scales with the demands and serves a variety of different service levels at the same time. NetApp Operations Manager monitors the NetApp shared storage infrastructure. This software acts as a central management station and consolidates information about the current status of all NetApp storage controllers. Based on this information, Operations Manager indicates the necessity to change the phases and behavior in the data center.

The NetApp shared storage infrastructure provides different ways for clients to consume its resources. It can provide a traditional view, where storage resources are located at a specific controller. Using NetApp Provisioning Manager, the infrastructure can also provide a service-oriented resource view that abstracts resource consumption and management from the physical controllers. The abstractions of a storage service catalog, resource pools, and datasets provide easy manageability in the face of massive scale. If multi-tenancy is not required, then this is the abstraction of choice.

Supported by the NetApp technologies MultiStore® (vFiler™) and NetApp Data Motion, storage can be provided in a utility-like fashion, independent of physical hardware. This makes high operational flexibility in the data center possible and allows building virtualized environments for multiple tenants with competing interests.


3 PROVISIONING

Some features, such as data deduplication, can be turned on or off at any time. However, to achieve maximum consolidation and storage efficiency, more strategies must be considered during the data-provisioning phase.

In addition, provisioning should take the flexibility of storage (for example, its ability to be migrated) into account. This allows administrators to easily move data off aggregates approaching capacity, without downtime. This is also an important aspect when planning to deliver services 24x7.

Thus, the design goals for storage layouts are storage efficiency and operational flexibility. In the following sections, we discuss three orthogonal dimensions of provisioning storage. Two of them focus on achieving data consolidation; the third focuses on achieving operational flexibility. All dimensions can be combined independently. Note that the achievable level of consolidation depends on the applications and their data.

TR-3827, "If You're Doing This, Then Your Storage Could Be Underutilized," provides further understanding of storage efficiency and operational flexibility.

31 PROVISIONING FROM SCRATCH FULL FAT TO ZERO FAT PROVISIONING

This section deals with the way data is provisioned and the consequences for storage efficiency. We recommend applying the so-called zero fat configurations.

We consider the storage setup for a single application instance. The presented configurations can be applied while provisioning storage from scratch and to already provisioned storage. When the technical dimensions of storage provisioning are categorized into primary data space and its Snapshot™ copy space, there are four theoretical combinations for both network-attached storage (NAS) and storage area network (SAN) environments. In practical applications, only two variants are relevant to NAS and three variants are relevant to SAN storage:

• Full fat
• Low fat
• Zero fat

According to NetApp best practices, we do not mix block and file data within a single volume, which allows us to consider NAS and SAN environments separately.

NAS

For NAS, two options are recommended: full fat and zero fat.

• Full fat: The primary data and Snapshot copy space are preallocated.
• Zero fat: Primary data and its Snapshot copy space are allocated on demand. This variant achieves the best ratio of storage efficiency when provisioning applications from scratch.


Figure 7) Provisioning model for NAS storage from scratch. Technically, only two of the four combinations are possible.

                                   Primary Data (Files & Directories)
                                   Space Allocation
                                   Fat                 Thin
Snapshot Copy         Fat          Full Fat option     No option
Space Allocation      Thin         No option           Zero Fat option

Note: Full fat is characterized slightly differently in NAS and SAN due to their technical properties.

FULL FAT PROVISIONING

Full fat provisioning for NAS is the traditional (default) way to implement NFS/CIFS shares. Volumes in a full fat configuration are characterized as follows:

• Volumes are created with space guarantee.
• The size of the volume follows the formula X + Δ, where X is the size of the primary data (the sum of all user data, files and directories, within the volume) and Δ is the amount of space needed to hold Snapshot data.
• Because space used for Snapshot copies might grow unexpectedly, the autosize function can be used to make space available when a certain volume threshold is reached. This also happens when the space reserved for user data gets low.
• Space reservation for Snapshot copies is used to hide the capacity used for Snapshot copies from the consumers (NAS clients).
• For volumes with deduplication enabled, volume autogrow is a mandatory option.
• Normally, using autodelete is not recommended in NAS environments. Keeping a certain number of Snapshot copies for file versioning/restores might be part of the SLAs defined for file services.

Note: Deleting Snapshot copies may be a reasonable approach when no other option for freeing up space is available, but this will be a specific and individual decision.
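The X + Δ sizing rule above can be sketched numerically. The following is a minimal illustration (a hypothetical helper, not a NetApp tool), assuming sizes in GB; the derived Snapshot reserve percentage simply expresses Δ as a share of the total volume size.

```python
def full_fat_nas_sizing(user_data_gb, snapshot_delta_gb):
    """Size a full fat NAS volume following the X + delta rule.

    X     -- primary data: sum of all user data (files and directories)
    delta -- space expected to hold Snapshot data
    Returns the volume size and the Snapshot reserve percentage that
    would hide the delta portion from NAS clients.
    """
    volume_size_gb = user_data_gb + snapshot_delta_gb
    snap_reserve_pct = round(100 * snapshot_delta_gb / volume_size_gb)
    return volume_size_gb, snap_reserve_pct

# Example: 1000 GB of user data, 250 GB of expected Snapshot churn
print(full_fat_nas_sizing(1000, 250))  # (1250, 20)
```

In practice the reserve percentage depends on the change rate and Snapshot retention defined in the SLA; the helper only illustrates the arithmetic.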

Table 2) Full fat provisioning.

Option              Recommended Value  Notes

Volume Options
guarantee           volume
fractional_reserve  100                Leave at the default. Mostly relevant for SAN environments; the default value up to Data ONTAP 7.3.3 is 100, and for later releases 0 is the default.
autosize            on                 Turn autosize on. There is then no artificially limited volume size that needs to be monitored; autosize allows growth of user data beyond the guaranteed space limit.
autosize options    -m X -i Y          The business model drives the maximum value (-m) for the autosize configuration, because it offers additional disk space to the consumer under specific conditions. A reasonable resizing increment (-i) depends on various factors, such as the data growth rate in the particular volume, the volume size itself, and so on.

Volume Snapshot Options
reserve             yes                The value depends on the number of Snapshot copies and the change rate within the volume.
schedule            switched on        Automatic Snapshot technology schedules.
autodelete          off                Deleting Snapshot copies is not recommended in most NAS environments.

ZERO FAT PROVISIONING

The zero fat method is the most efficient way to provision NAS volumes:

• Volumes are created without space guarantee.
• The size of the volume still follows the formula X + Δ, where X is the size of the primary data (the sum of all user data, files and directories, within the volume) and Δ is the amount of space needed to hold Snapshot data. Sizing the volume defines a container with a virtual size for the consumers; NAS users are familiar with fixed-size file shares.
• Space used for Snapshot copies can grow unexpectedly. You can use the autosize function to make space available when reaching a certain volume threshold, and also when the space reserved for user data gets low.
• Space reserved for Snapshot copies is used to hide from the consumers (NAS clients) the capacity taken up by Snapshot copies.
• For volumes with deduplication enabled, volume autogrow is a mandatory option.
• Using autodelete is normally not recommended in NAS environments. Keeping a certain amount of space for Snapshot copies for file versioning/restores is part of the SLAs defined for file services.

Note: Deleting Snapshot copies may be a reasonable approach when no other option for freeing up space is available, but this will be a specific and individual decision.

Table 3) Zero fat provisioning.

Option              Recommended Value  Notes

Volume Options
guarantee           none
fractional_reserve  100                Leave at the default. Mostly relevant for SAN environments; the default value up to Data ONTAP 7.3.3 is 100, and for later releases 0 is the default.
autosize            on                 Turn autosize on. There is then no artificially limited volume size that needs to be monitored; autosize allows growth of user data beyond the guaranteed space limit.
autosize options    -m X -i Y          The business model drives the maximum value (-m) for the autosize configuration, because it offers additional disk space to the consumer under specific conditions. A reasonable resizing increment (-i) depends on various factors, such as the data growth rate in the particular volume, the volume size itself, and so on.
try_first           -                  Autodelete is not recommended in most environments.

Volume Snapshot Options
reserve             yes/no             The value depends on the number of Snapshot copies and the change rate within the volume. Displaying only the committed usable space per the SLA is the preferred way to provision NAS storage; however, there might be situations in which the Snapshot reserve area is omitted (no).
schedule            switched on        Automatic Snapshot technology schedules.
autodelete          off                Deleting Snapshot copies is not recommended in most NAS environments.

SAN

For SAN, we consider three options:

• Full fat: Both primary data and its Snapshot copy space are preallocated.
• Low fat: The primary data is preallocated; the Snapshot copy space is allocated on demand.
• Zero fat: Primary data and its Snapshot copy space are allocated on demand. This variant achieves the best ratio of storage efficiency when provisioning applications from scratch.

Figure 8) Provisioning model for SAN storage from scratch.

                                   Primary Data (LUN) Space Allocation
                                   Fat                 Thin
Snapshot Copy         Fat          Full Fat option     No option
Space Allocation      Thin         Low Fat option      Zero Fat option

FULL FAT PROVISIONING

This method can be treated as the historical way of provisioning block storage with Data ONTAP:

• Volumes are created with space guarantee.
• A fractional (overwrite) reserve is used to guarantee that the primary data can be overwritten completely with Snapshot copies in place. If this space is not available, Snapshot copy creation will fail.
• The size of the volume follows the formula 2X + Δ, where X is the size of the primary data (the sum of all LUN capacities within the volume) and Δ is the amount of space needed to hold Snapshot copy data.


The enhancements to the volume autosize capabilities (such as volume size-dependent thresholds) and the robustness of the Snapshot autodelete implementation have made full fat provisioning more or less obsolete. As of today, the default settings for creating volumes/LUNs in Data ONTAP still correspond to these settings. See "Provisioning from Scratch: Full Fat to Zero Fat Provisioning" for a discussion of using tools such as Provisioning Manager.

Table 4) Full fat provisioning.

Option              Recommended Value  Notes

Volume Options
guarantee           volume
fractional_reserve  100                Even though technically possible, a fractional reserve below 100 incorporates a potential risk of running out of Snapshot copy overwrite space. This situation should be avoided.
autosize            off                Autosize could be used as an option to create the free space needed for Snapshot copy creation.

Volume Snapshot Options
reserve             0
schedule            switched off
autodelete          off

LUN Options
reservation         enable

LOW FAT PROVISIONING

With low fat provisioning, we use a more space-efficient way to provision volumes:

• Volumes are created with space guarantee.
• LUNs are created with space guarantee as well. This setup does not benefit from unused blocks within a LUN. (During the lifetime of a LUN, the amount of free, unused blocks typically decreases. Without space reclamation techniques, allocated blocks on the storage system stay allocated.)
• The size of the volume follows the formula X + Δ, where X is the size of the primary data (the sum of all LUN capacities within the volume) and Δ is the amount of space needed to hold Snapshot copy data.
• Because space used for Snapshot copies might grow unexpectedly, the autosize and autodelete policies are used to make space available when reaching a preset volume threshold.

Table 5) Low fat provisioning.

Option              Recommended Value  Notes

Volume Options
guarantee           volume
fractional_reserve  0                  Snapshot space is controlled by the autodelete and autosize options.
autosize            on                 Turn autosize on.
autosize options    -m X -i Y          The business model drives the maximum value (-m) for the autosize configuration, because it offers additional disk space to the consumer under specific conditions. A reasonable resizing increment (-i) depends on various factors, such as the data growth rate in the particular volume, the volume size itself, and so on.
try_first           volume_grow        Increasing the size of the volume does not destroy any data or information, so there is no reason not to increase it; the size can be reduced afterward if the volume's free space increases again. There might be configurations where automatic volume growth is not desired.

Volume Snapshot Options
reserve             0                  For NAS volumes, setting a Snapshot copy reserve area and configuring Snapshot copy schedules is a common setup. For SAN volumes, this needs to be switched off according to NetApp best practices (see the Fibre Channel and iSCSI Configuration Guide).
schedule            switched off
autodelete          on                 There might be Snapshot copies that are needed to fulfill certain SLAs, such as backup SLAs. Setting this policy needs to be negotiated against the business requirements; in the worst case, deleting Snapshot copies is not an option.
autodelete options  volume, oldest_first  There is a precedence determining which Snapshot copies become candidates for deletion; oldest_first is the current default.

LUN Options
reservation         enable             Reserves space for the LUN during creation.

ZERO FAT PROVISIONING

Full and low fat provisioning use fully allocated volumes and LUNs; by default, space allocation happens within the boundaries of the LUN and the volume. Zero fat follows a 100% "allocate on demand" concept:

• Volumes are created without space guarantee.
• LUNs are created without space guarantee.
• The size of the volume follows the formula X − N + Δ, where X is the size of the primary data (the sum of all LUN capacities within the volume), Δ is the amount of space needed to hold Snapshot copy data, and N is the amount of unused blocks within a given LUN.

Table 6) Zero fat provisioning.

Option              Recommended Value  Notes

Volume Options
guarantee           none               No space reservation for the volume at all.
fractional_reserve  0                  With Data ONTAP 7.3.3, fractional_reserve can be modified even for volumes without a space guarantee of type volume. Prior to Data ONTAP 7.3.3, the value was fixed at 100.
autosize            on                 Turn autosize on.
autosize options    -m X -i Y          The business model drives the maximum value (-m) for the autosize configuration, because it offers additional disk space to the consumer under specific conditions. A reasonable resizing increment (-i) depends on various factors, such as the data growth rate in the particular volume, the volume size itself, and so on.
try_first           volume_grow

Volume Snapshot Options
reserve             0                  For NAS volumes, setting a Snapshot copy reserve area and configuring Snapshot copy schedules is a common setup. For SAN volumes, this needs to be switched off according to NetApp best practices (see the Fibre Channel and iSCSI Configuration Guide).
schedule            switched off
autodelete          off                Deleting Snapshot copies might be an option when the volume can no longer be resized because the maximum configured size has been reached, or when the aggregate's free space becomes low.

LUN Options
reservation         disable            No preallocation of blocks for the LUN.

SUMMARY OF PROVISIONING METHODS

There are good reasons for using any of the provisioning methods described; however, full fat for SAN environments should be avoided wherever possible because of its storage efficiency ratio. Even with a 100% block usage ratio on primary data, zero fat provisioning has many advantages and is the preferred method:

• The aggregate's free space is a global pool that can serve space to volumes. This gives more flexibility than volumes with their own dedicated free space.
• For SAN volumes, the block consumption can be easily monitored.
• Deduplication savings go directly into the global pool of free space, which is the aggregate or the resource pool to which it belongs.
• Monitoring is needed only at the aggregate level. Volumes grow on demand.

Table 7) Comparison of provisioning methods.

Characteristic                           Full Fat              Low Fat                          Zero Fat
Space consumption                        2X + Δ                X + Δ                            X − N + Δ (2)
Space efficient                          No                    Partially, for Snapshot copies   Yes
Monitoring                               Optional              Required on volume and           Required on aggregate level
                                                               aggregate level
Notification/mitigation process          No                    Optional, in most cases          Yes
required
Pool benefiting from dedupe savings      Volume fractional     Volume free space area           Aggregate free space area
                                         reserve area
Risk of an out-of-space condition        No                    No, as long as autodelete is     Yes, when monitoring and
on primary data                                                able to delete any Snapshot      notification processes are
                                                               copies                           missing
Typical use cases                        Small installations;  Large database environments      Shared storage infrastructure;
                                         no or few storage                                      test/dev environments; storage
                                         management skills                                      pools for virtualized servers
                                         (no monitoring
                                         infrastructure)

(2) N is the traditional thin provisioning impact: the amount of blocks logically allocated but not used.
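The space-consumption column of Table 7 can be compared numerically. The sketch below uses hypothetical helper functions and arbitrary example values (1000 GB of LUN capacity, 200 GB of Snapshot data, 400 GB of the LUN capacity never written by the application).

```python
def full_fat_space(x_gb, delta_gb):
    """Full fat: 2X + delta (primary data plus a full overwrite reserve)."""
    return 2 * x_gb + delta_gb

def low_fat_space(x_gb, delta_gb):
    """Low fat: X + delta (no fractional reserve; Snapshot space on demand)."""
    return x_gb + delta_gb

def zero_fat_space(x_gb, delta_gb, unused_gb):
    """Zero fat: X - N + delta (only blocks actually in use are allocated)."""
    return x_gb - unused_gb + delta_gb

x, delta, n = 1000, 200, 400
print(full_fat_space(x, delta))     # 2200
print(low_fat_space(x, delta))      # 1200
print(zero_fat_space(x, delta, n))  # 800
```

With these example numbers, zero fat consumes less than half the space of full fat before any deduplication savings are counted.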

FULL/LOW/ZERO FAT PROVISIONING WITH PROVISIONING MANAGER

NetApp Provisioning Manager focuses on fast and efficient provisioning of storage resources in the NetApp storage infrastructure. It speeds up provisioning workflows and boosts capacity usage by using policy-based automation for the entire NetApp NAS and SAN infrastructure. These processes are:

• Faster than manually provisioning storage
• Easier to maintain than scripts
• Instrumental in minimizing the risk of data loss resulting from misconfigured storage

Provisioning Manager applies user-defined policies to consistently select the appropriate resources for each provisioning activity. This frees administrators from the headache of searching for available space to provision and allows more time for strategic issues. The use of a centralized management console allows administrators to monitor the status of their provisioned storage resources. Provisioning Manager can help improve business agility and capacity use, shrink provisioning time, and improve administrator productivity. Provisioning Manager's thin provisioning and deduplication capabilities provide a high level of storage efficiency from your NetApp storage investment.

A GUI allows you to implement the full/low/zero fat provisioning models within Provisioning Manager. See Figure 9 and Figure 10 for configuring storage efficiency in a provisioning policy for NAS and SAN. Whenever storage is provisioned using such a provisioning policy, the settings apply automatically. For more information, refer to TR-3710, Operations Manager, Provisioning Manager, and Protection Manager Best Practices Guide.

Provisioning Manager encapsulates technical details when provisioning storage and supports easy integration with existing management tools and orchestration frameworks. Policies and their use in so-called datasets and storage services allow you to exploit NetApp storage efficiency technologies without exposing a high level of technical detail to a higher level of management software.

Note: Provisioning Manager up to version 4.0 does not allow you to specify autosize and autodelete individually; the policy template determines whether these features are used and which options are selected. In order to implement the provisioning methods outlined, a customized "provisioning script" needs to be provided to set the autosize and autodelete parameters according to the recommendations for the full/low/zero fat methods. Post-provisioning scripts are standard with Provisioning Manager. Use caution when Provisioning Manager runs conformance checks; these revert individual settings.


Figure 9) Configuring a full/zero fat provisioning policy using Provisioning Manager for NAS. Select checkboxes as outlined. Provisioning Manager deviates from zero/full fat by first growing volumes with autosize and then allowing Snapshot autodelete.


Figure 10) Configuring a full/low/zero fat provisioning policy using Provisioning Manager for SAN storage. Select checkboxes as outlined. Provisioning Manager deviates by not turning on autosize for zero fat.

FULL/LOW/ZERO FAT PROVISIONING FOR STORAGE SERVICES

Storage services are an easy abstraction for provisioning storage in a utility-like fashion. A storage service describes all characteristic attributes of the storage needed in a certain scenario. A storage service catalog lists the available templates and allows you to provision storage with these attributes on demand. Technically, storage services or datasets consist of one or more resource pools, a protection policy, and a provisioning policy. Full/low/zero fat provisioning for storage services is configured in the configuration wizard of the provisioning policy. Figure 11 shows the provisioning policies closest to the full/low/zero fat configurations.

Figure 11) Full/low/zero fat provisioning policies for datasets and storage services.

Because this wizard is able to configure the deduplication feature, two policies are configured for the zero fat configuration: one with deduplication and one without.


HOW SHOULD A VOLUME BE SIZED?

Because physical allocation of data within a zero fat provisioned volume is done on demand, theoretically the volume size can be set to a very high value that can easily keep all application data and Snapshot copies. Because the unallocated space in the volume is not exclusively reserved for the volume itself, all other applications can benefit from the shared pool of unallocated storage. However, NetApp recommends that you size the volume to the expected size of its containing objects and use the autogrow option to let it grow on demand. The important advantage is that the commitment rate acts as a metric for data consolidation.

Note: The commitment rate reflects the amount of logical data consolidation. This metric is suitable for deciding when data should be left for organic growth.

Additionally, the volume size limits when using deduplication should be taken into account, because the maximum sizes depend on the storage controllers.
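The commitment rate can be sketched as follows. The exact definition used by NetApp tooling may differ, so treat this as an illustrative assumption: the sum of the logical sizes of all thin-provisioned volumes divided by the aggregate's capacity.

```python
def commitment_rate(volume_sizes_gb, aggregate_capacity_gb):
    """Commitment (overcommitment) rate of an aggregate.

    Hypothetical definition for illustration: total logical volume size
    divided by aggregate capacity. A value above 1.0 means the aggregate
    is overcommitted.
    """
    return sum(volume_sizes_gb) / aggregate_capacity_gb

# Three 4 TB thin-provisioned volumes carved from a 10 TB aggregate
print(commitment_rate([4000, 4000, 4000], 10000))  # 1.2
```

A rate of 1.2 (120%) indicates that 20% more logical capacity has been promised than physically exists, which is exactly the situation the monitoring and notification processes of Table 7 must cover.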

APPLICATION RECOMMENDATIONS

Thin provisioning is most effective when applications use data that is committed to them step by step. When applications preformat data, the immediate effect of thin provisioning is lost, and only deduplication may reclaim sharable blocks. Because thin provisioning has no performance penalty, the general recommendation is to provision with the zero fat configuration.

For SAN-attached storage, NetApp recommends that you use file systems supporting space reclamation technologies such as the SCSI UNMAP and SCSI WRITE SAME commands. These pass the information through the storage stack that a particular block is no longer used, allowing unused space to be reclaimed. On Windows® platforms, this can be configured in NetApp SnapDrive®.

For Oracle® database best practices, refer to WP-7084, Storage Efficiency in an Oracle Environment.

3.2 PROVISIONING FROM TEMPLATES: VOLUME- AND DEDUPE-CENTRIC LAYOUTS

This section deals with provisioning storage for similar applications from a golden template. A valid use case is a hosting provider who offers and serves predefined application services in mass quantities. Instead of provisioning each application from scratch, the data of the application instance is provisioned by creating a copy of a preconfigured template, or golden copy, that is customized using a postprocessing procedure.

When applications are provisioned this way, NetApp cloning technologies generate virtual copies of the template data instantly and with efficient use of space. This achieves a high degree of data consolidation and cost savings.

The potential of NetApp cloning technologies also plays a central role in development and test environments, as well as in software maintenance scenarios. Testing and updates can be performed very easily because these cloning capabilities work instantly and with almost no overhead in terms of performance, CPU, and memory.

There are two ways to align application data to a NetApp shared storage infrastructure:

• Volume-centric storage layout
• Dedupe-centric storage layout

Depending on the data lifetime, suitability for deduplication, consistency, and tool constraints, one way of aligning application data is more appropriate than the other.

In both variants, the storage of the application template can be provisioned as either full, low, or zero fat. The cloning procedure inherits the attributes of the parent volume. To create space-efficient clones, the space guarantee must be set to none.


CONSEQUENCES FOR MONITORING

When using one of the following layouts, very high data consolidation can be achieved. Because this effect depends on the usage characteristics of the corresponding applications, monitoring the aggregate is key. In case a low fat volume acts as a template that is cloned preserving the original space guarantees, monitoring is necessary for the cloned volumes as well.

VOLUME-CENTRIC STORAGE LAYOUT

In a volume-centric storage layout, an application instance is organized into one or a few volumes to benefit from Data ONTAP volume-centric management and maintenance operations, such as instant cloning and volume-consistent Snapshot copies.

In addition to the convenient ways to manage volumes, volume-centric storage layouts have storage efficiency advantages in two dimensions:

• High instant storage efficiency savings: High instant savings when cloning data of an application instance with FlexClone; savings might deteriorate over time.
• Long-term storage efficiency savings: Medium long-term savings when deduplicating application data.

A volume-centric layout makes it easy to provision storage for another instance of an application by cloning a consistent volume representing the template of the intended application and attaching it to an instance where it is processed. This approach works for both NAS and SAN.

Figure 12 shows the data alignment of an application instance and its volume. An application instance organizes its data in one or more dedicated volumes. Note that the entire construct is created within one aggregate.

Because deduplication is performed at the volume level, long-term savings depend on the block-sharing rate within one instance of an application. Volume-centric layouts are preferred in the following cases:

• Simplicity of data management using volumes
• Individual control over the SLA of each application instance
• Application instances with a short duration
• No consideration of deduplication
• Management tools that require volume-centric layouts


Figure 12) Volume-centric storage provisioning. Application instances are aligned horizontally with their volumes: the template and each instance (1..n) keep their LUNs/qtrees in dedicated FlexVol volumes, deduplication block sharing operates within each FlexVol volume, and FlexClone block sharing links the instance volumes to the template.

Impact on commitment and storage utilization: The impact of using FlexClone to clone a volume-centric storage layout to implement storage template-based provisioning can be visualized schematically. At clone creation, Data ONTAP creates metadata for the new instance of the data; it allocates space for storing changes to the cloned copy or new data on request. Thus, the overcommitment of the aggregate containing the cloned data increases when creating the clone. However, this does not affect the space used in the aggregate. When data in the clone is changed and new data is added by the application, the aggregate usage will grow.
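The described behavior — cloning raises commitment but not used space — can be modeled with a toy sketch. All names here are hypothetical; real Data ONTAP space accounting is more involved.

```python
class Aggregate:
    """Toy model of clone impact on an aggregate (illustrative only).

    committed_gb tracks the sum of logical volume sizes (drives the
    overcommitment rate); used_gb tracks physically allocated space.
    """

    def __init__(self, capacity_gb):
        self.capacity_gb = capacity_gb
        self.committed_gb = 0
        self.used_gb = 0

    def provision_volume(self, logical_gb, used_gb):
        # A regular volume adds both commitment and physically used space.
        self.committed_gb += logical_gb
        self.used_gb += used_gb

    def clone_volume(self, logical_gb):
        # A clone adds only metadata: commitment rises, used space does not.
        self.committed_gb += logical_gb

    def overcommitment(self):
        return self.committed_gb / self.capacity_gb


agg = Aggregate(capacity_gb=10000)
agg.provision_volume(logical_gb=4000, used_gb=1500)  # template volume
used_before = agg.used_gb
agg.clone_volume(logical_gb=4000)                    # clone of the template
assert agg.used_gb == used_before                    # no extra data space used
print(agg.overcommitment())  # 0.8
```

Only as the application changes cloned blocks or writes new data does `used_gb` grow, which matches the aggregate-level monitoring recommendation above.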

Best Practice

A volume-centric layout implicitly implements a consistency group. It is preferable to align in it all application data that should be recovered at a certain point in time. Cloning can achieve significant savings when a FlexClone volume is created to provision data for a new service instance.

Client-side data realignment, such as disk defragmentation or database table space reorganization, has a counterproductive effect on the FlexClone savings. It also has a temporarily counterproductive effect on the deduplication savings, until the deduplication process is executed again. If possible, the following actions on client data should be avoided:

• Reorganizing data, for example database reorganization of table spaces or defragmentation of virtual disks provisioned through cloning
• Preformatting data


DEDUPE-CENTRIC STORAGE LAYOUT

In a dedupe-centric storage layout, the goal is to achieve high storage efficiency returns from the deduplication feature. In contrast to the volume-centric storage layout, data of different application instances is grouped to achieve storage efficiency returns across a set of application instances. Figure 13 shows a sample dedupe-centric storage layout: data of application instances is organized horizontally, and the individual data of each application is grouped vertically in a volume to implement deduplication.

This layout makes sense in virtualization scenarios where the images of the guest machines can be grouped easily. Grouping partitions containing boot images and commonly used programs is very effective because they share much of the same data. To implement template-based provisioning with such a layout, cloning of template data must be performed with the file/LUN FlexClone operation. File/LUN FlexClone allows storage objects to be cloned within a volume, providing finer granularity.

This storage layout provides the following storage efficiency advantages from a short- and long-term perspective:

• Very high long-term storage efficiency savings: Achieved due to the deduplication-centric storage layout and the resulting deduplication returns.
• Short-term storage efficiency savings: Instant savings are provided when cloning an application instance (for example, template application data) through a file/LUN FlexClone operation.

In contrast to the volume-centric storage layout, application instances are bundled together in a matrix style because of their participation in a volume. This implies that the applications share major operational tasks and are managed as a bundle. From an SLA perspective, a diversification of service levels within the application instances cannot be implemented as easily as with a volume-centric layout. Achieving application-consistent Snapshot copies requires the iterative application of file/LUN FlexClone functionality to all storage objects of the instance. This is slightly more difficult than cloning with a volume FlexClone operation.

TR-3505, NetApp Deduplication for FAS and V-Series Deployment and Implementation Guide, provides a deeper understanding of NetApp deduplication and its deployment.

Figure 13 illustrates dedupe-centric storage provisioning. Volumes are shared among several application instances to achieve cross-instance deduplication returns. Note that this construct is created within an aggregate; volumes can be assigned to different aggregates.


Figure 13) Dedupe-centric storage provisioning. Application instances (template, instance 1, instance 2, ...) are aligned horizontally; volumes are aligned vertically, each FlexVol volume holding the corresponding LUNs/qtrees of all instances with deduplication block sharing within the FlexVol volume.

Impact on commitment and aggregate usage: When creating the FlexVol volumes for this layout, their individual sizes contribute to the commitment rate. The aggregate usage grows with the provisioning and object use within the FlexVol volumes. Provisioning a new instance in this layout through a file/LUN FlexClone operation has no effect on the overcommitment rate; it affects the deduplication value of the volume itself. Thus, NetApp recommends using the zero fat configuration for the volume, with autogrow enabled.

Best Practice

This layout is very attractive for applications using multiple but similar storage objects among service instances (for example, virtual disks in virtual machine hypervisors). They usually use similar operating systems and applications in dedicated virtual disks. Thus, grouping these storage objects leads to a very high degree of consolidation due to deduplication.

Quickly changing data, such as page and swap files, should not be considered for inclusion in deduplicated volumes on primary storage. Deduplication savings are limited due to their high change rate and do not justify running the deduplication process. NetApp recommends that this type of data is not placed together with data that dedupes well in the same volume.

We further recommend not performing client data realignments, such as Windows disk defragmentation or database table space reorganizations. Because of the way that NetApp storage controllers work, fragmented client data is served with no performance penalty, so such realignments bring no benefit.

27 Storage Efficiency Every Day How to Achieve and Manage Best-in-Class Storage Use

3.3 SETTLED/NOMAD PROVISIONING FOR NETAPP DATA MOTION

Settled/nomad provisioning is a provisioning pattern that helps increase the utilization of NetApp shared storage. When the online migratability features of the storage are exploited, response times to mitigate data growth scenarios become independent of application-specific planned downtime windows. Storage is classified into potential migration candidates that can be migrated away from a tight aggregate on one storage controller to another while assuring accessibility. Thus, it is an elegant technique to relax the use of an aggregate outside the planned downtime windows of the served applications. NetApp MultiStore technology implements this capability using the vFiler abstraction, which NetApp recommends you consider in the provisioning process. Secure multi-tenancy environments implemented using MultiStore technology harmonize well with this provisioning approach.

The settled/nomad provisioning pattern is a perfect metaphor for reacting to data growth in an aggregate. Figure 14 illustrates the concept of settled/nomad provisioning in the aggregates of the storage controllers and the migration of a nomad out of its aggregate. The settled part describes data that does not move during its lifetime. It might use vFiler units to simplify operation and hardware maintenance of the storage controller, but there is no direct need. The nomad parts are considered moving parts and thus must make use of vFiler units. The ratio between the sizes of the settled and nomad parts depends on the growth rate and lifetime of the data in the settled part. Assuming that the aggregate size is constant over this period, the aggregate is filled with settled and nomad data. Over the data lifetime, more and more nomads are migrated away; at the end of the lifetime, only the settled data is left.

It is irrelevant whether the data growth happens in the settled or the nomad part: when a nomad is migrated away, the resource situation on the aggregate is relaxed. It is preferable to provision several nomads of different sizes. This allows you to:

• React to different growth scenarios of the data
• Quickly migrate smaller nomads when time or the inter-storage-controller network is considered a limited resource
• Operate the aggregate in its operational sweet spot corridor over a long time frame; by slicing the migratable entities in the right way, you can be sure that the aggregate operates within a predefined use interval
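The effect of slicing the migratable entities can be illustrated with a small sketch. All sizes, the corridor bound, and the greedy selection strategy are hypothetical illustrations, not a NetApp tool:

```python
# Sketch (hypothetical sizes): choose which nomads to migrate away so that
# aggregate block use falls back below the upper edge of the sweet spot corridor.
def nomads_to_migrate(used_tb, capacity_tb, upper_bound, nomad_sizes_tb):
    """Greedily pick nomads whose migration brings use/capacity back
    below upper_bound (e.g., 0.8 for 80%)."""
    target_used = upper_bound * capacity_tb
    excess = used_tb - target_used
    if excess <= 0:
        return []  # already inside the corridor
    picked = []
    # Prefer small nomads first: they migrate quickly over a limited network.
    for size in sorted(nomad_sizes_tb):
        if excess <= 0:
            break
        picked.append(size)
        excess -= size
    return picked

# Example: a 16 TB aggregate at 14 TB used, corridor upper bound 80%,
# with nomads of 0.5, 1, and 2 TB provisioned up front.
print(nomads_to_migrate(14.0, 16.0, 0.8, [2.0, 0.5, 1.0]))
```

Provisioning several small nomads, as the list above recommends, gives this selection more options than a single large one.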

Figure 14) Settled/nomad provisioning into an aggregate. In case of aggregate tightness, a nomad is migrated to a separate aggregate.

[Figure: an aggregate containing a settled part and several nomads; one nomad is shown being migrated out of the aggregate.]

To summarize, the settled/nomad provisioning pattern is an elegant method for adjusting the block use of an aggregate. The use of an aggregate can be controlled and kept within a desired corridor.


SLA-BASED ASSESSMENT FOR SETTLED/NOMAD

The goals of an SLA-based assessment are to optimize SLA fulfillment and to avoid or minimize penalty costs. The accessibility of the applications, described by their individual service levels, is used to assess instances as settled or nomad.

We use the previously introduced SLA metric of service disruption and map it to the stickiness of the settled/nomad instances. The vFiler entities allow online migration of NFS- and iSCSI-attached nomad instances without any changes on the client side. Fibre Channel-attached storage cannot be migrated online at the time of writing. Refer to TR-3881 for an understanding of DataMotion in an Oracle database and Microsoft Exchange environment.

Alignment by technical impact. For data belonging to applications with SLAs that fit perfectly into what is provided, a direct assignment can be made. For example, application instances with the lowest acceptable service disruption should be the last candidates to be migrated (settled); applications with the highest acceptable service disruption should be considered nomads. However, there might be data of application instances that will likely be migrated during the application lifetime. You must take into account the business impact of migrating these instances.
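The assessment by technical impact can be sketched as a simple classification. The instance names, SLA values, and the disruption limit below are hypothetical illustrations:

```python
# Sketch (hypothetical SLA values): assign application instances to settled or
# nomad based on their acceptable service disruption, as in the alignment above.
def classify(instances, disruption_limit_s):
    """instances: list of (name, acceptable_disruption_seconds, fc_attached).
    FC-attached storage cannot be migrated online, so it is always settled."""
    settled, nomads = [], []
    for name, acceptable, fc in instances:
        if fc or acceptable < disruption_limit_s:
            settled.append(name)   # stickiest data stays put
        else:
            nomads.append(name)    # tolerates the short migration impact
    return settled, nomads

apps = [("erp-db", 0, True),        # FC-attached: settled by definition
        ("web-archive", 300, False),
        ("mail", 30, False)]
print(classify(apps, 60))
```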

Figure 15) Alignment by technical impact (sorted by negative impact in descending order).

[Figure: instances Inst 1 through Inst N sorted by negative impact, from high/outside SLA through medium to low/inside SLA; instances such as all FC-attached ones are settled, the rest are nomads.]

Alignment by business impact. An assessment of penalty costs is made for the data of the remaining applications. For vFiler migration, a very short negative impact on the performance of the service level must be taken into account during the migration. Thus, application data with the highest negative impact is considered the stickiest.

Figure 16) Alignment by business impact (sorted by negative impact in descending order).

[Figure: remaining instances sorted by penalty cost; the instances with the highest penalty costs are semi-settled, the rest are nomads.]

PERFORMANCE AND THROUGHPUT IMPACT OF MIGRATION

Migration of a nomad might be triggered by heavy storage consumption in an aggregate. It might also be triggered by performance limitations of the corresponding storage controller. Because the migration process consumes additional resources on the network and on the participating storage controllers, this consumption must be taken into account to avoid further intensifying the situation. Refer to TR-3881 for a quantitative evaluation of DataMotion.

NetApp recommends adjusting the use of storage controllers in a high-availability configuration in such a way that the remaining controller can master the load in the case of a failover. Doing so should leave enough resources to perform migrations.

Migrating vFiler entities relies mainly on SnapMirror® and MultiStore technology and thus requires these licenses on all participating storage controllers. TR-3814, NetApp Data Motion, provides a thorough presentation of migration using the NetApp Data Motion solution. It focuses on implementing and triggering the migration of vFiler entities using NetApp Provisioning Manager. Furthermore, vFiler units can be managed manually to allow for handy offline migration with a very short interruption of storage accessibility.

ENABLE SETTLED/NOMAD FOR ALREADY-PROVISIONED STORAGE

While NetApp recommends that you consider the settled/nomad setting initially, taking the sizing and lifetime of storage into account, it is also possible to implement it later in a planned downtime window. If NFS-attached storage should be migrated, existing volumes can be adopted by a vFiler entity. Because the vFiler entity has its own IP address, the clients attaching the storage need to be remounted.

SETTLED/NOMAD-LIKE SETTING WITH SHORT-/LONG-TERM DATA PAIRING

In the previous section, the settled/nomad pattern was described as a way to mitigate organic data growth. The same effect can be achieved when storage is identified as belonging to instances that are going to be deprovisioned at their end of life. Taking the expected lifetime of provisioned storage into account allows you to plan deprovisioning situations in advance. This relaxes the dependence on aggregates outside planned downtime windows, without the technical requirements of a settled/nomad setting.

ONLINE MIGRATION IN VIRTUALIZED ENVIRONMENTS

Online migration features in a virtualization hypervisor provide a further alternative for implementing a responsive scheme to react to data growth scenarios. For example, VMware® Storage VMotion™ is capable of transferring a virtual machine, including its storage, when it is attached using a datastore. Storage of virtual machines served by a NetApp datastore can be migrated to another NetApp-served datastore by migrating each virtual machine. In such cases, a nomad can simply be implemented by a NAS- or SAN-attached datastore.

In contrast to a data transfer based on SnapMirror directly between NetApp storage controllers, the migration traffic flows through the hypervisor. This might have consequences for the execution of the virtual machines. Also, the NetApp storage efficiency savings cannot be exploited during the transfer; deduplication savings are regained by executing the deduplication process on the destination storage controller.


4 OPERATION

This section focuses on the operation and management of overcommitted storage. The goal of management is to fulfill the SLAs of the stored data while achieving a high level of efficiency. It addresses questions of how to detect situations that need manual assistance, how to raise the awareness of the operational staff, and how to resolve situations that arise.

We first consider situations that put SLA fulfillment at risk. Then we focus on actions that can be taken to avoid further aggravation by making the situation evident and presenting mitigation alternatives.

Consider the operational process as a loop that monitors and evaluates the current situation and triggers the transition of a storage resource among phases:

• Provision storage.
• Leave room for organic growth. It might be desirable to still allow extending the storage of previously provisioned applications.
• Reduce storage use with mitigation alternatives such as deletion, data motion, and so on.

These transitions must occur within a specified time frame to optimize operational flexibility and to prevent endangering the SLAs. The point is to detect situations that will violate the SLAs in the future.

SITUATIONS PUTTING SLA FULFILLMENT AT RISK

Over time, more and more data is stored and processed by the provided applications, and NetApp storage efficiency technologies compensate for this growth. To prevent running out of physical resources, usage must be managed within safe boundaries. This makes sure the operations team has enough time to react with the appropriate mitigation strategy.

The following list summarizes situations that are critical for service delivery:

• Running out of time. Some mitigation alternatives must be triggered in advance, and a passage of time might be needed for their effect to become evident. This time determines the number of mitigation alternatives that can still be considered at a certain point.

• Running out of mitigation alternatives. Several mitigation alternatives exist to control the usage. However, some alternatives are one-time activities, and some must be performed within a certain time frame. Depending on the situation, not all alternatives might be available for use.

• Running too tight on storage. Over time, applications use more and more of the blocks from storage that were committed to them. This forces Data ONTAP to allocate from a pool of free blocks. Assuming data growth, the size of the free block pool directly translates into the available time to react.

• Running out of storage completely. This must be prevented because it has a high impact on the availability of the service. Furthermore, data integrity can be at risk. Consider the following scenarios:
  − The application wants to write to committed storage but fails (NAS/SAN). For applications, this looks like a storage failure and implies service disruption. Data integrity can be at risk.
  − The application wants to allocate new storage but fails (NAS). The application is confronted with a "No space left on device" exception. Verify the application behavior on this exception; most applications can deal with this situation, and data integrity is not at risk.

Two cases need to be differentiated when mitigation is necessary to solve a situation of tight storage. Storage for an object such as a LUN or a share can be tight because of:
  − Insufficient space within the volume in which the storage object is contained
  − Insufficient free space within the aggregate in which the storage object and its volume are contained
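The observation that the free block pool translates into available time to react can be made concrete with a small sketch (the figures are hypothetical, and a constant growth rate is assumed):

```python
# Sketch (hypothetical figures): remaining days to react before an aggregate
# runs out of free blocks, assuming a constant daily data growth rate.
def days_to_react(free_gb, daily_growth_gb):
    if daily_growth_gb <= 0:
        return float("inf")  # no growth means no deadline
    return free_gb / daily_growth_gb

# 1200 GB of free blocks, growing by 40 GB per day: 30 days left to act.
print(days_to_react(1200, 40))
```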


The following sections focus on how to detect that a change is necessary and that a storage resource should be transitioned to another phase. We describe the different phases, how monitoring can support their detection, and how this information can be made known to operational groups.

4.1 PHASES AND TRANSITIONS

This section outlines the phases of a storage resource. Starting with an empty aggregate, storage is provisioned up to certain thresholds. After that, storage is left for organic growth. After certain thresholds are exceeded, further inspection or activities must be performed to mitigate storage tightness.

• Provisioning storage. When certain thresholds are within a defined range, storage is provisioned to the aggregates. Monitoring should support making the decision to transition to the next phase.

• Leave storage for organic growth. When certain thresholds are exceeded, provisioned storage is left for organic growth. Depending on the environment, storage of existing applications might still be extended, and a second threshold might signal that extensions are no longer possible. Monitoring should support making the decision to transition to the next or prior phase.

• Mitigate storage use. When certain thresholds are exceeded, this phase must make sure that committed storage can be delivered to store application data. The effect of a mitigation activity should be to put the storage resource back into the preferred operational corridor. Monitoring should support making the decision to transition back to the organic growth phase.

4.2 MONITORING

NetApp Operations Manager delivers comprehensive monitoring and management for NetApp shared storage. It provides alerts, reports, performance monitoring, and configuration tools to keep the NetApp storage infrastructure in line with business requirements for maximum availability and efficiency. NetApp Operations Manager provides a single human interface and an application programming interface (API) for integration with third-party management and orchestration software.

Operations Manager monitors the NetApp shared storage infrastructure and is able to raise awareness of certain situations. Events can be set to trigger an action when operational parameters are within a certain range and indicate a relevant situation. When an event triggers, an alarm notification can be sent by e-mail, pager, Simple Network Management Protocol (SNMP), or customized scripts. To raise awareness about a certain situation, the event must be characterized using the metrics provided by Operations Manager. To communicate the event, an alarm must be set.

THRESHOLDS

Operations Manager monitors relevant parameters that indicate the presence of specified situations. Thresholds can be set to trigger actions, for example, to notify the operational team that an alarm situation exists. The thresholds can be set to notify in advance. Operations Manager also performs trending on operational parameters to express the urgency of a certain situation. This supports decision making on how to react.

Within your Operations Manager instance, the thresholds can be verified and set by navigating to the Default Thresholds page via Setup → Options → Default Thresholds, or via the link http://opsmgrserver:port/dfm/edit/options. Figure 17 shows a sample configuration page.


Figure 17) Operations Manager screen to configure thresholds on operational metrics

For aggregates, Operations Manager provides the set of thresholds described in the following list. They represent absolute limits. Operations Manager alarms can be used to notify operational staff and raise awareness of a specific situation.

Monitoring the aggregates is very important because they are the physical containers of the preallocated and growable storage objects that host application data. If an aggregate of a storage controller runs at uncontrolled usage, it could have direct consequences for the applications for which it provides data.

The concrete settings for these thresholds depend on the time necessary to relax aggregate block usage. When no mitigation alternatives should or can be taken over the lifetime of the data, mitigation actions must be performed in scheduled downtime windows. In that case, threshold settings and actions tend to be more conservative to avoid SLA-endangering situations.

• Aggregate full threshold. This threshold on the metric aggregate block use allows triggering an alarm that notifies a person in charge.

• Aggregate nearly full threshold. This threshold is the counterpart of the aggregate full threshold but provides an earlier notification.

• Aggregate overcommitted threshold. This threshold on the metric of committed storage allows triggering an alarm that notifies a person in charge. This metric refers to the amount of storage that is assigned to applications. It represents the level of consolidation as well as the width and increase of the block use corridor.

• Aggregate nearly overcommitted threshold. This threshold is the counterpart of the aggregate overcommitted threshold but provides an earlier notification.

Operations Manager also provides thresholds that can be used to alert operational staff when volumes are in a certain state:

• Volume full threshold. This event notifies a person in charge that the preset threshold on the volume use metric has been reached.

• Volume almost full threshold. This event is the counterpart of the volume full threshold but provides an earlier notification.

• Volume autosized. This event notifies a person in charge when a volume was extended using the autogrow functionality.


TRENDING

Operations Manager 4.0 supports a variety of trending features for certain storage objects. This is an important feature for all storage objects with a fixed size. It allows you to estimate the time frame within which a certain situation needs to be mitigated. The trend is calculated as a linear regression over up to 90 days in the past. For aggregates, Operations Manager calculates a trend on the daily growth rate. In your Operations Manager instance, use the link http://opsmgrserver:port/dfm/report/view/aggregates-growth-rates for trending of aggregate growth rates and the estimated remaining time until the storage object is full. Each aggregate can be drilled down, and you can select trending based on an interval of one day, one week, one month, three months, or one year. To see the effect of recent data activities, set the interval of the trend calculation to enclose this activity. Investigate whether growth rates calculated over different intervals deviate significantly.
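The trend calculation can be sketched as a plain least-squares fit over daily usage samples. This is an illustration of the idea only; Operations Manager performs its own calculation internally, and the sample data is hypothetical:

```python
# Sketch of the trending idea: a least-squares linear fit over daily usage
# samples (up to 90 days) yields a growth rate and a days-to-full estimate,
# analogous to what Operations Manager reports for aggregates.
def days_to_full(used_gb_per_day, capacity_gb):
    n = len(used_gb_per_day)
    xs = range(n)
    mean_x = sum(xs) / n
    mean_y = sum(used_gb_per_day) / n
    # The regression slope is the estimated daily growth rate.
    num = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, used_gb_per_day))
    den = sum((x - mean_x) ** 2 for x in xs)
    slope = num / den
    if slope <= 0:
        return None  # flat or shrinking usage: no fill-up predicted
    return (capacity_gb - used_gb_per_day[-1]) / slope

# Ten days of samples growing 50 GB/day toward a 10,000 GB aggregate.
samples = [8000 + 50 * d for d in range(10)]
print(round(days_to_full(samples, 10000)))
```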

Figure 18) Trending of data growth and days-to-full prediction in Operations Manager

Note: The calculation basis of time-to-full is the usable aggregate capacity. This value is not calculated based on the aggregate full threshold setting.

Trending on the volume level is analogous to trending on the aggregate level. In your Operations Manager instance, access the link http://opsmgrserver:port/dfm/report/view/volumes-growth-rates for trending of volume growth rates. NetApp recommends ordering the view by growth rate descending or by time-to-full ascending in order to focus on the relevant candidates.

On the volume level, you can set an alarm to fire when the volume growth is outside the usual boundary:

• Abnormal volume growth. This event notifies when the growth rate of a volume exceeds a preset limit. It is helpful for signaling unusual behavior concerning storage consumption and for pointing the operational staff to the right storage object.


INDIVIDUAL THRESHOLDS PER AGGREGATE OR VOLUME

For each aggregate or volume, the general default settings can be overwritten and made more specific. To do so, select the aggregate or volume of choice, for example, using the links already provided in this technical report. A selected aggregate can be configured using the Edit Settings link and dialog. A selected volume's configuration can be adapted using the Edit Quota Settings link and dialog.

MONITORING STORAGE EFFICIENCY RETURNS

NetApp Operations Manager provides a dashboard to visualize storage efficiency returns in the NetApp shared storage infrastructure. This report lists important parameters, drilled down by utilization capacity, unused reserve capacity, storage efficiency, and efficiency return breakdown. It allows you to judge the effectiveness of the NetApp storage efficiency technologies. Figure 19 provides a sample screenshot of the storage efficiency dashboard in NetApp Operations Manager. Consult the NetApp Operations Manager Efficiency Dashboard Installation and User Guide for further information on the data provided by this dashboard.

Figure 19) Storage efficiency dashboard in Operations Manager


4.3 NOTIFICATION

Operational staff must be notified when situations occur that require a transition between phases, especially situations with negative consequences for SLA fulfillment. Operations Manager provides alarms for notification. Alarms are bound to the metrics and thresholds explained in section 4.2 and notify operational staff, storage administrators, or storage capacity planners. Alarms are the instrument used to keep the management effort of the NetApp storage infrastructure low.

After being notified, the responsible person can evaluate the situation and decide which actions to take. Furthermore, the trends on operational parameters provided by Operations Manager simplify the decision-making process.

Depending on the organizational structure, the responsibilities to operate, plan, and administer the storage infrastructure can be separated into different groups, persons, or roles. Thus, we characterize the mitigation activities by required skill set and time to act. This allows easy alignment with a given organizational structure.

Operations Manager supports different methods of sending a notification. The notification methods can be used in combination; for example, a notification can be sent by both e-mail and SNMP.

NOTIFY BY E-MAIL

An alarm can be sent to multiple destinations by e-mail. Repeated notifications can be sent while the situation is not resolved. To set an alarm, access the alarm configuration page by following Setup → Alarms from the default Operations Manager dashboard. Clicking Advanced Version accesses an advanced version of this page; the direct link for the advanced version is http://opsmgrserver:port/dfm/edit/alarms-advanced. Figure 20 shows how to configure an alarm. Adjust the threshold as described in section 4.2.

NetApp recommends using distribution lists or aliases with meaningful names rather than the addresses of individual persons. If you follow this recommendation, changing responsibilities and roles does not require corresponding changes in Operations Manager.

NOTIFY BY SNMP

Operations Manager supports the notification of alarms using SNMP, a widely used standard that is supported by most orchestration frameworks and ticketing systems. Using SNMP, Operations Manager can be integrated into existing ticketing systems. Figure 20 shows the setup of an alarm firing based on the aggregate almost full threshold. The SNMP trap host is configured using a hostname or IP address and the port on which the SNMP agent is listening. The alarm can then be saved and tested.


Figure 20) Configuring an alarm based on the threshold aggregate almost full

Note: The SNMP event must be routed to the responsible groups or persons in the ticketing system. Thus, the mapping between the detected situation and the responsible operational group must be implemented there.

NOTIFY BY SCRIPT

Operations Manager supports notifications in highly customized integration scenarios. A user-defined adapter can be executed, which delivers the information to the infrastructure or system of choice. A script can be used to implement such an adapter and act as the glue between Operations Manager and the customer infrastructure. To set an alarm on the event aggregate almost full that starts a script, instrument Operations Manager on the command line:

dfm alarm create -s script_to_execute -h aggregate-almost-full
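A minimal sketch of such an adapter script is shown below. How Operations Manager passes event details to the script is product specific; the command-line arguments, ticket fields, and queue name used here are assumptions for illustration only:

```python
# Sketch of a user-defined adapter script. The event name and source object are
# assumed (hypothetically) to arrive as command-line arguments; the script
# renders them as a JSON payload a ticketing system could accept.
import json
import sys

def build_ticket(event_name, source_object):
    # Crude severity mapping for illustration: "full" events are urgent.
    severity = "high" if "full" in event_name else "info"
    return json.dumps({"event": event_name,
                       "object": source_object,
                       "severity": severity,
                       "queue": "storage-operations"})

if __name__ == "__main__":
    event = sys.argv[1] if len(sys.argv) > 1 else "aggregate-almost-full"
    obj = sys.argv[2] if len(sys.argv) > 2 else "aggr0"
    print(build_ticket(event, obj))  # replace print with a POST to the ticket API
```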


4.4 MITIGATE STORAGE USE

Uncontrolled storage use can limit operational flexibility and might put SLA fulfillment at risk. This section focuses on mitigation activities that preserve flexibility by controlling use within its defined corridor. The effect of a mitigation activity should be to return the usage to that corridor.

Storage tightness might occur in aggregates or volumes, depending on their configuration. When all volumes in an aggregate are thin provisioned with the zero fat configuration, they use the shared pool of free blocks of the aggregate to deal with data growth. To solve this situation, a mitigation activity on the aggregate level is necessary.

When storage objects in a fixed-size volume cannot grow to the committed space, a mitigation activity on the volume level is necessary to solve upcoming volume tightness.

MITIGATION ACTIVITIES FOR AGGREGATES

Aggregates are the coarsest storage objects within a NetApp storage controller. Finer-grained storage objects such as FlexVol volumes and their content are usually thin provisioned using the zero fat configuration. They might grow on demand; however, because they live within an aggregate of physically limited size, the growth of the storage objects themselves is also limited. As described in the following list, providing usable space in the aggregate automatically allows contained storage objects to grow.

1. Increase the aggregate. You can add drives to aggregates during operation, and you can repeat this mitigation activity. The maximum aggregate size depends on the Data ONTAP version, the type of aggregate, and the type of storage controller. 64-bit aggregates, supported with Data ONTAP 8, have very high limits. Additional drives can be used immediately; however, their procurement time needs to be taken into account. Rebalancing data between existing and new drives results in uniformly distributed use of the drives.

2. Decrease the aggregate Snapshot copy reserve. This reserve is needed in MetroCluster and SyncMirror® configurations. In other configurations, you can decrease this reserve or set it to zero.

3. Shrink preallocated volumes. Volumes with preallocated space reserve available aggregate free space. When possible, these volumes can be shrunk, returning the freed space to the aggregate to allow other objects to make use of the previously preallocated space.

4. Enable deduplication and shrink the volume.

5. If available, migrate a nomad online to a different storage controller. Doing this on the NetApp storage controller level requires storage provisioning based on vFiler units as well as MultiStore and SnapMirror licenses. Adequate free space on the aggregates of the target storage controller is required. This mitigation activity is not limited in its applicability.

6. Migrate a volume from one aggregate to another within the same or another storage controller. SnapMirror replicates the data while it is still being served. To switch over to the replicated data, the client needs to detach from the source and reattach to the replica. After completion, the replica is considered the new source. This operation has an impact on client downtime. Typically, inter-data-center bandwidth allows you to synchronize the source and the replica within a few minutes.

7. If none of the listed activities can be used, the application must be stopped to achieve a consistent state before its data is migrated.

The mitigation activities for aggregate tightness are summarized in Table 8. Note that Provisioning Manager can perform mitigation alternatives 3 to 6 for secondary storage online.


Table 8) Mitigation alternatives to control use within aggregates.

| No. | Mitigation Activity | Repeatability | SLA Impact | Preparation Time | Time to Show Effect |
| 1 | Increase aggregate capacity by adding disks | Data ONTAP 7.x: low limits; Data ONTAP 8: high limits | None | HW procurement | Immediate (plus rebalancing) |
| 2 | Decrease the aggregate's Snapshot copy reserve area, if possible | One time | None | None | Immediate |
| 3 | Shrink other volumes in the aggregate if they have enough free space | One time | Low | None | Immediate |
| 4 | Run deduplication and shrink volumes | Repeatable | Low | Time to execute dedupe | Immediate |
| 5 | Migrate nomads (online) | Repeatable | Low | None | Minutes (vFiler migration time) |
| 6 | Migrate volumes to a different aggregate (offline) | Repeatable | Medium-high | Next planned downtime window | Minutes (volume switch-over time) |
| 7 | Prevent application data loss and stop the application, then migrate (offline) | Repeatable | Low-high | Coordinate with app owner | Minutes (migration time) |
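The choice among these alternatives can be sketched as a filter on available lead time and the need for a downtime window. The encoding below is an illustration of Table 8 only; the preparation times in days are assumptions:

```python
# Sketch (illustrative encoding of Table 8): filter mitigation alternatives
# by the lead time available and whether a planned downtime window exists.
ALTERNATIVES = [  # (no, activity, needs_downtime, prep_days)
    (1, "add disks to aggregate",          False, 14),  # assumed HW procurement
    (2, "decrease aggregate snap reserve", False, 0),
    (3, "shrink other volumes",            False, 0),
    (4, "run dedupe and shrink volumes",   False, 1),   # assumed dedupe runtime
    (5, "migrate nomad online",            False, 0),
    (6, "migrate volume offline",          True,  0),
    (7, "stop application and migrate",    True,  0),
]

def viable(days_left, downtime_window_available):
    return [name for _, name, needs_dt, prep in ALTERNATIVES
            if prep <= days_left and (not needs_dt or downtime_window_available)]

# Two days of headroom and no planned downtime window in sight:
print(viable(2, False))
```

Running out of viable entries in such a filter corresponds to the "running out of mitigation alternatives" situation described in section 4.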

MITIGATION ACTIVITIES FOR VOLUME TIGHTNESS

Mitigation activities for volume tightness are relevant for volumes that are at risk because their storage objects cannot grow to the committed size. When it is not possible to enable growth for the storage objects contained in a volume, you need to perform an aggregate-level mitigation activity.

Note: Some of these mitigation alternatives depend on, and affect, used capacity in the aggregate.

Table 9) Mitigation activities for resource tightness within volumes.

| No. | Mitigation Activity | Repeatability | SLA Impact | Preparation Time | Time to Show Effect |
| 1 | Reduce the volume's Snapshot copy reserve (if configured and not used) | One time | Low | None | Immediate |
| 2 | Increase the volume if there is free space in the aggregate (see Table 8) | One time | Low | None | Immediate |
| 3 | Delete Snapshot copies that are not needed or were skipped by the AutoDelete function | Limited | Low | None | Immediate |
| 4 | Activate FAS deduplication for the volume (requires proper space guarantees) | One time | Low; possible performance impact | Wait for schedule | Hours |
| 5 | If the volume contains more than a single LUN, migrate those objects to another volume or aggregate | Repeatable | High | Next planned downtime window | Minutes (volume migration time) |
| 6 | Stop the application and migrate data | Repeatable | High | Coordinate with app owner | Minutes (migration time) |


5 REAL-LIFE SETTINGS

This section summarizes two different operational settings. The first does not make use of online data migration or the settled/nomad provisioning pattern; the second implements a settled/nomad provisioning pattern to maintain the flexibility for online data migrations.

The concrete threshold settings and approaches might be very customer and application specific. To exploit NetApp storage efficiency features in your own data center, NetApp recommends that you start conservatively. After you are familiar with the process, work toward the customer-specific optimum.

5.1 SAMPLE SETTING 1: REAL-LIFE SETTING

This section describes a real-life setting that a customer started with. It makes use of a limited set of mitigation alternatives. This is especially beneficial when the installed storage capacity should remain constant over a long time frame or when the physical systems are already fully equipped. A settled/nomad setting is not considered. Thus, the thresholds that signal a transition between phases are set lower and more conservatively for this customer. Because online data migration and aggregate extension are not available as mitigation alternatives, sufficient available space is required to safely reach the next planned downtime window, as shown in Figure 21. In practice, refer to the aggregate days-to-full trend value to get an idea of the available days to full based on past data growth.

• All storage is provisioned using the zero fat option with growable FlexVol volumes. Only aggregate monitoring is used.
• Aggregate extension is not a mitigation alternative.
• Online migration is not a mitigation alternative.

Figure 21) Storage to enable organic data growth between planned downtime windows.

[Figure: data growth over months, consuming reserved space between planned downtime windows.]

Note: Several months might fall between planned downtime windows to perform major mitigation alternatives.

The primary concern is preventing the critical situation where aggregates reach a utilization level that is too high to enable organic growth during the period between agreed planned downtime windows. To prevent this situation, sufficient space must be reserved to enable data growth. Second, the level of data consolidation is monitored to manage accumulated growth rates safely.

Provisioning new data is stopped when one or both thresholds on the first and second metrics are reached. The operational teams are notified using an alarm on the Operations Manager event aggregate nearly full threshold (event configured when the metric exceeds 50%) and the event aggregate nearly overcommitted threshold (event configured when the metric exceeds 110%). These alarms stop the responsible entities from provisioning new storage; the aggregate is left for organic growth.

An assessment of the storage situation might be performed. Depending on experience and knowledge of the application growth rates seen in the past, the thresholds may be adapted. After the upper threshold of the operational sweet spot corridor is crossed, an alarm based on the aggregate full threshold (set initially to 65%) is sent to the storage administrators to make the decision for migrating data in the next planned downtime window. In the meantime, organic growth can take place in the yellow-marked area shown in Figure 22. The metrics used are:

• First metric: aggregate capacity used
• Second metric: aggregate space committed

Because all storage is provisioned using the zero fat option, no artificially limited storage container exists. Thus, there is no need to consider a volume-based metric. Figure 22 shows the behavior depending on the metrics aggregate capacity used and aggregate committed space.
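These thresholds can be read as a small decision function over the two metrics. A minimal sketch; the 50%/65% and 110%/120% values are the ones quoted in this section, while the function itself and its return strings are hypothetical:

```python
def setting1_action(capacity_used_pct, space_committed_pct):
    """Map the two aggregate metrics to an operational action (sample setting 1)."""
    # No-go area: schedule data migration for the next planned downtime window.
    if capacity_used_pct > 65 or space_committed_pct > 120:
        return "mitigate"
    # Tolerance interval: provisioning stops, organic growth only.
    if capacity_used_pct > 50 or space_committed_pct > 110:
        return "assess capacity and adapt thresholds"
    # Operational sweet spot corridor.
    return "provision new storage"

print(setting1_action(40, 90))    # inside the sweet spot corridor
print(setting1_action(55, 115))   # one or both warning thresholds exceeded
```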

Figure 22) Transition of changes depending on the metrics aggregate capacity used and aggregate committed space.

[Figure: within the operational sweet spot corridor (aggregate capacity used 0–50%, aggregate space committed 0–110%) new storage is provisioned; above these thresholds, capacity is assessed and thresholds are adapted; at aggregate capacity used > 65% or aggregate space committed > 120%, mitigation is triggered.]


5.2 SAMPLE SETTING 2: SETTLED/NOMAD

This section describes a setting that takes the settled/nomad provisioning pattern into account and allows migrating nomad data flexibly and in a timely manner, thanks to vFiler technology. This setting requires storage space at alternative locations where nomads might be migrated. It is seen more often in larger environments with an emphasis on NFS-attached storage. It allows operating the NetApp storage infrastructure at very high use and in narrower operational sweet spot corridors. Figure 23 visualizes the effect of a mitigation alternative that can be performed online.

Figure 23) Narrower corridors due to the ability to perform mitigation alternatives in hours instead of months.

[Figure: settled data and nomads (N) in an aggregate; detecting the need to act and seeing the effect of mitigation (e.g., migration) within hours.]

In this sample setting, as in sample setting 1, the critical situation to prevent is aggregates becoming too full. However, the flexibility gained with online data migration does not require taking a further metric into account, for example, storage overcommitment.

• All storage is provisioned using the zero fat option with growable FlexVol volumes. Only aggregate monitoring is used.
• Storage is provisioned using the settled/nomad pattern with the ability to perform online migration.
• Days-to-full aggregate trending was more than 200 days on average. Note that this value depends on the individual situation and is calculated against 100%.

The sole metric in this setting is aggregate capacity used. Table 10 contains the thresholds describing the transition of phases.

Table 10) Phase transitions with settled/nomad provisioning pattern and online migration mitigation alternative.

Detection Threshold | Notify | Mitigation
> 70% | Storage operations | Stop provisioning of storage
> 85% | Storage operations | Stop extending provisioned storage
> 90% | Storage operations | Relax resource situation and migrate nomad
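Table 10 translates directly into a threshold lookup. A minimal sketch; the thresholds come from the table, while the function itself and its return strings are illustrative:

```python
def setting2_mitigation(capacity_used_pct):
    """Return the mitigation step for an aggregate in sample setting 2 (Table 10)."""
    if capacity_used_pct > 90:
        return "relax resource situation and migrate nomad"
    if capacity_used_pct > 85:
        return "stop extending provisioned storage"
    if capacity_used_pct > 70:
        return "stop provisioning of storage"
    return None  # inside the operational sweet spot corridor

print(setting2_mitigation(75))   # first threshold crossed
print(setting2_mitigation(92))   # nomad migration becomes necessary
```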


Figure 24) Visualization of phase transitions depending on metric aggregate capacity used.

[Figure: at aggregate capacity used 0–70% new storage is provisioned; from 70–85% provisioning stops; above 85% extending already provisioned storage stops; above 90% utilization is relaxed by migrating a nomad with NetApp Data Motion.]

You can achieve very high data consolidation in this setting by using NetApp storage controllers. The amount of logical data served exceeds the physically usable capacity by factors.


6 STORAGE EFFICIENCY COOKBOOK

To increase consolidation, we propose the following steps to exploit the advantages of NetApp storage efficiency technologies.

Figure 25) Typical picture of aggregate capacity metrics while turning to zero fat configurations and dedupe.

[Figure: committed capacity and capacity used over elapsed time (steps 1–3, roughly 1-month and 3-month windows), with the overall trend and the last 3-month trend.]

As a general rule, we don't introduce artificially limited container types. They increase monitoring effort and might prevent pooling unused space. For an existing landscape, proceed as follows:

1. Install and configure Operations Manager; the earlier the better. From day one, Operations Manager collects data; the more information it collects, the better the predictions and trending are. The diagrams provided by Operations Manager give a good idea of growth rates and their steadiness. Make sure all NetApp storage controllers are monitored. Wait for one month. Define which mitigation alternatives your operational team is comfortable with. Check the boxes accompanying the provided list and identify the time your team needs to perform the actions. If you can perform online migrations for nomads, define the time to negotiate and approve the migration. For all other data, define the time to the next planned downtime window.

2. Change all volumes to zero fat configuration with the autogrow feature set to on. Since there is no artificial space limitation for the autogrow volume, monitoring is restricted to aggregate monitoring. When using deduplication, set the volume to autogrow. During this period, the capacity used diminishes, as shown in Figure 25. Usually, each change in the volume configuration can be detected. So far, only metadata has changed, and unused space in the volumes is now available from a common shared pool. The aggregated free space is available for the same applications storing the data. We recommend monitoring for three months to understand the growth rate of your environment.

3. Derive the growth trend of the aggregates. Note that the overall trend might still be negative. Use Operations Manager to help determine the trend. Make sure that it excludes the time frame when changing the volume configuration to zero fat and that it includes relevant operations of your applications, such as month- and year-end closing of business applications or regular software maintenance updates (for example, in virtualized environments).


Work backward to determine the thresholds of the phases:

a. Define the aggregate use at a level where your operational team is comfortable. At first, do not exceed 80%. Add an attention area (yellow) depending on the mitigation alternatives and their time to show effect.

b. Determine the maximum distance between the planned downtimes or the time to perform the intended mitigation alternatives.

c. Determine the growth rate. Operations Manager provides help in determining the trend of data growth.

d. Determine the minimum space required to comfortably allow organic growth in the period between agreed planned downtimes of the services provided. Operations Manager helps you to understand the growth rate of the past.
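Steps b through d amount to a simple calculation: the space that must stay free is the growth expected between two mitigation opportunities, plus a margin. A sketch with hypothetical numbers and a hypothetical safety factor:

```python
def required_headroom_gb(growth_gb_per_day, days_between_downtimes, safety_factor=1.2):
    """Minimum free space needed to survive organic growth until the next planned downtime."""
    return growth_gb_per_day * days_between_downtimes * safety_factor

def max_sweet_spot_pct(usable_gb, growth_gb_per_day, days_between_downtimes):
    """Work backward from the required headroom to the upper corridor threshold (percent)."""
    headroom = required_headroom_gb(growth_gb_per_day, days_between_downtimes)
    return max(0.0, 100.0 * (usable_gb - headroom) / usable_gb)

# Example: 10 TB aggregate, 20 GB/day growth, 90 days between downtime windows
# -> about 2160 GB headroom, so the corridor should end near 78% capacity used.
print(round(max_sweet_spot_pct(10000, 20, 90)))
```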

To provision storage, follow these steps:

1. Create big aggregates to enable shared storage in your data center. We recommend sizing in such a way that the aggregate can be extended once for eventual aggregate mitigation. Very few situations exist where a silo-centric approach with dedicated aggregates for applications makes sense. Free space and performance in an aggregate can be shared. Few big aggregates reduce the monitoring effort. Also, build aggregates in a limited number of standardized configurations and sizes.

2. Create volumes in zero fat configuration with the autogrow feature set to on. Because there is no artificial space limitation for the autogrow volume, monitoring is restricted to aggregate monitoring. When using deduplication, set the volume to autogrow. Whenever possible, use Provisioning Manager for convenience and for repeating configurations.

a. Classify your data and provision for flexibility. Give NFS a preference and make use of vFiler entities.

b. Turn on deduplication. Even in situations where deduplication rates are expected to be low, there is sometimes a big surprise. If you prefer to try deduplication on the storage controller, then create a clone of the intended volume and deduplicate it to estimate the effect. Use Performance Advisor to identify a repeating time frame of low activity to schedule the deduplication job. Also, use deduplication scheduled by change rate. Mind the maximum sizes, which depend on the storage controller.

c. Initially size volumes to the expected size of the data you are going to store. Thus, the aggregate overcommitment metric in Operations Manager represents the data consolidation more precisely.

d. Trim existing volumes provisioned in full/low fat to zero fat configuration. Use the following commands on the console of the storage controller to configure zero fat without Snapshot autodelete for NAS environments:

vol options <volume> guarantee none
vol options <volume> try_first volume_grow
vol autosize <volume> -m <maximum size> -i <increment size> on
snap autodelete <volume> off

Use the following command sequence to configure zero fat with Snapshot autodelete for NAS environments:

vol options <volume> guarantee none
vol options <volume> try_first volume_grow
vol autosize <volume> -m <maximum size> -i <increment size> on
snap autodelete <volume> trigger volume
snap autodelete <volume> delete_order oldest_first
snap autodelete <volume> on


Use the following commands to configure zero fat without Snapshot autodelete for SAN environments:

vol options <volume> guarantee none
vol options <volume> try_first volume_grow
vol autosize <volume> -m <maximum size> -i <increment size> on
snap reserve -V <volume> 0
snap autodelete <volume> off
lun set reservation <lun> disable

Use the following command sequence to configure zero fat for SAN environments with autodelete set to on:

vol options <volume> guarantee none
vol options <volume> try_first volume_grow
vol autosize <volume> -m <maximum size> -i <increment size> on
snap reserve -V <volume> 0
snap autodelete <volume> trigger volume
snap autodelete <volume> delete_order oldest_first
snap autodelete <volume> on
lun set reservation <lun> disable

e. Identify storage of inactive data. Storage keeping inactive data is most often perfectly suited to act as a nomad candidate that could be migrated.

f. Identify storage that is close to deprovisioning. Deprovisioning of storage relaxes use and can act as a mitigation alternative.

g. Turn already provisioned volumes into zero fat configuration.

3. Let Operations Manager monitor the landscape. Use the reported aggregate daily growth rates and days-to-full trending reported by Operations Manager to adapt the thresholds. Remember that days-to-full trending reports against 100% capacity used of the aggregate.


7 REFERENCES

• TR-3505, "NetApp Deduplication for FAS and V-Series Deployment and Implementation Guide": www.netapp.com/us/library/technical-reports/tr-3505.html

• TR-3563, "NetApp Thin Provisioning: Improving Storage Utilization and Reducing TCO": www.netapp.com/us/library/technical-reports/tr-3563.html

• TR-3710, "Operations Manager, Provisioning Manager, and Protection Manager Best Practices Guide": www.netapp.com/us/library/technical-reports/tr-3710.html

• TR-3786, "A Thorough Introduction to 64-Bit Aggregates": www.netapp.com/us/library/technical-reports/tr-3786.html

• TR-3814, "NetApp Data Motion": www.netapp.com/us/library/technical-reports/tr-3814.html

• TR-3827, "If You're Doing This, Then Your Storage Could Be Underutilized": www.netapp.com/us/library/technical-reports/tr-3827.html

• TR-3881, "DataMotion for Volumes for Enterprise Applications": http://www.netapp.com/us/library/technical-reports/tr-3881.html

• NetApp Operations Manager Efficiency Dashboard Installation and User Guide: http://now.netapp.com/NOW/download/tools/omsed_plugin/InstallUserGuide.pdf


8 ACKNOWLEDGMENTS

This report was developed in concert with the Field Centers for Innovation and covers field best practices and product group expertise. It would not have been possible without the input of many experts. Significant contributions were made by Matthew Agoni, Carlos Alvarez, Jeff Berks, Manfred Buchmann, Hans Deuerlein, Erik Dybwad, Niels Reker, Oliver Dziuba, Larry Freeman, Gary Garcia, Pretoom Goswami, Naveen Harsani, George John, Nigel Maddock, Andreas Martinovsky, Holger Niermann, Cesar Orosco, Christian Ott, Shiva Raja, Michael Reusch, Maurice Skubski, John Tyrrell, Oliver Walsdorf, and Allen Wang.

NetApp provides no representations or warranties regarding the accuracy, reliability, or serviceability of any information or recommendations provided in this publication, or with respect to any results that may be obtained by the use of the information or observance of any recommendations provided herein. The information in this document is distributed AS IS, and the use of this information or the implementation of any recommendations or techniques herein is a customer's responsibility and depends on the customer's ability to evaluate and integrate them into the customer's operational environment. This document and the information contained herein may be used solely in connection with the NetApp products discussed in this document.

© Copyright 2010 NetApp, Inc. All rights reserved. No portions of this document may be reproduced without prior written consent of NetApp, Inc. Specifications are subject to change without notice. NetApp, the NetApp logo, Go further, faster, Data ONTAP, FlexClone, FlexVol, MultiStore, RAID-DP, SnapDrive, SnapMirror, Snapshot, SyncMirror, and vFiler are trademarks or registered trademarks of NetApp, Inc. in the United States and/or other countries. Windows is a registered trademark of Microsoft Corporation. Oracle is a registered trademark of Oracle Corporation. VMware is a registered trademark and VMotion is a trademark of VMware, Inc. All other brands or products are trademarks or registered trademarks of their respective holders and should be treated as such. RA-0007-1010


LIST OF FIGURES

Figure 1) Terminology in context of the storage objects of volumes and aggregates 6

Figure 2) Storage consolidation and growing utilization using thin provisioning 7

Figure 3) Controlled and slowed data growth within the operational sweet spot corridor in the aggregate 7

Figure 4) Mitigate to prevent uncontrolled utilization 8

Figure 5) Sample service levels ordered by service disruption and recovery time 9

Figure 6) Questions regarding storage efficiency from an operational point of view 10

Figure 7) Provisioning model for NAS storage from scratch Technically only two out of four combinations are possible 13

Figure 8) Provisioning model for SAN storage from scratch 15

Figure 9) Configuring full/zero fat provisioning policy using Provisioning Manager for NAS. Select checkboxes as outlined. Provisioning Manager deviates from zero/full fat by first growing volumes with autosize and then allowing snapshot autodelete. 20

Figure 10) Configuring full/low/zero fat provisioning policy using Provisioning Manager for SAN storage. Select checkboxes as outlined. Provisioning Manager deviates by not turning on autosize for zero fat. 21

Figure 11) Full/low/zero fat provisioning policies for datasets and storage services. 21

Figure 12) Volume-centric storage provisioning Application instances are aligned horizontally with their volumes 24

Figure 13) Dedupe-centric storage provisioning Application instances are aligned horizontally volumes are aligned vertically 26

Figure 14) Settled/nomad provisioning into an aggregate. In case of aggregate tightness, a nomad is migrated to a separate aggregate. 27

Figure 15) Alignment by technical impact (sorted by negative impact in descending order) 28

Figure 16) Alignment by business impact (sorted by negative impact in descending order) 28

Figure 17) Operations Manager screen to configure thresholds on operational metrics 32

Figure 18) Trending of data growth and days-to-full prediction in Operations Manager 33

Figure 19) Storage efficiency dashboard in Operations Manager 34

Figure 20) Configuring an alarm based on the threshold aggregate almost full 36

Figure 21) Storage to enable organic data growth between planned downtime windows 39

Figure 22) Transition of changes depending on the metrics aggregate capacity used and aggregate committed space 40

Figure 23) Narrower corridors due to the ability to perform mitigation alternatives in hours instead of months 41

Figure 24) Visualization of phase transitions depending on metric aggregate capacity used 42

Figure 25) Typical picture of aggregate capacity metrics while turning to zero fat configurations and dedupe 43


1 EXECUTIVE SUMMARY

This document provides consolidated best practices to achieve and manage best-in-class storage use. We introduce intervals and metrics that trigger changes in behavior in order to operate NetApp storage in a corridor of high utilization as long as possible. Starting with provisioning models focused on high consolidation and operational agility, we describe the operational phases and their transitions. A list of mitigation alternatives describes the available options to control use in the face of data growth.

Finally, this document presents real-life settings where high data consolidation is achieved by using NetApp storage efficiency technologies.


2 INTRODUCTION

Exponential data growth generates a serious challenge for IT managers. Gartner predicts that within the period from 2008 to 2013, enterprises will purchase and install 20 times more terabytes of storage than they did in 2008 (www.gartner.com/technology/media-products/newsletters/netapp/issue24/gartner3.html). Until recently, continuous improvements in cost for performance and storage space have made it easy and affordable to solve storage concerns by adding more disks to existing storage systems. However, IT executives are discovering that there are limits to that solution: floor space, weight loads, rack space, network drops, power connections, cooling infrastructure, and even power itself are finite resources. Hitting any one of these limits significantly jeopardizes the ability of IT to meet business demands.

NetApp's solution to rapid resource consumption is to reduce the number of storage controllers and disks in the resource equation by using storage more efficiently. Key benefits of this strategy are:

• Less management involvement
• Reduced complexity, support, and service costs
• Improved performance and network efficiency

NetApp storage efficiency technologies are key to achieving data consolidation and managing future data growth; they allow for storing and managing several times more data on NetApp storage controllers than would fit on their physically attached disks and allow the deferral of IT investments to the future.

In this document, we describe techniques and guidelines you can use to find the operational sweet spot for NetApp storage efficiency technologies in your environment. By adhering to the best practices outlined, you can increase storage consolidation and agility as well as decrease operational risk.

The document is organized as follows:

• Chapter 3 describes storage provisioning.
• Chapter 4 describes the monitoring process and supporting tools for daily operation.
• Chapter 5 describes concrete operational setups used in daily life.
• Chapter 6 concludes with a step-by-step cookbook to provision and manage storage efficiently and to adapt individual thresholds.

2.1 TERMINOLOGY

We use the following terminology to describe resource use at the level of exposing storage to applications and at the level of physical resource allocation within the aggregates in the storage controllers. Also refer to the NetApp Operations Manager Efficiency Dashboard Installation and User Guide for further explanation of this terminology.

• Logical storage refers to storage that is visible at the application layer. Logical storage does not necessarily require the allocation of usable capacity.

• Usable capacity refers to storage that is usable for the applications provided by NetApp storage controllers.

• Used capacity is a value that represents the amount of physical capacity that holds application or user data. In Operations Manager¹ terminology, this is represented by capacity used.

• Storage utilization refers to the ratio of used capacity to usable capacity, without accounting for efficiency returns.

¹ NetApp Operations Manager is a central console that delivers comprehensive monitoring and management for NetApp storage, with alerts, reports, and performance and configuration tools.


• Commitment rate is an Operations Manager term that refers to the percentage of aggregate space committed to volumes.

• Deduplication rate is an Operations Manager term that measures the efficiency of the deduplication functionality. This rate is measured at the volume and the aggregate level as a percentage metric.

For the aggregate, we define different operational windows, each characterized by an interval of storage utilization. We define an interval as the operational sweet spot corridor (green), where the aggregate should be operated for optimal utilization and service availability. We define a tolerance interval (yellow), where actions are taken to get back into the operational sweet spot window. We define an interval as a no-go area (red), where we do not intend to operate the aggregate. This area might act as a last buffer of time or can be considered an area where operational staff has less experience.
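Expressed as formulas, the terminology above reduces to two simple ratios. A minimal sketch; the variable and function names are ours, not Operations Manager terms:

```python
def storage_utilization_pct(used_gb, usable_gb):
    """Used capacity as a share of usable capacity (efficiency returns not counted)."""
    return 100.0 * used_gb / usable_gb

def commitment_rate_pct(committed_gb, aggregate_usable_gb):
    """Share of aggregate space committed to volumes; exceeds 100% with thin provisioning."""
    return 100.0 * committed_gb / aggregate_usable_gb

print(storage_utilization_pct(600, 1000))   # 60.0
print(commitment_rate_pct(1500, 1000))      # 150.0: an overcommitted aggregate
```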

Figure 1 explains the terminology in the context of storage objects on a NetApp storage controller. The aggregate is a physically limited storage object. Aggregates are treated as fairly static containers and thus need proper size management.

Figure 1) Terminology in context of the storage objects of volumes and aggregates.

[Figure: the usable capacity of an aggregate holds volumes with LUNs/NAS; used capacity and committed logical storage grow with the data, ideally within the operational sweet spot corridor.]

In practice, commitment rates far above 100% are common in customer environments. This document describes how to manage this.

2.2 GOAL OF THIS DOCUMENT

The goal of this document is to achieve best-in-class storage efficiency and costs by consolidating the highest possible amount of application data while meeting the required service-level agreements. The idea is to enable thin provisioning while controlling the use of physical resources in the NetApp shared storage infrastructure within a desired corridor. NetApp storage efficiency technologies can save a significant amount of the IT budget. On the other hand, running at an uncontrolled use level can reduce flexibility and cause headaches in managing data growth and service-level fulfillment.

The difference in managing thin-provisioned storage compared to traditional storage is that, due to dense consolidation of application data, accumulated application data growth rates might vary in a broader corridor than they would traditionally.


To control the level of physical resources, we outline methods for increasing storage utilization by provisioning storage with NetApp storage efficiency technologies. We also outline how to react to organic data growth and how to level the storage use within a corridor of high efficiency by using NetApp technologies for data center flexibility.

To summarize, this document introduces three phases to manage the storage on NetApp storage controllers: provisioning, organic growth, and mitigation of storage tightness. These phases play a vital role for aggregates, which are the coarsest storage abstraction of a NetApp storage controller.

• Provisioning phase: In this phase, storage is provisioned by the NetApp shared storage infrastructure, which increases the utilization of aggregates. The goal is to operate the aggregates with a high level of data consolidation in an efficient utilization corridor. Figure 2 visualizes this corridor.

• Organic growth phase: In this phase, no further storage is provisioned, to slow down the growth of aggregate utilization. The goal is to operate in a corridor of high utilization but safely reach the next planned downtime or administration window of the served applications. Thus, the organic growth phase is sized depending on the growth rate of capacity used and the length of the time frame to the planned downtime windows. Figure 3 visualizes slowed growth.

• Mitigation of storage tightness phase: This phase prevents an uncontrolled level of utilization and provides mitigation activities to lower this level. Several mitigation alternatives are presented to mitigate storage tightness and to shift the aggregate utilization back into the desired operational corridor. Figure 4 visualizes this mitigation.

Figure 2) Storage consolidation and growing utilization using thin provisioning.

Figure 3) Controlled and slowed data growth within the operational sweet spot corridor in the aggregate.


Figure 4) Mitigate to prevent uncontrolled utilization.

This document addresses best practices and tools to manage the NetApp storage infrastructure and to support decision making during transitions between phases.

2.3 AUDIENCE

This document addresses two audiences:

• Decision makers: It provides decision makers with an understanding of how to align storage efficiency best practices and processes within their existing operations organization.

• Operational teams: It allows operational teams to understand monitoring and management of the storage infrastructure while mastering data growth. It allows the operational teams to implement a basic setting and to position their usage goals. We refer to existing NetApp best practices to increase the level of data consolidation and to achieve overcommitment for major applications.


2.4 SCENARIO

As a scenario, we consider a service provider who delivers a set of IT services. This service provider might serve internal or external customers at different service levels. The service levels provided are characterized by unplanned downtime, as exemplified by Figure 5. This characterization is useful for aligning service data with physical resources. In our example, the highest level of service availability is delivered for Platinum services. It is further assumed that the provided services have different lifetimes and dates of creation.

The service provider's major goal is to deliver the services within the specified service level and to achieve maximum data consolidation through NetApp storage efficiency technologies. This directly translates into cost savings related to capital investments, floor space, cooling, maintenance, and operational expenses. However, storing data in a consolidated way using storage efficiency technologies needs to take into account the aggregated data growth rates of the applications.

Predicting the data growth rates depends on several parameters, which are usually outside the control and knowledge of the service provider. These parameters include usage characteristics, number of users, and functionality used. To compensate for the deficiencies in precisely predicting the data growth over a specific time frame, we propose a scheme that the service provider can use to achieve operational flexibility and adaptability to handle unpredictable growth rates.

Figure 5) Sample service levels ordered by service disruption and recovery time.

[Figure: from lowest disruption and recovery time to best effort: Platinum (production, premium customers), Gold (production), Silver (production, low budget), Bronze (production), and Best Effort services (dev/test, cold/fill-up data, dynamic/short-term data).]

In this document, the focus is on operational aspects of storage efficiency technologies to achieve data center consolidation and agility. Thus, we take a seat next to the operational staff of our sample service provider to understand their questions regarding the technologies that make up NetApp storage efficiency.

We address the questions posed by the operational staff, such as: How do we set it up? How do we run this? How do we integrate the necessary procedures into our daily life?


A set of questions pertaining to the lifetime of a service instance and its storage arises. It starts with provisioning storage in a NetApp shared storage infrastructure and continues with the detection and monitoring of situations endangering the level of a service, the necessary response procedures, and promoting a continuous and smooth delivery of services.

The questions are structured around a cycle that starts at provisioning storage and finishes at deprovisioning storage. Figure 6 shows important questions regarding storage efficiency from an operational point of view.

• Provisioning deals with the provisioning of storage. In this document, provisioning models are shown to achieve a high level of storage consolidation while preserving operational flexibility. For individual applications, NetApp provides a rich library of technical reports on how to provision best.

• Monitoring deals with defining the goal of the monitoring process and which metrics to use to decide when to stop (for example, the provisioning of storage). Relevant metrics provided by NetApp Operations Manager are described.

• Notifying deals with how to notify the people in charge when to perform certain actions. The notification mechanisms within NetApp Operations Manager are described to deliver information in case of certain events.

• Mitigation deals with mechanisms to prevent uncontrolled utilization from affecting operational flexibility and service fulfillment.

Figure 6) Questions regarding storage efficiency from an operational point of view.

[Figure 6 arranges these questions in a cycle of four phases. Provision: how to provision best for storage efficiency (provisioning models, NetApp Data Motion awareness, from scratch or template/clone), where to provision to, which SLA, and what the defaults are. Monitor: which tools to use and what to monitor. Notification: who is in charge to react and how to notify. Mitigate: what is critical (when to stop provisioning, when to stop extending, when to relax tightness, how to detect), the available options, the implications on SLAs, and when to act.]

Before discussing the details of this cycle, it is important to understand the NetApp technologies that achieve storage efficiency and flexibility, and to understand their relevance in the provisioning and operational phases.


2.5 NETAPP TECHNOLOGIES FOR STORAGE EFFICIENCY AND FLEXIBILITY

The NetApp technologies presented in Table 1 are characterized by where they bring the most significant advantage and value. For example, FlexClone® technology provides significant time and space advantages while provisioning, but the space advantage might be reduced over time. In contrast, deduplication technology can achieve space savings over the entire storage lifetime.

Table 1) NetApp technologies for storage efficiency and flexibility.

| NetApp Technology | Benefit | During Provisioning | During Operation |
| --- | --- | --- | --- |
| FlexClone | Instantly creates thin-provisioned and space-efficient writable clones. | X | |
| FlexVol® | Implements thin provisioning and consumes only the needed space rather than the requested space. | X | X |
| Deduplication | Increases data consolidation by detecting and optimizing repeating patterns in primary and secondary storage. | X | X |
| NetApp Data Motion | Provides flexibility for management and optimal load/capacity rebalancing in growing cloud environments without downtime. | | X |
| Aggregate extensibility in Data ONTAP® | Data ONTAP is the foundation for all features listed in this table and provides flexibility in handling physical resources. It allows extending physical aggregates during operation. | X | X |

Furthermore, NetApp RAID-DP®, SATA, and NetApp Flash Cache (formerly PAM II) are technologies that help to reduce the total cost of storage tremendously. It is assumed that these technologies are deployed according to the requirements of the use case.

NETAPP SHARED STORAGE INFRASTRUCTURE

To implement the practices outlined in this document, some prerequisites must be met. We assume a NetApp shared storage infrastructure implemented using large aggregates. This acts as a utility for delivering storage in a flexible manner for applications with different needs. It scales with the demands and serves a variety of different service levels at the same time. NetApp Operations Manager monitors the NetApp shared storage infrastructure. This software acts as a central management station and consolidates information about the current status of all NetApp storage controllers. Based on this information, Operations Manager indicates the necessity to change the phases and behavior in the data center.

The NetApp shared storage infrastructure provides different ways for clients to consume its resources. It can provide a traditional view, where storage resources are located at a specific controller. Using NetApp Provisioning Manager, the infrastructure can also provide a service-oriented resource view that abstracts resource consumption and management from the physical controllers. The abstractions of a storage service catalog, resource pools, and datasets provide easy manageability in the face of massive scale. If multi-tenancy is not required, this is the abstraction of choice.

Supported by the NetApp technologies MultiStore® (vFiler™) and NetApp Data Motion, storage can be provided in a utility-like fashion independent of physical hardware. This makes possible high operational flexibility in the data center and allows building virtualized environments for multiple tenants with competing interests.


3 PROVISIONING

Some features, such as data deduplication, can be turned on or off at any time. However, to achieve maximum consolidation and storage efficiency, more strategies must be considered during the data-provisioning phase.

In addition, provisioning should take the flexibility of storage (for example, the ability to migrate data) into account. This allows administrators to easily move data off aggregates that are approaching their capacity limit, without downtime. This is also an important aspect when planning to deliver services 24x7.

Thus, the design goals for storage layouts are storage efficiency and operational flexibility. In the following sections, we discuss three orthogonal dimensions of provisioning storage. Two of them focus on achieving data consolidation; the third focuses on achieving operational flexibility. All dimensions can be combined independently. Note that the achievable level of consolidation depends on the applications and their data.

TR-3827, If You Are Doing This, Then Your Storage Could Be Underutilized, provides further understanding of storage efficiency and operational flexibility.

3.1 PROVISIONING FROM SCRATCH: FULL FAT TO ZERO FAT PROVISIONING

This section deals with the way data is provisioned and the consequences for storage efficiency. We recommend applying the so-called zero fat configurations.

We consider the storage setup for a single application instance. The presented configurations can be applied while provisioning storage from scratch and to already provisioned storage. When the technical dimensions of storage provisioning are categorized into space for primary data and space for its Snapshot™ copies, there are four theoretical combinations for both network-attached storage (NAS) and storage area network (SAN) environments. In practice, only two variants are relevant for NAS and three variants are relevant for SAN storage:

• Full fat
• Low fat
• Zero fat

According to NetApp best practices, we do not mix block and file data within a single volume, which allows us to consider NAS and SAN environments separately.

NAS

For NAS, two options are recommended: full fat and zero fat.

• Full fat: The primary data and Snapshot copy space are preallocated.
• Zero fat: Primary data and its Snapshot copy space are allocated on demand. This variant achieves the best ratio of storage efficiency when provisioning applications from scratch.


Figure 7) Provisioning model for NAS storage from scratch. Technically, only two out of four combinations are possible.

| Snapshot Copy Space Allocation | Primary Data (Files & Directories): Fat | Primary Data: Thin |
| --- | --- | --- |
| Fat | Full fat option | No option |
| Thin | No option | Zero fat option |

Note: Full fat is characterized slightly differently in NAS and SAN due to their technical properties.

FULL FAT PROVISIONING

Full fat provisioning for NAS is the traditional (default) way to implement NFS/CIFS shares. Volumes in a full fat configuration are characterized as follows:

• Volumes are created with a space guarantee.
• The size of the volume follows the formula X + Δ, where X is the size of the primary data (the sum of all user data, files and directories, within the volume) and Δ is the amount of space needed to hold Snapshot data.
• Because the space used for Snapshot copies might grow unexpectedly, the autosize function can be used to make space available when a certain volume threshold is reached. This also happens when the space reserved for user data gets low.
• Space reservation for Snapshot copies is used to hide the capacity used for Snapshot copies from the consumers (NAS clients).
• For volumes with deduplication enabled, volume autogrow is a mandatory option.
• Using autodelete is normally not recommended in NAS environments, because keeping a certain number of Snapshot copies for file versioning and restores might be part of the SLAs defined for file services.

Note: Deleting Snapshot copies may be a reasonable approach when no other option for freeing up space is available, but this is a specific and individual decision.
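To make the X + Δ sizing rule concrete, the following minimal Python sketch estimates a full fat NAS volume size from an assumed daily change rate and Snapshot retention count. The function name and the 5%/14-copy figures are illustrative assumptions, not NetApp guidance.

```python
# Hypothetical sizing helper for a full fat NAS volume: the volume holds the
# primary data (X) plus the Snapshot copy space (delta), per the X + delta rule.
# The snapshot overhead is estimated from an assumed daily change rate and a
# retention count; both figures are illustrative only.

def full_fat_nas_volume_size_gb(primary_data_gb, daily_change_rate=0.05,
                                retained_snapshots=14):
    """Return (volume_size_gb, snap_reserve_pct) for a full fat NAS volume."""
    delta = primary_data_gb * daily_change_rate * retained_snapshots  # snapshot space
    volume_size = primary_data_gb + delta                             # X + delta
    snap_reserve_pct = round(100 * delta / volume_size)               # hides delta from NAS clients
    return volume_size, snap_reserve_pct

size, reserve = full_fat_nas_volume_size_gb(1000)  # 1 TB of user data
print(size, reserve)
```

With 1 TB of user data, the sketch yields a 1.7 TB volume and a Snapshot reserve of roughly 41%; real values depend on the actual change rate and SLA-driven retention.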

Table 2) Full fat provisioning.

| Option | Recommended Value | Notes |
| --- | --- | --- |
| Volume Options | | |
| guarantee | volume | |
| fractional_reserve | 100 | Leave at default; mostly relevant for SAN environments. The default value up to Data ONTAP 7.3.3 is 100; for later releases, 0 is the default. |
| autosize | on | Turn autosize on. There is then no artificially limited volume that needs to be monitored. Autosize makes sense to allow growth of user data beyond the guaranteed space limit. |
| autosize options | -m X -i Y | The business model drives the maximum value (-m) for the autosize configuration, because it offers additional disk space for the consumer under its specific conditions. A reasonable resizing increment (-i) depends on various factors, such as the data growth rate in the particular volume, the volume size itself, and so on. |
| Volume Snapshot Options | | |
| reserve | yes | The value depends on the number of Snapshot copies and the change rate within the volume. |
| schedule | switched on | Automatic Snapshot technology schedules. |
| autodelete | off | Deleting Snapshot copies is not recommended in most NAS environments. |

ZERO FAT PROVISIONING

The zero fat method is the most efficient way to provision NAS volumes:

• Volumes are created without a space guarantee.
• The size of the volume still follows the formula X + Δ, where X is the size of the primary data (the sum of all user data, files and directories, within the volume) and Δ is the amount of space needed to hold Snapshot data. Sizing the volume defines a container with a virtual size for the consumers; NAS users are familiar with fixed-size file shares.
• The space used for Snapshot copies can grow unexpectedly. You can use the autosize function to make space available when a certain volume threshold is reached. You can also use the autosize function when the space reserved for user data gets low.
• Space reserved for Snapshot copies is used to hide from the consumers (NAS clients) the capacity taken up by Snapshot copies.
• For volumes with deduplication enabled, volume autogrow is a mandatory option.
• Using autodelete is normally not recommended in NAS environments, because keeping a certain amount of space for Snapshot copies for file versioning and restores is part of the SLAs defined for file services.

Note: Deleting Snapshot copies may be a reasonable approach when no other option for freeing up space is available, but this is a specific and individual decision.
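Choosing the autosize options (-m for the maximum size, -i for the increment) is a judgment call driven by the business model and the growth rate. The following sketch shows one plausible heuristic; the "headroom factor" and "one week of growth per increment" rules are assumptions for illustration, not NetApp defaults.

```python
# Illustrative chooser for autosize options (-m maximum size, -i increment)
# for a thin-provisioned volume. The heuristics (maximum = a multiple of the
# current size, increment = roughly one week of growth) are assumptions only.

def autosize_options(volume_size_gb, weekly_growth_gb, headroom_factor=2.0):
    max_size_gb = int(volume_size_gb * headroom_factor)  # -m: upper bound from the business model
    increment_gb = max(1, round(weekly_growth_gb))       # -i: one growth step per resize event
    return "-m {}g -i {}g".format(max_size_gb, increment_gb)

print(autosize_options(500, 12.4))
```

For a 500 GB volume growing about 12 GB per week, the sketch suggests "-m 1000g -i 12g"; in practice the maximum is negotiated with the consumer's SLA.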

Table 3) Zero fat provisioning.

| Option | Recommended Value | Notes |
| --- | --- | --- |
| Volume Options | | |
| guarantee | none | |
| fractional_reserve | 100 | Leave at default; mostly relevant for SAN environments. The default value up to Data ONTAP 7.3.3 is 100; for later releases, 0 is the default. |
| autosize | on | Turn autosize on. There is then no artificially limited volume that needs to be monitored. Autosize makes sense to allow growth of user data beyond the guaranteed space limit. |
| autosize options | -m X -i Y | The business model drives the maximum value (-m) for the autosize configuration, because it offers additional disk space for the consumer under its specific conditions. A reasonable resizing increment (-i) depends on various factors, such as the data growth rate in the particular volume, the volume size itself, and so on. |
| try_first | - | Autodelete is not recommended in most environments. |
| Volume Snapshot Options | | |
| reserve | yes/no | The value depends on the number of Snapshot copies and the change rate within the volume. Displaying only the committed usable space using an SLA is the preferred way to provision NAS storage. However, there might be situations in which the Snapshot reserve area is omitted (no). |
| schedule | switched on | Automatic Snapshot technology schedules. |
| autodelete | off | Deleting Snapshot copies is not recommended in most NAS environments. |

SAN

For SAN, we consider three options:

• Full fat: Both primary data and its Snapshot copy space are preallocated.
• Low fat: The primary data is preallocated; the Snapshot copy space is allocated on demand.
• Zero fat: Primary data and its Snapshot copy space are allocated on demand. This variant achieves the best ratio of storage efficiency when provisioning applications from scratch.

Figure 8) Provisioning model for SAN storage from scratch.

| Snapshot Copy Space Allocation | Primary Data (LUN): Fat | Primary Data: Thin |
| --- | --- | --- |
| Fat | Full fat option | No option |
| Thin | Low fat option | Zero fat option |

FULL FAT PROVISIONING

This method can be treated as the historical way of provisioning block storage with Data ONTAP:

• Volumes are created with a space guarantee.
• A fractional (overwrite) reserve is used to guarantee that the primary data can be overwritten completely with Snapshot copies in place. If this space is not available, Snapshot copy creation will fail.
• The size of the volume follows the formula 2X + Δ, where X is the size of the primary data (the sum of all LUN capacities within the volume) and Δ is the amount of space needed to hold Snapshot copy data.


The enhancements to the volume autosize capabilities (such as volume-size-dependent thresholds) and the robustness of the Snapshot autodelete implementation have made full fat provisioning more or less obsolete. As of today, the default settings for creating volumes/LUNs in Data ONTAP still correspond to these settings. See "Full/Low/Zero Fat Provisioning with Provisioning Manager" for a discussion of using tools such as Provisioning Manager.

Table 4) Full fat provisioning.

| Option | Recommended Value | Notes |
| --- | --- | --- |
| Volume Options | | |
| guarantee | volume | |
| fractional_reserve | 100 | Even if technically possible, a fractional reserve below 100 incorporates a potential risk of running out of Snapshot copy overwrite space. This situation should be avoided. |
| autosize | off | Autosize could be used as an option to create the free space needed for Snapshot copy creation. |
| Volume Snapshot Options | | |
| reserve | 0 | |
| schedule | switched off | |
| autodelete | off | |
| LUN Options | | |
| reservation | enable | |

LOW FAT PROVISIONING

With low fat provisioning, we use a more space-efficient way to provision volumes:

• Volumes are created with a space guarantee.
• LUNs are created with a space guarantee as well. This setup does not benefit from unused blocks within a LUN. (During the lifetime of a LUN, the amount of free, unused blocks typically decreases. Without space reclamation techniques, allocated blocks on the storage system stay allocated.)
• The size of the volume follows the formula X + Δ, where X is the size of the primary data (the sum of all LUN capacities within the volume) and Δ is the amount of space needed to hold Snapshot copy data.
• Because the space used for Snapshot copies might grow unexpectedly, the autosize and autodelete policies are used to make space available when a preset volume threshold is reached.

Table 5) Low fat provisioning.

| Option | Recommended Value | Notes |
| --- | --- | --- |
| Volume Options | | |
| guarantee | volume | |
| fractional_reserve | 0 | Snapshot copy space is controlled by the autodelete and autosize options. |
| autosize | on | Turn autosize on. |
| autosize options | -m X -i Y | The business model drives the maximum value (-m) for the autosize configuration, because it offers additional disk space for the consumer under its specific conditions. A reasonable resizing increment (-i) depends on various factors, such as the data growth rate in the particular volume, the volume size itself, and so on. |
| try_first | volume_grow | Increasing the size of the volume does not destroy any data or information, so there is no reason not to increase it. The size can be reverted afterward if the volume's free space increases again. There might be configurations where automatic volume growth is not desired. |
| Volume Snapshot Options | | |
| reserve | 0 | For NAS volumes, setting a Snapshot copy reserve area and configuring Snapshot copy schedules is a common setup. For SAN volumes, this needs to be switched off according to NetApp best practices (see the Fibre Channel and iSCSI Configuration Guide). |
| schedule | switched off | |
| autodelete | on | There might be Snapshot copies that are needed to fulfill certain SLAs, such as backup SLAs. Setting this policy needs to be negotiated with the business requirements. In the worst-case scenario, deleting Snapshot copies is not an option. |
| autodelete options | volume / oldest_first | There is a precedence order for which Snapshot copies are candidates for deletion; oldest_first is the current default. |
| LUN Options | | |
| reservation | enable | Reserves space for the LUN during creation. |
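The oldest_first deletion order can be illustrated with a minimal simulation; the snapshot names and timestamps below are made up, and the function is a sketch of the ordering logic only, not the Data ONTAP implementation.

```python
# Minimal simulation of the oldest_first autodelete order: when space must be
# freed, the oldest Snapshot copy is the first deletion candidate. Names and
# epoch timestamps are illustrative.

def autodelete_candidates(snapshots, order="oldest_first"):
    """snapshots: list of (name, created_epoch) tuples. Returns deletion order."""
    reverse = (order == "newest_first")
    return [name for name, created in
            sorted(snapshots, key=lambda s: s[1], reverse=reverse)]

snaps = [("hourly.0", 300), ("nightly.1", 100), ("nightly.0", 200)]
print(autodelete_candidates(snaps))  # oldest first: nightly.1, nightly.0, hourly.0
```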

ZERO FAT PROVISIONING

Full and low fat provisioning use fully allocated volumes and LUNs; by default, space allocation happens within the boundaries of the LUN and the volume. Zero fat follows a 100% allocate-on-demand concept:

• Volumes are created without a space guarantee.
• LUNs are created without a space guarantee.
• The size of the volume follows the formula X – N + Δ, where X is the size of the primary data (the sum of all LUN capacities within the volume), Δ is the amount of space needed to hold Snapshot copy data, and N is the amount of unused blocks within a given LUN.
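The three SAN formulas (2X + Δ, X + Δ, and X – N + Δ) can be compared numerically with a small sketch; the LUN sizes, Snapshot delta, and unused fraction (standing in for N) are sample figures only.

```python
# Side-by-side comparison of the three SAN provisioning formulas from this
# section: full fat (2X + delta), low fat (X + delta), zero fat (X - N + delta).
# All input figures are illustrative samples.

def san_space_needed(lun_sizes_gb, snap_delta_gb, unused_fraction=0.0):
    x = sum(lun_sizes_gb)        # X: sum of all LUN capacities in the volume
    n = x * unused_fraction      # N: blocks allocated to LUNs but never written
    return {
        "full_fat": 2 * x + snap_delta_gb,
        "low_fat": x + snap_delta_gb,
        "zero_fat": x - n + snap_delta_gb,
    }

print(san_space_needed([200, 300], snap_delta_gb=100, unused_fraction=0.4))
```

For two LUNs totaling 500 GB with 40% of their blocks unused, zero fat needs 400 GB where full fat needs 1100 GB, which is the efficiency gap the summary below refers to.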

Table 6) Zero fat provisioning.

| Option | Recommended Value | Notes |
| --- | --- | --- |
| Volume Options | | |
| guarantee | none | No space reservation for the volume at all. |
| fractional_reserve | 0 | With Data ONTAP 7.3.3, fractional_reserve can be modified even for volumes without a space guarantee of type volume. Prior to Data ONTAP 7.3.3, the value was fixed at 100. |
| autosize | on | Turn autosize on. |
| autosize options | -m X -i Y | The business model drives the maximum value (-m) for the autosize configuration, because it offers additional disk space for the consumer under specific conditions. A reasonable resizing increment (-i) depends on various factors, such as the data growth rate in the particular volume, the volume size itself, and so on. |
| try_first | volume_grow | |
| Volume Snapshot Options | | |
| reserve | 0 | For NAS volumes, setting a Snapshot copy reserve area and configuring Snapshot copy schedules is a common setup. For SAN volumes, this needs to be switched off according to NetApp best practices (see the Fibre Channel and iSCSI Configuration Guide). |
| schedule | switched off | |
| autodelete | off | Deleting Snapshot copies might be an option when the volume can no longer be resized because the maximum configured size has been reached, or when the aggregate's free space becomes low. |
| LUN Options | | |
| reservation | disable | No preallocation of blocks for the LUN. |

SUMMARY OF PROVISIONING METHODS

There are good reasons for using any of the provisioning methods described; however, full fat for SAN environments should be avoided wherever possible because of its storage efficiency ratio. Even with a 100% block usage ratio on primary data, zero fat provisioning has many advantages and is the preferred method:

• The aggregate's free space is a global pool that can serve space to volumes. This gives more flexibility than volumes with their own dedicated free space.
• For SAN volumes, the block consumption can be easily monitored.
• Deduplication savings go directly into the global pool of free space, which is the aggregate or the resource pool to which it belongs.
• Monitoring is needed only on the aggregate level. Volumes grow on demand.
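Because zero fat shifts monitoring to the aggregate level, the core check reduces to watching aggregate utilization against a threshold. The sketch below illustrates this; the 85% warning threshold is an assumed example value, not a NetApp default.

```python
# Sketch of the aggregate-level check that zero fat provisioning relies on:
# volumes grow on demand, so only the aggregate's free space needs watching.
# The 85% warning threshold is an assumed example, not a product default.

def aggregate_alert(capacity_gb, used_gb, warn_pct=85):
    used_pct = 100.0 * used_gb / capacity_gb
    return {"used_pct": round(used_pct, 1), "warn": used_pct >= warn_pct}

print(aggregate_alert(10000, 8700))  # 87% used, above the 85% threshold
```

In practice this role is filled by Operations Manager thresholds and events rather than a hand-rolled script.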

Table 7) Comparison of provisioning methods.

| Characteristics | Full Fat | Low Fat | Zero Fat |
| --- | --- | --- | --- |
| Space consumption | 2X + Δ | X + Δ | X – N + Δ² |
| Space efficient | No | Partially, for Snapshot copies | Yes |
| Monitoring | Optional | Required on volume and aggregate level | Required on aggregate level |
| Notification/mitigation process required | No | Optional in most cases | Yes |
| Pool benefitting from dedupe savings | Volume fractional reserve area | Volume free space area | Aggregate free space area |
| Risk of an out-of-space condition on primary data | No | No, as long as autodelete is able to delete any Snapshot copies | Yes, when monitoring and notification processes are missing |
| Typical use cases | Small installations; none or few storage management skills (no monitoring infrastructure) | Large database environments | Shared storage infrastructure; test/dev environments; storage pools for virtualized servers |

² N is the traditional thin provisioning impact = the amount of blocks logically allocated but not used.

FULL/LOW/ZERO FAT PROVISIONING WITH PROVISIONING MANAGER

NetApp Provisioning Manager focuses on fast and efficient provisioning of storage resources in the NetApp storage infrastructure. It speeds up provisioning workflows and boosts capacity usage by using policy-based automation for the entire NetApp NAS and SAN infrastructure. These processes are:

• Faster than manually provisioning storage
• Easier to maintain than scripts
• Instrumental in minimizing the risk of data loss resulting from misconfigured storage

Provisioning Manager applies user-defined policies to consistently select the appropriate resources for each provisioning activity. This frees administrators from the headache of searching for available space to provision and allows more time for strategic issues. The use of a centralized management console allows administrators to monitor the status of their provisioned storage resources. Provisioning Manager can help improve business agility and capacity use, shrink provisioning time, and improve administrator productivity. Provisioning Manager's thin provisioning and deduplication capabilities provide a high level of storage efficiency from your NetApp storage investment.

A GUI allows you to implement the full/low/zero fat provisioning models within Provisioning Manager. See Figure 9 and Figure 10 for configuring storage efficiency in a provisioning policy for NAS and SAN. Whenever storage is provisioned using such a provisioning policy, the settings apply automatically. For more information, refer to TR-3710, Operations Manager, Provisioning Manager, and Protection Manager Best Practices Guide.

Provisioning Manager encapsulates technical details when provisioning storage and supports easy integration with existing management tools and orchestration frameworks. Policies and their use in so-called datasets and storage services allow you to exploit NetApp storage efficiency technologies without exposing a high level of technical detail to a higher level of management software.

Note: Provisioning Manager up to version 4.0 does not allow you to specify autosize and autodelete individually; the policy template determines whether these features are used and which options are selected. To implement the provisioning methods outlined here, a customized provisioning script needs to be provided to set the autosize and autodelete parameters according to the recommendations for the full/low/zero fat methods. Post-provisioning scripts are standard with Provisioning Manager. Use caution when Provisioning Manager runs conformance checks; these revert individual settings.


Figure 9) Configuring a full/zero fat provisioning policy using Provisioning Manager for NAS. Select the checkboxes as outlined. Provisioning Manager deviates from zero/full fat by first growing volumes with autosize and then allowing Snapshot copy autodelete.


Figure 10) Configuring a full/low/zero fat provisioning policy using Provisioning Manager for SAN storage. Select the checkboxes as outlined. Provisioning Manager deviates by not turning on autosize for zero fat.

FULL/LOW/ZERO FAT PROVISIONING FOR STORAGE SERVICES

Storage services are an easy abstraction for provisioning storage in a utility-like fashion. A storage service describes all characteristic attributes of the storage needed in a certain scenario. A storage service catalog lists the available templates and allows you to provision storage with these attributes on demand. Technically, storage services or datasets consist of one or more resource pools, a protection policy, and a provisioning policy. Full/low/zero fat provisioning for storage services is configured in the configuration wizard of the provisioning policy. Figure 11 shows the provisioning policies closest to the full/low/zero fat configurations.

Figure 11) Full/low/zero fat provisioning policies for datasets and storage services.

Because this wizard is able to configure the deduplication feature, two policies are configured for the zero fat configuration: one with deduplication and one without.


HOW SHOULD A VOLUME BE SIZED?

Because the physical allocation of data within a zero fat provisioned volume is done on demand, theoretically the volume size can be set to a very high value that can easily hold all application data and Snapshot copies. Because the unallocated space in the volume is not exclusively reserved for the volume itself, all other applications can benefit from the shared pool of unallocated storage. However, NetApp recommends that you size the volume to the expected size of its contained objects and use the autogrow option to let it grow on demand. The important advantage is that the commitment rate then acts as a metric for data consolidation.

Note: The commitment rate reflects the amount of logical data consolidation. This metric is suitable for deciding when data should be left for organic growth.

Additionally, the volume size limits when using deduplication should be taken into account, because the maximum sizes depend on the storage controllers.
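The commitment rate can be computed as the sum of the provisioned (virtual) volume sizes relative to the aggregate capacity; values above 100% indicate overcommitment, that is, logical data consolidation. The sketch below uses sample numbers only.

```python
# Commitment rate sketch: sum of provisioned volume sizes divided by the
# aggregate capacity. Above 100% the aggregate is overcommitted, which is
# exactly the data consolidation effect this section discusses. Sample data.

def commitment_rate_pct(volume_sizes_gb, aggregate_capacity_gb):
    return 100.0 * sum(volume_sizes_gb) / aggregate_capacity_gb

print(commitment_rate_pct([4000, 3000, 5000], 8000))  # 12 TB promised on 8 TB
```

Three volumes totaling 12 TB on an 8 TB aggregate give a commitment rate of 150%, a signal that growth should be monitored rather than a problem in itself.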

APPLICATION RECOMMENDATIONS

Thin provisioning is most effective when applications use data that is committed to them step by step. When applications preformat data, the immediate effect of thin provisioning is lost, and only deduplication may reclaim sharable blocks. Because thin provisioning has no performance penalty, the general recommendation is to provision with the zero fat configuration.

For SAN-attached storage, NetApp recommends that you use file systems supporting space reclamation technologies such as the SCSI UNMAP and SCSI WRITE SAME commands. These pass the information through the storage stack that a particular block is no longer used and allow unused space to be reclaimed. On Windows® platforms, this can be configured in NetApp SnapDrive®.

For Oracle® database best practices, refer to WP-7084, Storage Efficiency in an Oracle Environment.

3.2 PROVISIONING FROM TEMPLATES: VOLUME AND DEDUPE-CENTRIC LAYOUTS

This section deals with provisioning storage for similar applications from a golden template. A valid use case is a hosting provider who offers and serves predefined application services in mass quantities. Instead of provisioning each application from scratch, the data of the application instance is provisioned by creating a copy of a preconfigured template, or golden copy, that is customized using a postprocessing procedure.

When applications are provisioned this way, NetApp cloning technologies generate virtual copies of the template data instantly and with efficient use of space. This achieves a high degree of data consolidation and cost savings.

The potential of NetApp cloning technologies also plays a central role in development and test environments, as well as in software maintenance scenarios. Testing and updates can be performed very easily because these cloning capabilities work instantly and with almost no overhead in terms of performance, CPU, and memory.

There are two ways to align application data to a NetApp shared storage infrastructure:

• Volume-centric storage layout
• Dedupe-centric storage layout

Depending on the data lifetime, suitability for deduplication, consistency, and tool constraints, one way of aligning application data is more appropriate than the other.

In both variants, the storage of the application template can be provisioned as either full, low, or zero fat. The cloning procedure inherits the attributes of the parent volume. To create space-efficient clones, the space guarantee must be set to none.


CONSEQUENCES FOR MONITORING

When using one of the following layouts, very high data consolidation can be achieved. Because this effect depends on the usage characteristics of the corresponding applications, monitoring the aggregate is key. In case a low fat volume acts as a template that is cloned while preserving the original space guarantees, monitoring is necessary for the cloned volumes as well.

VOLUME-CENTRIC STORAGE LAYOUT

In a volume-centric storage layout, an application instance is organized into one or a few volumes to benefit from the Data ONTAP volume-centric management and maintenance operations, such as instant cloning and volume-consistent Snapshot copies.

In addition to the convenient ways to manage volumes, volume-centric storage layouts have storage efficiency advantages in two dimensions:

• High instant storage efficiency savings: High instant savings when cloning the data of an application instance with FlexClone; savings might deteriorate over time.
• Long-term storage efficiency savings: Medium long-term savings when deduplicating application data.

A volume-centric layout makes it easy to provision storage for another instance of an application by cloning a consistent volume representing the template of the intended application and attaching it to an instance where it is processed. This approach works for both NAS and SAN.

Figure 12 shows the data alignment of an application instance and its volumes. An application instance organizes its data in one or more dedicated volumes. Note that the entire construct is created within one aggregate.

Because deduplication is performed on the volume level, long-term savings depend on the block-sharing rate within one instance of an application. Volume-centric layouts are preferred in the following cases:

• Simplicity of data management using volumes
• Individual control over the SLA of each application instance
• Application instances with a short duration
• No consideration of deduplication
• Management tools that require volume-centric layouts


Figure 12) Volume-centric storage provisioning. Application instances are aligned horizontally with their volumes.

[Figure 12 shows a template FlexVol volume and FlexVol volumes for instances 1 through n, each containing the instance's LUNs/qtrees. Deduplication block sharing operates within each volume; FlexClone block sharing links the template volume to the instance volumes.]

Impact on commitment and storage utilization: The impact of using FlexClone to clone a volume-centric storage layout to implement storage-template-based provisioning can be visualized schematically. At clone creation, Data ONTAP creates metadata for the new instance of the data; it allocates space for storing changes to the cloned copy or new data on request. Thus, the overcommitment of the aggregate containing the cloned data increases when the clone is created. However, this does not affect the space used in the aggregate. Only when data in the clone is changed and new data is added by the application does the aggregate usage grow.
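The effect described above can be sketched with a toy model (class, names, and figures are all hypothetical): creating a clone raises the aggregate's committed space immediately, while used space grows only as the clone's data diverges from the parent.

```python
# Toy model of FlexClone's effect on an aggregate: a clone adds committed
# (promised) space at creation time but consumes real blocks only as the
# application changes data in it. All figures are illustrative.

class Aggregate:
    def __init__(self, capacity_gb):
        self.capacity_gb = capacity_gb
        self.used_gb = 0.0       # blocks physically allocated
        self.committed_gb = 0.0  # sum of provisioned volume sizes

    def add_volume(self, size_gb, initial_data_gb):
        self.committed_gb += size_gb
        self.used_gb += initial_data_gb

    def clone_volume(self, size_gb):
        self.committed_gb += size_gb  # clone is metadata only: no new used space

    def write_new_data(self, gb):
        self.used_gb += gb            # divergence consumes real blocks

aggr = Aggregate(1000)
aggr.add_volume(300, initial_data_gb=200)  # template volume with 200 GB of data
aggr.clone_volume(300)                     # instant clone of the template
print(aggr.committed_gb, aggr.used_gb)     # committed doubled, used unchanged
aggr.write_new_data(50)                    # application changes data in the clone
print(aggr.used_gb)
```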

Best Practice

A volume-centric layout implicitly implements a consistency group. It is preferable to place in it all application data that must be recovered at a certain point in time. Cloning can achieve significant savings when a FlexClone volume is created to provision data for a new service instance.

Client-side data realignment, such as disk defragmentation or database tablespace reorganization, has a counterproductive effect on the FlexClone savings. It also has a temporary counterproductive effect on deduplication savings; the deduplication process must run again to regain them. If possible, the following actions on client data should be avoided:

• Reorganizing data, for example, database reorganization of tablespaces or defragmentation of virtual disks provisioned through cloning

• Preformatting data


DEDUPE-CENTRIC STORAGE LAYOUT

In a dedupe-centric storage layout, the goal is to achieve high storage efficiency returns from the deduplication feature. In contrast to the volume-centric storage layout, data of different application instances is grouped to achieve storage efficiency returns across a set of application instances. Figure 13 shows a sample dedupe-centric storage layout. Data of application instances is organized horizontally; the individual data of each application is grouped vertically in a volume to implement deduplication.

This layout makes sense in virtualization scenarios where the images of the guest machines can be grouped easily. Grouping partitions containing boot images and commonly used programs is very effective because they share much of the same data. To implement template-based provisioning with such a layout, template data must be cloned with the file/LUN FlexClone operation. File/LUN FlexClone allows storage objects to be cloned within a volume, providing finer granularity.

This storage layout provides the following storage efficiency advantages from both a short- and a long-term perspective:

• Very high long-term storage efficiency savings, achieved through the deduplication-centric storage layout and deduplication returns

• Short-term storage efficiency savings: instant savings are provided when an application instance is cloned through a file/LUN FlexClone operation, for example from template application data

In contrast to the volume-centric storage layout, application instances are bundled together in a matrix style because of their participation in a volume. This implies that the applications share major operational tasks and are managed as a bundle. From an SLA perspective, a diversification of service levels within the application instances cannot be implemented as easily as with a volume-centric layout. Achieving application-consistent Snapshot copies requires the iterative application of file/LUN FlexClone functionality to all storage objects of the instance, which is slightly more difficult than cloning with a volume FlexClone operation.

TR-3505, NetApp Deduplication for FAS and V-Series Deployment and Implementation Guide, provides a deeper understanding of NetApp deduplication and its deployment.

Figure 13 illustrates dedupe-centric storage provisioning. Volumes are shared among several application instances to achieve cross-instance deduplication returns. Note that each such construct is created within one aggregate; the volumes themselves can be assigned to different aggregates.


Figure 13) Dedupe-centric storage provisioning. Application instances are aligned horizontally; volumes are aligned vertically.

[Figure: a template and several application instances aligned horizontally; multiple FlexVol volumes aligned vertically, each holding one LUN/qtree per instance, with deduplication block sharing within each FlexVol.]

Impact on commitment and aggregate usage. When the FlexVol volumes for this layout are created, their individual sizes contribute to the commitment rate. Aggregate use grows with the provisioning and object use within the FlexVol volumes. Provisioning a new instance in this layout through a file/LUN FlexClone operation has no effect on the overcommitment rate; it affects only the deduplication savings of the volume itself. Thus, NetApp recommends using the zero fat configuration for the volume, with autogrow enabled.
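The distinction can be sketched numerically: the nominal FlexVol sizes drive the aggregate commitment, while a file/LUN clone inside one of those volumes leaves it untouched. Sizes below are illustrative assumptions, not measured values.

```python
# Sketch: in a dedupe-centric layout, the nominal FlexVol sizes drive the
# aggregate commitment rate; a file/LUN FlexClone inside a volume does not
# change it. Illustrative numbers only.

aggregate_gb = 2000                        # assumed aggregate capacity
flexvol_sizes_gb = [600, 600, 600, 600]    # four dedupe-centric volumes

overcommitment = sum(flexvol_sizes_gb) / aggregate_gb
print(overcommitment)   # 1.2

# Cloning a template LUN inside one of these volumes adds logical data to
# that volume but leaves the aggregate commitment untouched:
logical_gb_in_vol = 250        # logical data in one volume before the clone
logical_gb_in_vol += 50        # cloned 50 GB template LUN (blocks shared)
print(sum(flexvol_sizes_gb) / aggregate_gb)  # still 1.2
```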

Best Practice

This layout is very attractive for applications using multiple but similar storage objects among service instances (for example, virtual disks in virtual machine hypervisors). They usually use similar operating systems and applications in dedicated virtual disks. Thus, grouping these storage objects leads to a very high degree of consolidation due to deduplication.

Quickly changing data, such as pages and swap files, should not be placed in deduplicated volumes on primary storage. Deduplication savings are limited due to the high change rate of this data and do not justify running the deduplication process. NetApp recommends not placing this type of data in the same volume as data that dedupes well.

We further recommend not performing client data realignments such as Windows disk defragmentation or database tablespace reorganizations. Because of the way NetApp storage controllers work, client data is served without a performance penalty even when it appears fragmented to the client, so such realignments provide no benefit.


3.3 SETTLED/NOMAD PROVISIONING FOR NETAPP DATA MOTION

Settled/nomad provisioning is a provisioning pattern that helps increase the utilization of NetApp shared storage. When the online migratability features of the storage are exploited, response times to mitigate data growth scenarios become independent of application-specific planned downtime windows. Storage is classified into potential migration candidates that can be migrated away from a tight aggregate on one storage controller to another controller while remaining accessible. Thus, it is an elegant technique for relaxing the use of an aggregate outside the planned downtime windows of the served applications. NetApp MultiStore technology implements this capability using the vFiler abstraction, which NetApp recommends you consider in the provisioning process. Secure multi-tenancy environments implemented using MultiStore technology harmonize well with this provisioning approach.

The settled/nomad provisioning pattern is a useful way to react to data growth in an aggregate. Figure 14 illustrates the concept of settled/nomad provisioning in the aggregates of the storage controllers and the migration of a nomad out of its aggregate. The settled part describes data that does not move during its lifetime. It might use vFiler units to simplify operation and hardware maintenance of the storage controller, but there is no direct need. The nomad parts are considered moving parts and thus must use vFiler units. The ratio between the sizes of the settled and nomad parts depends on the growth rate and lifetime of the data in the settled part. Assuming that the aggregate size is constant over this period, the aggregate is filled with settled and nomad data. Over the data lifetime, more and more nomads are migrated away; at the end of the lifetime, only the settled data is left.

It is irrelevant whether the data growth happens in the settled or the nomad part: when a nomad is migrated away, the resource situation on the aggregate is relaxed. It is preferable to provision several nomads of different sizes. This allows you to:

• React to different growth scenarios of the data

• Quickly migrate smaller nomads when time or the inter-storage-controller network is considered a limited resource

• Operate the aggregate in its operational sweet spot corridor over a long time frame; by slicing the migratable entities in the right way, you can be sure that the aggregate operates within a predefined use interval
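The selection logic implied by the last two points can be sketched as a simple greedy procedure: migrate the smallest nomads first until the aggregate is back under the upper bound of its sweet-spot corridor. The corridor bound and nomad sizes below are illustrative assumptions.

```python
# Sketch: pick nomads (smallest first) whose migration brings aggregate
# use back under the upper bound of the sweet-spot corridor.
# The corridor bound and all sizes are illustrative assumptions.

def nomads_to_migrate(used_gb, capacity_gb, nomad_sizes_gb, upper_bound=0.85):
    """Greedily choose nomads to migrate until use <= upper_bound."""
    chosen = []
    for size in sorted(nomad_sizes_gb):
        if used_gb / capacity_gb <= upper_bound:
            break
        used_gb -= size          # migrating a nomad frees its blocks
        chosen.append(size)
    return chosen

print(nomads_to_migrate(used_gb=920, capacity_gb=1000,
                        nomad_sizes_gb=[50, 100, 200]))  # [50, 100]
```

With a mix of nomad sizes, small ones handle moderate growth quickly, while a large one remains available for more drastic relief.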

Figure 14) Settled/nomad provisioning into an aggregate. In case of aggregate tightness, a nomad is migrated to a separate aggregate.

[Figure: an aggregate containing a settled part and two nomads; one nomad is shown being migrated out of the aggregate.]

To summarize, the settled/nomad provisioning pattern is an elegant method of adjusting the block use of an aggregate. The use of an aggregate can be controlled and kept within a desired corridor.


SLA-BASED ASSESSMENT FOR SETTLED/NOMAD

The goals of an SLA-based assessment are to optimize SLA fulfillment and to avoid or minimize penalty costs. The accessibility of the applications, described by their individual service levels, is used to assess instances as settled or nomad.

We use the SLA metric of service disruption introduced earlier and map it to the stickiness of the settled/nomad instances. The vFiler entities allow online migration of NFS- and iSCSI-attached nomad instances without any changes on the client side. Fibre Channel-attached storage cannot be migrated online at the time of writing. Refer to TR-3881 for an understanding of DataMotion in Oracle database and Microsoft Exchange environments.

Alignment by technical impact. For data belonging to applications with SLAs that fit perfectly into what is provided, a direct assignment can be made. For example, application instances with the lowest acceptable service disruption should be the last candidates to be migrated (settled); applications with the highest acceptable service disruption should be considered nomads. However, there might be data of application instances that will likely be migrated during the application lifetime; you must take into account the business impact of migrating these instances.
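The technical alignment described above can be sketched as a sort-and-classify step. The instance names, tolerance values, and the 60-second threshold are invented for illustration; a real assessment would use the SLAs of your own applications.

```python
# Sketch: align application instances by technical impact. Instances whose
# SLA tolerates little service disruption stay settled; the most tolerant
# ones become nomads. Threshold and data are illustrative assumptions.

def classify(instances, nomad_threshold_s=60):
    """instances: {name: acceptable service disruption in seconds}."""
    ranked = sorted(instances.items(), key=lambda kv: kv[1])  # strictest first
    return {name: ("nomad" if tolerance >= nomad_threshold_s else "settled")
            for name, tolerance in ranked}

apps = {"oltp-db": 0, "exchange": 30, "file-share": 300, "test-vm": 3600}
print(classify(apps))
# {'oltp-db': 'settled', 'exchange': 'settled',
#  'file-share': 'nomad', 'test-vm': 'nomad'}
```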

Figure 15) Alignment by technical impact (sorted by negative impact in descending order).

[Figure: instances Inst1 through InstN ordered by negative impact from high (outside SLA; settled, e.g., all FC) through medium to low (inside SLA; nomad).]

Alignment by business impact. An assessment of penalty costs is made for the data of the remaining applications. For vFiler migration, a very short negative impact on the performance of the service level must be taken into account during the migration. Thus, application data with the highest negative impact is considered the stickiest.

Figure 16) Alignment by business impact (sorted by negative impact in descending order).

[Figure: instances ordered by penalty cost ($$ down to $); high-penalty instances are settled, medium-penalty instances are semi-settled, and low-penalty instances are nomads.]

PERFORMANCE AND THROUGHPUT IMPACT OF MIGRATION

Migration of a nomad might be triggered by heavy storage consumption in an aggregate. It might also be triggered by performance limitations of the corresponding storage controller. Because migration consumes additional resources on the network and on the participating storage controllers, this consumption must be taken into account to avoid further intensifying the situation. Refer to TR-3881 for a quantitative evaluation of DataMotion.

NetApp recommends adjusting the use of storage controllers in a high-availability configuration in such a way that the remaining controller can master the load in the case of a failover. Doing so should leave enough resources to perform migrations.

Migrating vFiler entities relies mainly on SnapMirror® and MultiStore technology and thus requires these licenses on all participating storage controllers. TR-3814, NetApp Data Motion, provides a thorough presentation of migration using the NetApp Data Motion solution. It focuses on implementing and triggering the migration of vFiler entities using NetApp Provisioning Manager. Furthermore, vFiler units can be managed manually to allow for convenient offline migration with a very short interruption of storage accessibility.

ENABLE SETTLED/NOMAD FOR ALREADY-PROVISIONED STORAGE

While NetApp recommends that you consider the settled/nomad setting initially, taking the sizing and lifetime of storage into account, it is possible to implement it later in a planned downtime window. If NFS-attached storage is to be migrated, existing volumes can be adopted by a vFiler entity. Because the vFiler entity has its own IP address, the clients attached to the storage need to remount it.

SETTLED/NOMAD-LIKE SETTING WITH SHORT-/LONG-TERM DATA PAIRING

In the previous section, the settled/nomad pattern was described as a way to mitigate organic data growth. The same effect can be achieved when storage is identified as belonging to instances that will be deprovisioned at their end of life. Taking the expected lifetime of provisioned storage into account allows you to plan deprovisioning situations in advance. This relaxes the use of aggregates outside planned downtime windows, without the technical requirements of a settled/nomad setting.

ONLINE MIGRATION IN VIRTUALIZED ENVIRONMENTS

Online migration features in a virtualization hypervisor provide a further alternative for implementing a responsive scheme to react to data growth scenarios. For example, VMware® Storage VMotion™ is capable of transferring a virtual machine, including its storage, when it is attached using a datastore. Storage of virtual machines served by a NetApp datastore can be migrated to another NetApp-served datastore by migrating each virtual machine. In such cases, a nomad can simply be implemented by a NAS- or SAN-attached datastore.

In contrast to a data transfer based on SnapMirror directly between NetApp storage controllers, the migration traffic flows through the hypervisor. This might have consequences for the execution of the virtual machines. Also, the NetApp storage efficiency savings cannot be exploited during the transfer; deduplication savings are regained by executing the deduplication process on the destination storage controller.


4 OPERATION

This section focuses on the operation and management of overcommitted storage. The goal of management is to fulfill the SLAs of the stored data while achieving a high level of efficiency. It addresses how to detect situations that need manual assistance, how to raise the awareness of the operational staff, and how to resolve the situations that arise.

We first consider situations that put SLA fulfillment at risk. Then we focus on actions that can be taken to avoid further aggravation by making the situation evident and presenting mitigation alternatives.

Consider the operational process as a loop that monitors and evaluates the current situation and triggers the transition of a storage resource among the following phases:

• Provision storage.

• Leave room for organic growth. It might be desirable to still allow extending the storage of previously provisioned applications.

• Reduce storage use with mitigation alternatives such as deletion, data motion, and so on.

These transitions must occur within a specified time frame to optimize operational flexibility and to prevent endangering the SLAs. The point is to detect situations that will violate the SLAs in the future.

SITUATIONS PUTTING SLA FULFILLMENT AT RISK

Over time, more and more data is stored and processed by the provided applications, and NetApp storage efficiency technologies compensate for this growth. To prevent running out of physical resources, usage must be managed within safe boundaries. This ensures that the operations team has enough time to react with the appropriate mitigation strategy.

The following list summarizes situations that are critical for service delivery:

• Running out of time. Some mitigation alternatives must be triggered in advance, and a passage of time might be needed for their effect to become evident. This time determines the number of mitigation alternatives that can still be considered at a certain point.

• Running out of mitigation alternatives. Several mitigation alternatives exist to control usage. However, some alternatives are one-time activities, and some must be performed within a certain time frame. Depending on the situation, not all alternatives might be available for use.

• Running too tight on storage. Over time, applications use more and more of the blocks that were committed to them, forcing Data ONTAP to allocate from the pool of free blocks. Assuming data growth, the size of the free block pool directly translates into available time to react.

• Running out of storage completely. This must be prevented because it has a high impact on the availability of the service; furthermore, data integrity can be at risk. Consider the following scenarios:

− An application wants to write to committed storage but fails (NAS/SAN). To the application, this looks like a storage failure and implies service disruption. Data integrity can be at risk.

− An application wants to allocate new storage but fails (NAS). The application is confronted with a "No space left on device" exception. Verify the application behavior on this exception; most applications can deal with this situation, and data integrity is not at risk.

Two cases need to be differentiated when mitigation is necessary to resolve a situation of tight storage. Storage for an object such as a LUN or a share can be tight because of:

− Insufficient space within the volume in which the storage object is contained

− Insufficient free space within the aggregate in which the storage object and its volume are contained
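The "running too tight" situation above has a simple quantitative core: the free block pool divided by the growth rate gives the time left to react. The sketch below assumes a constant (linear) growth rate; all numbers are illustrative.

```python
# Sketch: the free block pool of an aggregate translates directly into
# time to react. Growth is assumed constant (linear); illustrative numbers.

def days_to_react(capacity_gb, used_gb, daily_growth_gb):
    """Days until the aggregate runs out of free blocks at current growth."""
    if daily_growth_gb <= 0:
        return float("inf")    # no growth: no deadline
    return (capacity_gb - used_gb) / daily_growth_gb

print(days_to_react(capacity_gb=1000, used_gb=850, daily_growth_gb=10))  # 15.0
```

Any mitigation alternative whose preparation and effect time exceeds this value is effectively no longer available, which is exactly the "running out of time" situation described above.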


The following sections focus on how to detect that a change is necessary and that a storage resource should be transitioned to another phase. We describe the different phases, how monitoring can support their detection, and how this information can be made known to operational groups.

4.1 PHASES AND TRANSITIONS

This section outlines the phases of a storage resource. Starting with an empty aggregate, storage is provisioned up to certain thresholds. After that, storage is left for organic growth. After certain thresholds are exceeded, further inspection or activities must be performed to mitigate storage tightness.

• Provisioning storage. While certain thresholds are within a defined range, storage is provisioned to the aggregates. Monitoring should support the decision to transition to the next phase.

• Leave storage for organic growth. When certain thresholds are exceeded, provisioned storage is left for organic growth. Depending on the environment, storage of existing applications might still be extended, and a second threshold might signal that extensions are no longer possible. Monitoring should support the decision to transition to the next or the prior phase.

• Mitigate storage use. When certain thresholds are exceeded, this phase must ensure that committed storage can be delivered to store application data. The effect of a mitigation activity should be to return the storage resource to the preferred operational corridor. Monitoring should support the decision to transition back to the organic growth phase.

4.2 MONITORING

NetApp Operations Manager delivers comprehensive monitoring and management for NetApp shared storage. It provides alerts, reports, performance monitoring, and configuration tools to keep the NetApp storage infrastructure in line with business requirements for maximum availability and efficiency. NetApp Operations Manager provides a single human interface and an application programming interface (API) for integration with third-party management and orchestration software.

Operations Manager monitors the NetApp shared storage infrastructure and can raise awareness of certain situations. Events can be set to trigger an action when operational parameters are within a certain range and indicate a relevant situation. When the event triggers an alarm, notification can be sent by e-mail, pager, Simple Network Management Protocol (SNMP), or customized scripts. To raise awareness of a certain situation, the event must be characterized using the metrics provided by Operations Manager; to communicate the event, an alarm must be set.

THRESHOLDS

Operations Manager monitors relevant parameters that indicate the presence of specified situations. Thresholds can be set to trigger actions, for example, to notify the operational team that an alarm situation exists. The thresholds can be set to notify in advance. Operations Manager also performs trending on operational parameters to express the urgency of a certain situation. This supports decision making on how to react.

Within your Operations Manager instance, the thresholds can be verified and set on the Default Thresholds page, reached by following Setup → Options → Default Thresholds or the link http://opsmgrserver:port/dfm/edit/options. Figure 17 shows a sample configuration page.


Figure 17) Operations Manager screen to configure thresholds on operational metrics.

For aggregates, Operations Manager provides the set of thresholds described in the following list. They represent absolute limits. Operations Manager alarms can be used to notify operational staff and raise awareness of a specific situation.

Monitoring the aggregates is very important because they are the physical containers of the preallocated and growable storage objects that host application data. If an aggregate of a storage controller runs at uncontrolled usage, there can be direct consequences for the applications for which it provides data.

The concrete settings for these thresholds depend on the time necessary to relax aggregate block usage. When no mitigation alternatives should or can be taken during the lifetime of the data, mitigation actions must be performed in scheduled downtime windows. In that case, threshold settings and actions tend to be more conservative to avoid SLA-endangering situations.

• Aggregate full threshold. This threshold on the metric aggregate block use allows triggering an alarm that notifies a person in charge.

• Aggregate nearly full threshold. This threshold is the counterpart of the aggregate full threshold but provides an earlier notification.

• Aggregate overcommitted threshold. This threshold on the metric of committed storage allows triggering an alarm that notifies a person in charge. The metric refers to the amount of storage that is assigned to applications; it represents the level of consolidation as well as the width and growth of the block use corridor.

• Aggregate nearly overcommitted threshold. This threshold is the counterpart of the aggregate overcommitted threshold but provides an earlier notification.

Operations Manager also provides thresholds that can be used to alert operational staff when volumes are in a certain state:

• Volume full threshold. This event notifies a person in charge that the preset threshold on volume use has been reached.

• Volume almost full threshold. This event is the counterpart of the volume full threshold but provides an earlier notification.

• Volume autosized. This event notifies a person in charge that a volume was extended using the autogrow functionality.


TRENDING

Operations Manager 4.0 supports a variety of trending features for certain storage objects. This is an important feature for all storage objects with a fixed size because it allows you to estimate the time frame within which a certain situation must be mitigated. The trend is calculated as a linear regression over up to 90 days in the past. For aggregates, Operations Manager calculates a trend on the daily growth rate. In your Operations Manager instance, use the link http://opsmgrserver:port/dfm/report/view/aggregates-growth-rates for trending of aggregate growth rates and the estimated remaining time until the storage object is full. Each aggregate can be drilled down, and you can select trending based on an interval of one day, one week, one month, three months, or one year. To see the effect of recent data activities, set the interval of the trend calculation to enclose that activity, and investigate whether growth rates calculated over different intervals deviate significantly.
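The kind of trend calculation described above can be sketched in a few lines: a least-squares regression over daily usage samples, extrapolated to a days-to-full estimate. This is a simplified stand-in for Operations Manager's own trending, with invented sample data.

```python
# Sketch: linear-regression trending over daily usage samples, extrapolated
# to a days-to-full estimate. Simplified stand-in for Operations Manager's
# trending; sample data is illustrative.

def growth_trend(samples_gb):
    """Least-squares slope (GB/day) over daily usage samples."""
    n = len(samples_gb)
    xs = range(n)
    mean_x = sum(xs) / n
    mean_y = sum(samples_gb) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, samples_gb))
    var = sum((x - mean_x) ** 2 for x in xs)
    return cov / var

def days_to_full(samples_gb, usable_capacity_gb):
    """Extrapolate; based on usable capacity, not the full threshold."""
    slope = growth_trend(samples_gb)
    return (usable_capacity_gb - samples_gb[-1]) / slope

usage = [700, 710, 720, 730, 740]      # GB used on five consecutive days
print(growth_trend(usage))              # 10.0 GB/day
print(days_to_full(usage, 1000))        # 26.0 days
```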

Figure 18) Trending of data growth and days-to-full prediction in Operations Manager.

Note: The calculation basis of time to full is the usable aggregate capacity. This value is not calculated based on the aggregate full threshold setting.

Trending on the volume level is analogous to trending on the aggregate level. In your Operations Manager instance, access the link http://opsmgrserver:port/dfm/report/view/volumes-growth-rates for trending of volume growth rates. NetApp recommends ordering the view by descending growth rate or ascending time to full in order to focus on the most relevant candidates.

On the volume level, you can set an alarm to fire when volume growth is outside the usual boundary:

Abnormal volume growth. This event fires when the growth rate of a volume exceeds a preset limit. It helps signal unusual storage consumption and points the operational staff to the right storage object.


INDIVIDUAL THRESHOLDS PER AGGREGATE OR VOLUME

For each aggregate or volume, the general default settings can be overridden with more specific values. To do so, select the aggregate or volume of choice, for example, using the links already provided in this technical report. A selected aggregate can be configured using the Edit Settings link and dialog; the configuration of a selected volume can be adapted using the Edit Quota Settings link and dialog.

MONITORING STORAGE EFFICIENCY RETURNS

NetApp Operations Manager provides a dashboard to visualize storage efficiency returns in the NetApp shared storage infrastructure. The report lists important parameters, drilled down by utilization capacity, unused reserve capacity, storage efficiency, and efficiency return breakdown, and allows you to judge the effectiveness of the NetApp storage efficiency technologies. Figure 19 provides a sample screenshot of the storage efficiency dashboard in NetApp Operations Manager. Consult the NetApp Operations Manager Efficiency Dashboard Installation and User Guide for further information.

Figure 19) Storage efficiency dashboard in Operations Manager.


4.3 NOTIFICATION

Operational staff must be notified when situations occur that require a transition of phases, especially situations with negative consequences for SLA fulfillment. Operations Manager provides alarms for notification. Alarms are bound to the metrics and thresholds explained in section 4.2 and notify operational staff, storage administrators, or storage capacity planners. Alarms are the instrument that keeps the management effort of the NetApp storage infrastructure low.

After being notified, the responsible person can evaluate the situation and decide which actions to take. The trends on operational parameters provided by Operations Manager further simplify the decision-making process.

Depending on the organizational structure, the responsibilities to operate, plan, and administer the storage infrastructure can be separated into different groups, persons, or roles. We therefore characterize the mitigation activities by required skill set and time to act, which allows easy alignment with a given organizational structure.

Operations Manager supports several notification methods, which can be used in combination; for example, a notification can be sent by both e-mail and SNMP.

NOTIFY BY E-MAIL

An alarm can be sent to multiple destinations by e-mail, and repeated notifications can be sent while the situation is not resolved. To set an alarm, access the alarm configuration page by following Setup → Alarms from the default Operations Manager dashboard. Clicking Advanced Version accesses an advanced version of this page; the direct link for the advanced version is http://opsmgrserver:port/dfm/edit/alarms-advanced. Figure 20 shows how to configure an alarm. Adjust the threshold as described in section 4.2.

NetApp recommends using distribution lists or aliases with meaningful names rather than the addresses of individual persons. If you follow this recommendation, changing responsibilities and roles does not require corresponding changes in Operations Manager.

NOTIFY BY SNMP

Operations Manager supports the notification of alarms using SNMP, a widely used standard that is supported by most orchestration frameworks and ticketing systems. Using SNMP, Operations Manager can be integrated into existing ticketing systems. Figure 20 shows the setup of an alarm firing based on the aggregate almost full threshold. The SNMP trap host is configured using a hostname or IP address and the port on which the SNMP agent is listening. The alarm can then be saved and tested.


Figure 20) Configuring an alarm based on the threshold aggregate almost full.

Note: The SNMP event must be routed to the responsible groups or persons in the ticketing system. The mapping between the detected situation and the responsible operational group must therefore be implemented there.

NOTIFY BY SCRIPT

Operations Manager also supports notifications in highly customized integration scenarios. A user-defined adapter can be executed that delivers the information to the infrastructure or system of choice. A script can implement such an adapter and act as the glue between Operations Manager and the customer infrastructure. To set an alarm on the event aggregate almost full that starts a script, instrument Operations Manager on the command line:

dfm alarm create -s script_to_execute -h aggregate-almost-full
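A hypothetical adapter script might look like the following sketch. How Operations Manager hands event details to the script is version-dependent and not specified here; this example simply assumes they arrive as command-line arguments and formats a one-line summary for some downstream ticketing system. All names are illustrative.

```python
#!/usr/bin/env python
# Hypothetical adapter script for the alarm above. The argument-passing
# mechanism is an assumption; consult your Operations Manager version's
# documentation for the actual event interface.
import sys

def format_ticket(event_name, source, severity="warning"):
    """Build a one-line ticket summary from alarm details."""
    return "[%s] %s on %s -- check aggregate usage" % (
        severity.upper(), event_name, source)

if __name__ == "__main__":
    # Fall back to placeholders when arguments are missing.
    event = sys.argv[1] if len(sys.argv) > 1 else "aggregate-almost-full"
    source = sys.argv[2] if len(sys.argv) > 2 else "unknown"
    print(format_ticket(event, source))
```

The script's output (or an HTTP call, message-queue publish, and so on in its place) is then consumed by whatever system the customer infrastructure provides.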


4.4 MITIGATE STORAGE USE

Uncontrolled storage use can limit operational flexibility and might put SLA fulfillment at risk. This section focuses on mitigation activities that preserve flexibility by controlling use within its defined corridor; the effect of a mitigation activity should be to return usage to that corridor.

Storage tightness might occur in aggregates or volumes, depending on their configuration. When all volumes in an aggregate are thin provisioned with the zero fat configuration, they use the shared pool of free blocks of the aggregate to deal with data growth. To resolve tightness in this situation, a mitigation activity on the aggregate level is necessary.

When storage objects in a fixed-size volume cannot grow to the committed space, a mitigation activity on the volume level is necessary to resolve the upcoming volume tightness.

MITIGATION ACTIVITIES FOR AGGREGATES

Aggregates are the coarsest storage objects within a NetApp storage controller. Finer-grained storage objects such as FlexVol volumes and their contents are usually thin provisioned using the zero fat configuration. They can grow on demand; however, because they live within an aggregate of physically limited size, the growth of the storage objects themselves is also limited. As described in the following list, providing usable space in the aggregate automatically allows the contained storage objects to grow.

1 Increase the aggregate You can add drives to aggregates during operation You can repeat this mitigation activity The maximum aggregate size depends on the Data ONTAP version the type of aggregate and the type of storage controller Aggregates with 64-bit supported with Data ONTAP 8 have very high limits Additional drives can be used immediately however their procurement needs to be taken into account Rebalancing data between existing and new drives results in a uniformly distributed use of the drives

2. Decrease the aggregate Snapshot copy reserve. This reserve is needed in MetroCluster and SyncMirror® configurations. In other configurations, you can decrease this reserve or set it to zero.

3. Shrink preallocated volumes. Volumes with preallocated space reserve available aggregate free space. When possible, these volumes can be shrunk, returning the freed space to the aggregate so that other storage objects can make use of the preallocated space.

4. Enable deduplication and shrink the volume.
5. If available, migrate a nomad online to a different storage controller. Doing this at the NetApp storage controller level requires storage provisioning based on vFiler entities and a MultiStore and SnapMirror license. Adequate free space on the aggregates of the target storage controller is required. This mitigation activity is not limited in its applicability.

6. Migrate a volume from one aggregate to another aggregate within the same or another storage controller. SnapMirror replicates the data while it is still being served. To switch over to the replicated data, the client needs to detach from the source and reattach to the replica. After completion, the replica is considered the new source. This operation has an impact on client downtime. Typically, inter-data center bandwidth allows you to synchronize the source and the replica within a few minutes.

7. If none of the listed activities can be used, the application must be stopped to achieve a consistent state.

The mitigation activities for aggregate tightness are summarized in Table 8. Note that Provisioning Manager performs mitigation alternatives 3 to 6 for secondary storage online.


Table 8) Mitigation alternatives to control use within aggregates.

| No. | Mitigation Activity | Repeatability | SLA Impact | Preparation Time | Time to Show Effect |
|-----|---------------------|---------------|------------|------------------|---------------------|
| 1 | Increase aggregate capacity by adding disks | Repeatable (low limits with Data ONTAP 7.x; high limits with Data ONTAP 8) | None | HW procurement | Immediate (plus rebalancing) |
| 2 | Decrease the aggregate's Snapshot copy reserve, if possible | One time | None | None | Immediate |
| 3 | Shrink other volumes in the aggregate if they have enough free space | One time | Low | None | Immediate |
| 4 | Run deduplication and shrink volumes | Repeatable | Low | Time to execute dedupe | Immediate |
| 5 | Migrate nomads (online) | Repeatable | Low | None | Minutes (vFiler migration time) |
| 6 | Migrate volumes to a different aggregate (offline) | Repeatable | Med-high | Next planned downtime window | Minutes (volume switch-over time) |
| 7 | Prevent application data loss and stop the application, then migrate (offline) | Repeatable | Low-high | Coordinate with app owner | Minutes (migration time) |
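On a Data ONTAP 7-Mode console, alternatives 1 to 4 above might be carried out roughly as follows. This is a sketch only; the aggregate name, volume names, and disk count are placeholders, and the exact syntax should be verified against your Data ONTAP version:

```
# 1. Grow the aggregate by adding four disks
aggr add aggr1 4

# 2. Drop the aggregate-level Snapshot copy reserve
#    (only outside MetroCluster/SyncMirror configurations)
snap reserve -A aggr1 0

# 3. Shrink a preallocated volume by 50 GB, returning space to the aggregate
vol size vol_preallocated -50g

# 4. Enable and run deduplication on a volume, then shrink it
sis on /vol/vol_data
sis start -s /vol/vol_data
```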

MITIGATION ACTIVITIES FOR VOLUME TIGHTNESS

Mitigation activities for volume tightness are relevant for volumes that are at risk because their storage objects cannot grow to the committed size. When it is not possible to enable growth for the storage objects contained in volumes, you need to perform an aggregate mitigation activity.

Note: Some of these mitigation alternatives depend on, and affect, used capacity in the aggregate.

Table 9) Mitigation activities for resource tightness within volumes.

| No. | Mitigation Activity | Repeatability | SLA Impact | Prep Time | Time to Show Effect |
|-----|---------------------|---------------|------------|-----------|---------------------|
| 1 | Reduce the volume's Snapshot copy reserve (if configured and not used) | One time | Low | None | Immediate |
| 2 | Increase the volume if there is free space in the aggregate (see Table 8) | One time | Low | None | Immediate |
| 3 | Delete Snapshot copies that are not needed or were skipped by the autodelete function | Limited | Low | None | Immediate |
| 4 | Activate FAS deduplication for the volume (requires proper space guarantees) | One time | Low; possible performance impact | Wait for schedule | Hours |
| 5 | If the volume contains more than a single LUN, migrate those objects to another volume or aggregate | Repeatable | High | Next planned downtime window | Minutes (volume migration time) |
| 6 | Stop the application and migrate the data | Repeatable | High | Coordinate with app owner | Minutes (migration time) |
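The first four volume-level alternatives map to short console sequences on a 7-Mode controller. The following is a sketch with placeholder volume and Snapshot copy names; verify the syntax against your Data ONTAP version:

```
# 1. Reduce the volume Snapshot copy reserve to 0%
snap reserve vol_app 0

# 2. Grow the volume by 20 GB out of aggregate free space
vol size vol_app +20g

# 3. Delete a Snapshot copy that is no longer needed
snap delete vol_app nightly.7

# 4. Activate deduplication on the volume
sis on /vol/vol_app
sis start -s /vol/vol_app
```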


5 REAL-LIFE SETTINGS

This section summarizes two different operational settings. The first does not make use of online data migration or the settled/nomad provisioning pattern; the second implements a settled/nomad provisioning pattern to maintain the flexibility for online data migrations.

The concrete threshold settings and approaches might be very customer and application specific. To exploit NetApp storage efficiency features in your own data center, NetApp recommends that you start conservatively. After you are familiar with the process, work toward the customer-specific optimum.

5.1 SAMPLE SETTING 1: REAL-LIFE SETTING

This section describes a real-life setting that a customer started with. It makes use of a limited set of mitigation alternatives. This is especially beneficial when the installed storage capacity should remain constant over a long time frame or when the physical systems are already fully equipped. A settled/nomad setting is not considered. Thus, the thresholds that signal a transition of the phases are set lower and more conservatively for this customer. Because online data migration and aggregate extension are not available as mitigation alternatives, sufficient available space is required to safely reach the next planned downtime window, as shown in Figure 21. In practice, refer to the aggregate days-to-full trend value to get an idea of the available days to full based on past data growth.

• All storage is provisioned using the zero fat option with growable FlexVol volumes. Only aggregate monitoring is used.
• Aggregate extension is not a mitigation alternative.
• Online migration is not a mitigation alternative.

Figure 21) Storage to enable organic data growth between planned downtime windows. (The figure plots data growth over months, with the planned downtime windows marked on the time axis.)

Note: Several months might fall between planned downtime windows in which major mitigation alternatives can be performed.

The primary concern is preventing the critical situation in which aggregates reach a utilization level that is too high to enable organic growth during the period between agreed planned downtime windows. To prevent this situation, sufficient space must be reserved to enable data growth. Second, the level of data consolidation is monitored to manage accumulated growth rates safely.

Provisioning new data is stopped when one or both thresholds on the first and second metrics are reached. The operational teams are notified using an alarm on the Operations Manager event "aggregate nearly full" (configured to fire when the metric exceeds 50%) and the event "aggregate nearly overcommitted" (configured to fire when the metric exceeds 110%). These alarms stop the responsible entities from provisioning new storage; the aggregate is left for organic growth.

An assessment of the storage situation might then be performed. Depending on experience and knowledge of the application growth rates seen in the past, the thresholds may be adapted. After the upper threshold of the operational sweet spot corridor is crossed, an alarm based on the "aggregate full" threshold (set initially to 65%) is sent to the storage administrators to make the decision about migrating data in the next planned downtime window. In the meantime, organic growth can take place in the yellow-marked area shown in Figure 22. The metrics used are:

• First metric: aggregate capacity used
• Second metric: aggregate space committed

Because all storage is provisioned using the zero fat option, no artificially limited storage container exists. Thus, there is no need to consider a volume-based metric. Figure 22 shows the behavior depending on the metrics aggregate capacity used and aggregate space committed.
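If these corridor boundaries are enforced through the global aggregate thresholds of Operations Manager (DataFabric Manager), the configuration might look as follows. The option names below are our assumption for the DFM CLI and should be verified against your Operations Manager version:

```
# Assumed DFM global options matching the 50% / 110% / 65% thresholds
dfm options set aggrNearlyFullThreshold=50
dfm options set aggrNearlyOvercommittedThreshold=110
dfm options set aggrFullThreshold=65
```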

Figure 22) Transition of phases depending on the metrics aggregate capacity used and aggregate space committed. (The figure shows that new storage is provisioned while aggregate capacity used is 0-50% and aggregate space committed is 0-110%; above those values, capacity is assessed and the thresholds are adapted; mitigation starts when capacity used exceeds 65% or space committed exceeds 120%.)


5.2 SAMPLE SETTING 2: SETTLED/NOMAD

This section describes a setting that takes the settled/nomad provisioning pattern into account and allows migrating nomad data flexibly and in a timely manner, thanks to vFiler technology. This setting requires storage space at alternative locations to which nomads can be migrated. It is seen more often in larger environments with an emphasis on NFS-attached storage. It allows operating the NetApp storage infrastructure at very high use and in narrower operational sweet spot corridors. Figure 23 visualizes the effect of a mitigation alternative that can be performed online.

Figure 23) Narrower corridors due to the ability to perform mitigation alternatives in hours instead of months. (The figure shows settled and nomad data in an aggregate over a time axis of hours; the need to act is detected, and the mitigation, for example a migration, takes effect within hours.)

In this sample setting, as in sample setting 1, the critical situation to prevent is aggregates becoming too full. However, the flexibility gained with online data migration makes it unnecessary to take a further metric, for example storage overcommitment, into account.

• All storage is provisioned using the zero fat option with growable FlexVol volumes. Only aggregate monitoring is used.
• Storage is provisioned using the settled/nomad pattern, with the ability to perform online migration.
• Days-to-full aggregate trending was more than 200 days on average. Note that this value depends on the individual situation and is calculated against 100% capacity used.

The sole metric in this setting is aggregate capacity used. Table 10 contains the thresholds describing the transition of phases.

Table 10) Phase transitions with the settled/nomad provisioning pattern and the online migration mitigation alternative.

| Detection Threshold | Notify | Mitigation |
|---------------------|--------|------------|
| > 70% | Storage operations | Stop provisioning of new storage |
| > 85% | Storage operations | Stop extending provisioned storage |
| > 90% | Storage operations | Relax the resource situation and migrate a nomad |
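The days-to-full figure that drives this setting follows directly from the capacity-used trend. A minimal sketch of the arithmetic, assuming linear growth (the function name is ours; Operations Manager reports this trend natively):

```python
def days_to_full(used_pct: float, daily_growth_pct: float) -> float:
    """Days until aggregate capacity used reaches 100%, assuming linear
    growth. Both arguments are percentages of usable aggregate capacity."""
    if daily_growth_pct <= 0:
        return float("inf")  # no growth: the aggregate never fills
    return (100.0 - used_pct) / daily_growth_pct

# At 70% used and 0.15% growth per day, the trend reports about 200 days
# to full, matching the >200-day average assumed for this setting.
print(days_to_full(70.0, 0.15))
```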


Figure 24) Visualization of phase transitions depending on the metric aggregate capacity used. (New storage is provisioned up to 70% capacity used; already provisioned storage is extended up to 85%; above 90%, utilization is relaxed by migrating a nomad with NetApp Data Motion.)

You can achieve very high data consolidation in this setting by using NetApp storage controllers. The amount of logical data served exceeds the physically usable capacity by factors.


6 STORAGE EFFICIENCY COOKBOOK

To increase consolidation, we propose the following steps to exploit the advantages of NetApp storage efficiency technologies.

Figure 25) Typical picture of aggregate capacity metrics while turning to zero fat configurations and dedupe. (The figure plots committed capacity and capacity used over elapsed time, with the overall trend and the last 3-month trend marked; steps 1, 2, and 3 below span roughly the first month through the third month.)

As a general rule, we don't introduce artificially limited container types. They increase the monitoring effort and might prevent pooling of unused space. For an existing landscape, proceed as follows:

1. Install and configure Operations Manager, the earlier the better. From day one, Operations Manager collects data; the more information it collects, the better the predictions and trending. The diagrams provided by Operations Manager give a good idea of growth rates and their steadiness. Make sure all NetApp storage controllers are monitored. Wait for one month. Define which mitigation alternatives your operational team is comfortable with. Check the boxes accompanying the provided list and identify the time your team needs to perform the actions. If you can perform online migrations for nomads, define the time to negotiate and approve the migration. For all other data, define the time to the next planned downtime window.

2. Change all volumes to the zero fat configuration with the autogrow feature set to on. Because there is no artificial space limitation for an autogrow volume, monitoring is restricted to aggregate monitoring. When using deduplication, set the volume to autogrow. During this period, the capacity used diminishes, as shown in Figure 25. Usually each change in the volume configuration can be detected. So far, only metadata has changed, and unused space in the volumes is now available from a common shared pool. The aggregated free space is available to the same applications storing the data. We recommend monitoring for three months to understand the growth rate of your environment.

3. Derive the growth trend of the aggregates. Note that the overall trend might still be negative. Use Operations Manager to help determine the trend. Make sure that it excludes the time frame when the volume configuration was changed to zero fat and that it includes relevant operations of your applications, such as month- and year-end closing of business applications or regular software maintenance updates (for example, in virtualized environments).


Work backward to determine the thresholds of the phases:
a. Define the aggregate use at a level where your operational team is comfortable. At first, do not exceed 80%. Add an attention area (yellow) depending on the mitigation alternatives and their time to show effect.
b. Determine the maximum distance between the planned downtimes, or the time to perform the intended mitigation alternatives.
c. Determine the growth rate. Operations Manager provides help in determining the trend of data growth.
d. Determine the minimum space required to comfortably allow organic growth in the period between the agreed planned downtimes of the services provided. Operations Manager helps you to understand the growth rate of the past.
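The growth rate in step c can also be estimated by hand from recent capacity-used samples. A minimal least-squares sketch (our own helper, not an Operations Manager API; Operations Manager reports this trend natively):

```python
def daily_growth_rate(samples):
    """Estimate the daily growth rate of aggregate capacity used (in
    percentage points per day) from equally spaced daily samples,
    oldest first, via a least-squares slope."""
    n = len(samples)
    xs = range(n)
    mean_x = sum(xs) / n
    mean_y = sum(samples) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, samples))
    var = sum((x - mean_x) ** 2 for x in xs)
    return cov / var

# Four daily samples growing steadily: slope of about 0.2 points per day.
print(daily_growth_rate([50.0, 50.2, 50.4, 50.6]))
```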

To provision storage, follow these steps:

1. Create big aggregates to enable shared storage in your data center. We recommend sizing them so that the aggregate can be extended once for eventual aggregate mitigation. Very few situations exist where a silo-centric approach with dedicated aggregates for applications makes sense. Free space and performance in an aggregate can be shared. A few big aggregates reduce the monitoring effort. Also, build aggregates in a limited number of standardized configurations and sizes.

2. Create volumes in the zero fat configuration with the autogrow feature set to on. Because there is no artificial space limitation for an autogrow volume, monitoring is restricted to aggregate monitoring. When using deduplication, set the volume to autogrow. Whenever possible, use Provisioning Manager for convenience and for repeating configurations.
a. Classify your data and provision for flexibility. Give NFS a preference and make use of vFiler entities.
b. Turn on deduplication. Even in situations where deduplication rates are expected to be low, there is sometimes a big surprise. If you prefer to try deduplication on the storage controller, then create a clone of the intended volume and deduplicate it to estimate the effect. Use Performance Advisor to identify a repeating time frame of low activity to schedule the deduplication job. Also consider deduplication scheduled by change rate. Mind the maximum volume sizes, which depend on the storage controller.
c. Initially size volumes to the expected size of the data you are going to store. Thus, the aggregate overcommitment metric in Operations Manager represents the data consolidation more precisely.
d. Trim existing volumes provisioned in full or low fat to the zero fat configuration. Use the following commands at the console of the storage controller to configure zero fat without Snapshot autodelete for NAS environments:

vol options <volume> guarantee none
vol options <volume> try_first volume_grow
vol autosize <volume> -m <maximum size> -i <increment size> on
snap autodelete <volume> off

Use the following command sequence to configure zero fat with Snapshot autodelete for NAS environments:

vol options <volume> guarantee none
vol options <volume> try_first volume_grow
vol autosize <volume> -m <maximum size> -i <increment size> on
snap autodelete <volume> trigger volume
snap autodelete <volume> delete_order oldest_first
snap autodelete <volume> on


Use the following commands to configure zero fat without Snapshot autodelete for SAN environments:

vol options <volume> guarantee none
vol options <volume> try_first volume_grow
vol autosize <volume> -m <maximum size> -i <increment size> on
snap reserve -V <volume> 0
snap autodelete <volume> off
lun set reservation <lun> disable

Use the following command sequence to configure zero fat for SAN environments with autodelete set to on:

vol options <volume> guarantee none
vol options <volume> try_first volume_grow
vol autosize <volume> -m <maximum size> -i <increment size> on
snap reserve -V <volume> 0
snap autodelete <volume> trigger volume
snap autodelete <volume> delete_order oldest_first
snap autodelete <volume> on
lun set reservation <lun> disable

e. Identify storage holding inactive data. Such storage is most often perfectly suited to act as a nomad candidate that could be migrated.

f. Identify storage that is close to deprovisioning. Deprovisioning storage relaxes use and can act as a mitigation alternative.

g. Turn already provisioned volumes into the zero fat configuration.
3. Let Operations Manager monitor the landscape. Use the aggregate daily growth rates and the days-to-full trending reported by Operations Manager to adapt the thresholds. Remember that days-to-full trending reports against 100% capacity used of the aggregate.


7 REFERENCES

• TR-3505, "NetApp Deduplication for FAS and V-Series Deployment and Implementation Guide": www.netapp.com/us/library/technical-reports/tr-3505.html
• TR-3563, "NetApp Thin Provisioning: Improving Storage Utilization and Reducing TCO": www.netapp.com/us/library/technical-reports/tr-3563.html
• TR-3710, "Operations Manager, Provisioning Manager, and Protection Manager Best Practices Guide": www.netapp.com/us/library/technical-reports/tr-3710.html
• TR-3786, "A Thorough Introduction to 64-Bit Aggregates": www.netapp.com/us/library/technical-reports/tr-3786.html
• TR-3814, "NetApp Data Motion": www.netapp.com/us/library/technical-reports/tr-3814.html
• TR-3827, "If You're Doing This, Then Your Storage Could Be Underutilized": www.netapp.com/us/library/technical-reports/tr-3827.html
• TR-3881, "DataMotion for Volumes for Enterprise Applications": www.netapp.com/us/library/technical-reports/tr-3881.html
• NetApp Operations Manager Efficiency Dashboard Installation and User Guide: now.netapp.com/NOW/download/tools/omsed_plugin/InstallUserGuide.pdf


8 ACKNOWLEDGMENTS

This report was developed in concert with the Field Centers for Innovation and covers field best practices and product group expertise. It would not have been possible without the input of many experts. Significant contributions were made by Matthew Agoni, Carlos Alvarez, Jeff Berks, Manfred Buchmann, Hans Deuerlein, Erik Dybwad, Niels Reker, Oliver Dziuba, Larry Freeman, Gary Garcia, Pretoom Goswami, Naveen Harsani, George John, Nigel Maddock, Andreas Martinovsky, Holger Niermann, Cesar Orosco, Christian Ott, Shiva Raja, Michael Reusch, Maurice Skubski, John Tyrrell, Oliver Walsdorf, and Allen Wang.

NetApp provides no representations or warranties regarding the accuracy, reliability, or serviceability of any information or recommendations provided in this publication, or with respect to any results that may be obtained by the use of the information or observance of any recommendations provided herein. The information in this document is distributed AS IS, and the use of this information or the implementation of any recommendations or techniques herein is a customer's responsibility and depends on the customer's ability to evaluate and integrate them into the customer's operational environment. This document and the information contained herein may be used solely in connection with the NetApp products discussed in this document.

© Copyright 2010 NetApp, Inc. All rights reserved. No portions of this document may be reproduced without prior written consent of NetApp, Inc. Specifications are subject to change without notice. NetApp, the NetApp logo, Go further, faster, Data ONTAP, FlexClone, FlexVol, MultiStore, RAID-DP, SnapDrive, SnapMirror, Snapshot, SyncMirror, and vFiler are trademarks or registered trademarks of NetApp, Inc. in the United States and/or other countries. Windows is a registered trademark of Microsoft Corporation. Oracle is a registered trademark of Oracle Corporation. VMware is a registered trademark and VMotion is a trademark of VMware, Inc. All other brands or products are trademarks or registered trademarks of their respective holders and should be treated as such. RA-0007-1010



1 EXECUTIVE SUMMARY

This document provides consolidated best practices to achieve and manage best-in-class storage use. We introduce intervals and metrics that trigger changes in behavior in order to operate NetApp storage in a corridor of high utilization for as long as possible. Starting with provisioning models focused on high consolidation and operational agility, we describe the operational phases and their transitions. A list of mitigation alternatives describes the available options for controlling use in the face of data growth.

Finally, this document presents real-life settings in which high data consolidation is achieved by using NetApp storage efficiency technologies.


2 INTRODUCTION

Exponential data growth generates a serious challenge for IT managers. Gartner predicts that within the period from 2008 to 2013, enterprises will purchase and install 20 times more terabytes of storage than they did in 2008 (www.gartner.com/technology/media-products/newsletters/netapp/issue24/gartner3.html). Until recently, continuous improvements in cost for performance and storage space made it easy and affordable to solve storage concerns by adding more disks to existing storage systems. However, IT executives are discovering that there are limits to that solution: floor space, weight loads, rack space, network drops, power connections, cooling infrastructure, and even power itself are finite resources. Hitting any one of these limits significantly jeopardizes the ability of IT to meet business demands.

NetApp's solution to rapid resource consumption is to remove storage controllers and disks from the resource equation by using storage more efficiently. Key benefits of this strategy are:

• Less management involvement
• Reduced complexity, support, and service costs
• Improved performance and network efficiency

NetApp storage efficiency technologies are key to achieving data consolidation and managing future data growth. They allow for storing and managing several times more data on NetApp storage controllers than would fit on their physically attached disks, and they allow IT investments to be deferred to the future.

In this document, we describe techniques and guidelines you can use to find the operational sweet spot for NetApp storage efficiency technologies in your environment. By adhering to the best practices outlined, you can increase storage consolidation and agility as well as decrease operational risk.

The document is organized as follows:

• Chapter 3 describes storage provisioning.
• Chapter 4 describes the monitoring process and supporting tools for daily operation.
• Chapter 5 describes concrete operational setups used in daily life.
• Chapter 6 concludes with a step-by-step cookbook to provision and manage storage efficiently and to adapt individual thresholds.

2.1 TERMINOLOGY

We use the following terminology to describe resource use at the level of exposing storage to applications and at the level of physical resource allocation within the aggregates in the storage controllers. Also refer to the NetApp Operations Manager Efficiency Dashboard Installation and User Guide for further explanation of this terminology.

• Logical storage refers to storage that is visible at the application layer. Logical storage does not necessarily require the allocation of usable capacity.

• Usable capacity refers to storage that is usable for the applications, provided by NetApp storage controllers.

• Used capacity is a value that represents the amount of physical capacity that holds application or user data. In Operations Manager[1] terminology, this is represented by capacity used.
• Storage utilization refers to the ratio of used capacity to usable capacity, without accounting for efficiency returns.

[1] NetApp Operations Manager is a central console that delivers comprehensive monitoring and management for NetApp storage, with alerts, reports, and performance and configuration tools.


• Commitment rate is an Operations Manager term that refers to the percentage of aggregate space committed to volumes.

• Deduplication rate is an Operations Manager term that measures the efficiency of the deduplication functionality. This rate is measured at the volume and the aggregate level, as a percentage.

For the aggregate, we define different operational windows, each characterized by an interval of storage utilization. We define one interval as the operational sweet spot corridor (green), where the aggregate should be operated for optimal utilization and service availability. We define a tolerance interval (yellow), where actions are taken to get back into the operational sweet spot corridor. We define a further interval as a no-go area (red), where we do not intend to operate the aggregate. This area might act as a last buffer of time, or it can be considered an area in which the operational staff has less experience.
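The three windows amount to a simple classification of the capacity-used metric. A minimal sketch, with illustrative boundary values that each environment would set for itself:

```python
def corridor(capacity_used_pct, sweet_spot_max=50.0, tolerance_max=65.0):
    """Classify aggregate utilization into the operational windows.
    The 50%/65% defaults are illustrative, not fixed recommendations."""
    if capacity_used_pct <= sweet_spot_max:
        return "green"   # operational sweet spot corridor
    if capacity_used_pct <= tolerance_max:
        return "yellow"  # tolerance interval: act to return to green
    return "red"         # no-go area

print(corridor(48.0))  # green
```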

Figure 1 explains the terminology in the context of storage objects on a NetApp storage controller. The aggregate is a physically limited storage object. Aggregates are treated as fairly static containers and thus need proper size management.

Figure 1) Terminology in the context of the storage objects volumes and aggregates. (The figure shows volumes with LUNs/NAS data inside an aggregate, marking the committed logical storage, the used capacity, the usable capacity of the aggregate, data growth, and the operational sweet spot corridor.)

In practice, commitment rates far above 100% are common in customer environments. This document describes how to manage this.
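As an illustration of the commitment-rate arithmetic (the function below is our own sketch, not an Operations Manager API):

```python
def commitment_rate(volume_sizes_gb, aggregate_usable_gb):
    """Percentage of aggregate usable capacity committed to volumes.
    Values above 100% indicate overcommitment through thin provisioning."""
    return 100.0 * sum(volume_sizes_gb) / aggregate_usable_gb

# Three 500 GB thin-provisioned volumes on a 1,000 GB aggregate:
# 150% committed, a rate above 100% as described in the text.
print(commitment_rate([500, 500, 500], 1000))  # 150.0
```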

2.2 GOAL OF THIS DOCUMENT

The goal of this document is to achieve best-in-class storage efficiency and costs by consolidating the highest possible amount of application data while meeting the required service-level agreements. The idea is to enable thin provisioning while controlling the use of physical resources in the NetApp shared storage infrastructure within a desired corridor. NetApp storage efficiency technologies can save a significant amount of the IT budget; on the other hand, running at an uncontrolled use level can reduce flexibility and cause headaches in managing data growth and service-level fulfillment.

The difference in managing thin-provisioned storage compared to traditional storage is that, due to the dense consolidation of application data, accumulated application data growth rates might vary in a broader corridor than they would traditionally.


To control the level of physical resources, we outline methods for increasing storage utilization by provisioning storage with NetApp storage efficiency technologies. We also outline how to react to organic data growth and how to level the storage use within a corridor of high efficiency by using NetApp technologies for data center flexibility.

To summarize, this document introduces three phases for managing the storage on NetApp storage controllers: provisioning, organic growth, and mitigation of storage tightness. These phases play a vital role for aggregates, which are the coarsest storage abstraction of a NetApp storage controller.

• Provisioning phase: In this phase, storage is provisioned from the NetApp shared storage infrastructure, which increases the utilization of aggregates. The goal is to operate the aggregates with a high level of data consolidation in an efficient utilization corridor. Figure 2 visualizes this corridor.

• Organic growth phase: In this phase, no further storage is provisioned, to slow down the growth of aggregate utilization. The goal is to operate in a corridor of high utilization but safely reach the next planned downtime or administration window of the served applications. Thus, the organic growth phase is sized depending on the growth rate of capacity used and the length of the time frame to the planned downtime windows. Figure 3 visualizes slowed growth.

• Mitigation of storage tightness phase: This phase prevents an uncontrolled level of utilization and provides mitigation activities to lower this level. Several mitigation alternatives are presented to mitigate storage tightness and to shift the aggregate utilization back into the desired operational corridor. Figure 4 visualizes this mitigation.

Figure 2) Storage consolidation and growing utilization using thin provisioning. (The figure shows data growth filling the aggregate capacity up into the operational sweet spot corridor.)

Figure 3) Controlled and slowed data growth within the operational sweet spot corridor in the aggregate.


Figure 4) Mitigation to prevent uncontrolled utilization. (The figure shows data growth in the aggregate being pushed back below the capacity limit by a mitigation activity.)

This document addresses best practices and tools for managing the NetApp storage infrastructure and for supporting decision making during the transitions between phases.

2.3 AUDIENCE

This document addresses two audiences:

• Decision makers: It provides decision makers with an understanding of how to align storage efficiency best practices and processes within their existing operations organization.

• Operational teams: It allows operational teams to understand the monitoring and management of the storage infrastructure while mastering data growth. It allows the operational teams to implement a basic setting and to position their usage goals. We refer to existing NetApp best practices to increase the level of data consolidation and to achieve overcommitment for major applications.


2.4 SCENARIO

As a scenario, we consider a service provider who delivers a set of IT services. This service provider might serve internal or external customers at different service levels. The service levels provided are characterized by unplanned downtime, as exemplified in Figure 5. This characterization is useful for aligning service data with physical resources. In our example, the highest level of service availability is delivered by the Platinum services. It is further assumed that the provided services have different lifetimes and dates of creation.

The service provider's major goal is to deliver the services within the specified service levels and to achieve maximum data consolidation through NetApp storage efficiency technologies. This translates directly into cost savings related to capital investment, floor space, cooling, maintenance, and operational expenses. However, storing data in a consolidated way using storage efficiency technologies must take into account the aggregated data growth rates of the applications.

Predicting data growth rates depends on several parameters that are usually outside the control and knowledge of the service provider. These parameters include usage characteristics, the number of users, and the functionality used. To compensate for the difficulty of precisely predicting data growth over a specific time frame, we propose a scheme that the service provider can use to achieve operational flexibility and to adapt to unpredictable growth rates.

Figure 5) Sample service levels ordered by service disruption and recovery time.
[Figure: Platinum (production, premium customers) with the lowest disruption and recovery time, followed by Gold (production), Silver (production, low budget), Bronze (production), and best effort services (dev/test, cold/fill-up data, dynamic short-term data) with best-effort disruption and recovery time.]

In this document, the focus is on the operational aspects of storage efficiency technologies for achieving data center consolidation and agility. Thus, we take a seat next to the operational staff of our sample service provider to understand their questions regarding the technologies that make up NetApp storage efficiency.

We address the questions posed by the operational staff, such as: How do we set it up? How do we run this? How do we integrate the necessary procedures into our daily life?


A set of questions pertaining to the lifetime of a service instance and its storage arises. It starts with provisioning storage in a NetApp shared storage infrastructure and continues with the detection and monitoring of situations that endanger the level of a service, the necessary response procedures, and the promotion of a continuous and smooth delivery of services.

The questions are structured around a cycle that starts with provisioning storage and finishes with deprovisioning storage. Figure 6 shows important questions regarding storage efficiency from an operational point of view.

• Provisioning deals with the provisioning of storage. In this document, provisioning models are shown that achieve a high level of storage consolidation while preserving operational flexibility. For individual applications, NetApp provides a rich library of technical reports on how best to provision.

• Monitoring deals with defining the goal of the monitoring process and which metrics to use to decide when to stop (for example, the provisioning of storage). Relevant metrics provided by NetApp Operations Manager are described.

• Notifying deals with how to notify the people in charge when certain actions must be performed. The notification mechanisms within NetApp Operations Manager are described, which deliver information in case of certain events.

• Mitigation deals with mechanisms to prevent uncontrolled utilization from affecting operational flexibility and service fulfillment.

Figure 6) Questions regarding storage efficiency from an operational point of view.
[Figure: a cycle of four stages. Provision: how to provision best for storage efficiency (provisioning models, NetApp Data Motion awareness, from scratch or template/clone); where to provision to; which SLA; what are the defaults. Monitor: tools; what to monitor; what is critical; when to stop provisioning; when to stop extending; when to relax tightness; how to detect. Notification: who is in charge to react; how to notify. Mitigate: available options; implications on SLAs; when to act.]

Before discussing the details of this cycle, it is important to understand the NetApp technologies that achieve storage efficiency and flexibility, and their relevance in the provisioning and operational phases.


2.5 NETAPP TECHNOLOGIES FOR STORAGE EFFICIENCY AND FLEXIBILITY

The NetApp technologies listed in Table 1 are characterized by where they bring the most significant advantage and value. For example, FlexClone® technology provides significant time and space advantages during provisioning, but the space advantage might diminish over time. In contrast, deduplication can achieve space savings over the entire storage lifetime.

Table 1) NetApp technologies for storage efficiency and flexibility.

NetApp Technology | Benefit | During Provisioning | During Operation
FlexClone | Instantly creates thin-provisioned and space-efficient writable clones | X |
FlexVol® | Implements thin provisioning and consumes only the needed space rather than the requested space | X | X
Deduplication | Increases data consolidation by detecting and optimizing repeating patterns in primary and secondary storage | X | X
NetApp Data Motion | Provides flexibility for management and optimal load/capacity rebalancing in growing cloud environments without downtime | | X
Aggregate extensibility in Data ONTAP® | Data ONTAP is the foundation for all features listed in this table and provides flexibility in handling physical resources; it allows extending physical aggregates during operation | X | X

Furthermore, NetApp RAID-DP®, SATA, and NetApp Flash Cache (formerly PAM II) are technologies that help to reduce the total cost of storage tremendously. It is assumed that these technologies are deployed according to the requirements of the use case.

NETAPP SHARED STORAGE INFRASTRUCTURE

To implement the practices outlined in this document, some prerequisites must be met. We assume a NetApp shared storage infrastructure implemented using large aggregates. This acts as a utility for delivering storage in a flexible manner to applications with different needs. It scales with demand and serves a variety of different service levels at the same time. NetApp Operations Manager monitors the NetApp shared storage infrastructure. This software acts as a central management station and consolidates information about the current status of all NetApp storage controllers. Based on this information, Operations Manager indicates the necessity to change phases and behavior in the data center.

The NetApp shared storage infrastructure provides different ways for clients to consume its resources. It can provide a traditional view, in which storage resources are located at a specific controller. Using NetApp Provisioning Manager, the infrastructure can also provide a service-oriented resource view that abstracts resource consumption and management from the physical controllers. The abstractions of a storage service catalog, resource pools, and datasets provide easy manageability in the face of massive scale. If multi-tenancy is not required, this is the abstraction of choice.

Supported by the NetApp technologies MultiStore® (vFiler™) and NetApp Data Motion, storage can be provided in a utility-like fashion independent of the physical hardware. This makes possible a high degree of operational flexibility in the data center and allows building virtualized environments for multiple tenants with competing interests.


3 PROVISIONING

Some features, such as data deduplication, can be turned on or off at any time. However, to achieve maximum consolidation and storage efficiency, more strategies must be considered during the data-provisioning phase.

In addition, provisioning should take the flexibility of storage (for example, its migratability) into account. This allows administrators to easily move data off aggregates that are approaching capacity, without downtime. This is also an important aspect when planning to deliver services 24x7.

Thus, the design goals for storage layouts are storage efficiency and operational flexibility. In the following sections, we discuss three orthogonal dimensions of provisioning storage. Two of them focus on achieving data consolidation; the third focuses on achieving operational flexibility. All dimensions can be combined independently. Note that the achievable level of consolidation depends on the application and its data.

TR-3827: If You Are Doing This, Then Your Storage Could Be Underutilized provides further understanding of storage efficiency and operational flexibility.

3.1 PROVISIONING FROM SCRATCH: FULL FAT TO ZERO FAT PROVISIONING

This section deals with the way data is provisioned and the consequences for storage efficiency. We recommend applying the so-called zero fat configurations.

We consider the storage setup for a single application instance. The presented configurations can be applied both when provisioning storage from scratch and to already provisioned storage. When the technical dimensions of storage provisioning are categorized into primary data space and its Snapshot™ copy space, there are four theoretical combinations for both network-attached storage (NAS) and storage area network (SAN) environments. In practice, only two variants are relevant for NAS and three variants are relevant for SAN storage:

• Full fat
• Low fat
• Zero fat

According to NetApp best practices, we do not mix block and file data within a single volume, which allows us to consider NAS and SAN environments separately.

NAS

For NAS, two options are recommended: full fat and zero fat.

• Full fat: The primary data and Snapshot copy space are preallocated.
• Zero fat: Primary data and its Snapshot copy space are allocated on demand. This variant achieves the best ratio of storage efficiency when provisioning applications from scratch.


Figure 7) Provisioning model for NAS storage from scratch. Technically, only two out of four combinations are possible.

Snapshot Copy Space Allocation | Primary Data (Files & Directories) Fat | Primary Data (Files & Directories) Thin
Fat | Full fat option | No option
Thin | No option | Zero fat option

Note: Full fat is characterized slightly differently in NAS and SAN because of their technical properties.

FULL FAT PROVISIONING

Full fat provisioning of NAS is the traditional (default) way to implement NFS/CIFS shares. Volumes in a full fat configuration are characterized as follows:

• Volumes are created with a space guarantee.
• The size of the volume follows the formula X + Δ, where X is the size of the primary data (the sum of all user data, files and directories, within the volume) and Δ is the amount of space needed to hold Snapshot data.
• Because the space used for Snapshot copies might grow unexpectedly, the autosize function can be used to make space available when a certain volume threshold is reached. This also happens when the space reserved for user data runs low.
• The space reservation for Snapshot copies is used to hide the capacity used for Snapshot copies from the consumers (NAS clients).
• For volumes with deduplication enabled, volume autogrow is a mandatory option.
• Using autodelete is normally not recommended in NAS environments. Keeping a certain number of Snapshot copies for file versioning/restores might be part of the SLAs defined for file services.

Note: Deleting Snapshot copies may be a reasonable approach when no other option for freeing up space is available, but this is a specific and individual decision.

Table 2) Full fat provisioning.

Option | Recommended Value | Notes
Volume Options
guarantee | volume |
fractional_reserve | 100 | Leave at default; mostly relevant for SAN environments. The default value up to Data ONTAP 7.3.3 is 100; for later releases, 0 is the default.
autosize | on | Turn autosize on. There is no artificially limited volume that needs to be monitored. Autosize makes sense to allow growth of user data beyond the guaranteed space limit.
autosize options | -m X -i Y | The business model drives the maximum value for the autosize configuration, because it offers additional disk space to the consumer under specific conditions. A reasonable resizing increment depends on various factors, such as the data growth rate in the particular volume, the volume size itself, and so on.
Volume Snapshot Options
reserve | yes | The value depends on the number of Snapshot copies and the change rate within the volume.
schedule | switched on | Automatic Snapshot technology schedules.
autodelete | off | Deleting Snapshot copies is not recommended in most NAS environments.
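The settings above can be applied from the Data ONTAP 7-Mode command line. The following is a sketch only; the volume and aggregate names (vol_nas, aggr1), sizes, and schedule values are illustrative assumptions, and option names should be verified against the Data ONTAP release in use.

```shell
# Full fat NAS volume: space guarantee, snapshot reserve, autosize on, autodelete off
vol create vol_nas aggr1 500g            # X + delta, fully guaranteed
vol options vol_nas guarantee volume     # preallocate the volume in the aggregate
vol autosize vol_nas -m 600g -i 10g on   # -m (maximum) and -i (increment) per business model
snap reserve vol_nas 20                  # hide snapshot capacity from NAS clients
snap sched vol_nas 0 2 6@8,12,16,20      # weekly/nightly/hourly snapshot schedule
snap autodelete vol_nas off              # keep snapshots for versioning/restore SLAs
```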

ZERO FAT PROVISIONING

The zero fat method is the most efficient way to provision NAS volumes:

• Volumes are created without a space guarantee.
• The size of the volume still follows the formula X + Δ, where X is the size of the primary data (the sum of all user data, files and directories, within the volume) and Δ is the amount of space needed to hold Snapshot data. Sizing the volume defines a container with a virtual size for the consumers; NAS users are familiar with fixed-size file shares.
• The space used for Snapshot copies can grow unexpectedly. You can use the autosize function to make space available when a certain volume threshold is reached. You can also use the autosize function when the space reserved for user data runs low.
• The space reserved for Snapshot copies is used to hide from the consumers (NAS clients) the capacity taken up by Snapshot copies.
• For volumes with deduplication enabled, volume autogrow is a mandatory option.
• Using autodelete is normally not recommended in NAS environments. Keeping a certain amount of space for Snapshot copies for file versioning/restores is part of the SLAs defined for file services.

Note: Deleting Snapshot copies may be a reasonable approach when no other option for freeing up space is available, but this is a specific and individual decision.

Table 3) Zero fat provisioning.

Option | Recommended Value | Notes
Volume Options
guarantee | none |
fractional_reserve | 100 | Leave at default; mostly relevant for SAN environments. The default value up to Data ONTAP 7.3.3 is 100; for later releases, 0 is the default.
autosize | on | Turn autosize on. There is no artificially limited volume that needs to be monitored. Autosize makes sense to allow growth of user data beyond the guaranteed space limit.
autosize options | -m X -i Y | The business model drives the maximum value for the autosize configuration, because it offers additional disk space to the consumer under specific conditions. A reasonable resizing increment depends on various factors, such as the data growth rate in the particular volume, the volume size itself, and so on.
try_first | - | Autodelete is not recommended in most environments.
Volume Snapshot Options
reserve | yes/no | The value depends on the number of Snapshot copies and the change rate within the volume. Displaying only the committed usable space using the SLA is the preferred way to provision NAS storage. However, there might be situations in which the Snapshot reserve area is omitted (no).
schedule | switched on | Automatic Snapshot technology schedules.
autodelete | off | Deleting Snapshot copies is not recommended in most NAS environments.
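A zero fat NAS volume differs from the full fat variant mainly in the space guarantee. The following Data ONTAP 7-Mode sketch uses the same illustrative names and sizes as before (vol_nas, aggr1); all values are assumptions to be adapted to the actual environment.

```shell
# Zero fat NAS volume: no space guarantee; the 500g size is virtual,
# and blocks are allocated in the aggregate only as data is written
vol create vol_nas aggr1 500g
vol options vol_nas guarantee none       # allocate on demand only
vol autosize vol_nas -m 600g -i 10g on
snap reserve vol_nas 20                  # or 0 if the snapshot reserve area is omitted
snap sched vol_nas 0 2 6@8,12,16,20
snap autodelete vol_nas off
```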

SAN

For SAN, we consider three options:

• Full fat: Both primary data and its Snapshot copy space are preallocated.
• Low fat: The primary data is preallocated; the Snapshot copy space is allocated on demand.
• Zero fat: Primary data and its Snapshot copy space are allocated on demand. This variant achieves the best ratio of storage efficiency when provisioning applications from scratch.

Figure 8) Provisioning model for SAN storage from scratch.

Snapshot Copy Space Allocation | Primary Data (LUN) Fat | Primary Data (LUN) Thin
Fat | Full fat option | No option
Thin | Low fat option | Zero fat option

FULL FAT PROVISIONING

This method can be treated as the historical way of provisioning block storage with Data ONTAP:

• Volumes are created with a space guarantee.
• A fractional (overwrite) reserve is used to guarantee that the primary data can be overwritten completely with Snapshot copies in place. If this space is not available, Snapshot copy creation will fail.
• The size of the volume follows the formula 2X + Δ, where X is the size of the primary data (the sum of all LUN capacities within the volume) and Δ is the amount of space needed to hold Snapshot copy data.


Enhancements to the volume autosize capabilities (such as volume-size-dependent thresholds) and the robustness of the Snapshot autodelete implementation have made full fat provisioning more or less obsolete. As of today, the default settings for creating volumes/LUNs in Data ONTAP still correspond to these settings. See "Full/Low/Zero Fat Provisioning with Provisioning Manager" for a discussion of using tools such as Provisioning Manager.

Table 4) Full fat provisioning.

Option | Recommended Value | Notes
Volume Options
guarantee | volume |
fractional_reserve | 100 | Even though technically possible, a fractional reserve below 100 incorporates a potential risk of running out of Snapshot copy overwrite space. This situation should be avoided.
autosize | off | Autosize could be used as an option to create the free space needed for Snapshot copy creation.
Volume Snapshot Options
reserve | 0 |
schedule | switched off |
autodelete | off |
LUN Options
reservation | enable |

LOW FAT PROVISIONING

With low fat provisioning, we use a more space-efficient way to provision volumes:

• Volumes are created with a space guarantee.
• LUNs are created with a space guarantee as well. This setup does not benefit from unused blocks within a LUN. (During the lifetime of a LUN, the amount of free, unused blocks typically decreases. Without space reclamation techniques, allocated blocks on the storage system stay allocated.)
• The size of the volume follows the formula X + Δ, where X is the size of the primary data (the sum of all LUN capacities within the volume) and Δ is the amount of space needed to hold Snapshot copy data.
• Because the space used for Snapshot copies might grow unexpectedly, the autosize and autodelete policies are used to make space available when a preset volume threshold is reached.

Table 5) Low fat provisioning.

Option | Recommended Value | Notes
Volume Options
guarantee | volume |
fractional_reserve | 0 | Snapshot copy space is controlled by the autodelete and autosize options.
autosize | on | Turn autosize on.
autosize options | -m X -i Y | The business model drives the maximum value for the autosize configuration, because it offers additional disk space to the consumer under specific conditions. A reasonable resizing increment depends on various factors, such as the data growth rate in the particular volume, the volume size itself, and so on.
try_first | volume_grow | Increasing the size of the volume does not destroy any data or information, so there is no reason not to increase it; the change can be reverted afterward if the volume's free space increases again. There might be configurations in which automatic volume growth is not desired.
Volume Snapshot Options
reserve | 0 | For NAS volumes, setting a Snapshot copy reserve area and configuring Snapshot copy schedules is a common setup. For SAN volumes, this needs to be switched off according to NetApp best practices (see the Fibre Channel and iSCSI Configuration Guide).
schedule | switched off |
autodelete | on | There might be Snapshot copies that are needed to fulfill certain SLAs, such as backup SLAs. Setting this policy needs to be negotiated with the business requirements. In the worst-case scenario, deleting Snapshot copies is not an option.
autodelete options | volume, oldest_first | There is a precedence for which Snapshot copies are candidates for deletion; oldest_first is the current default.
LUN Options
reservation | enable | Reserves space for the LUN during creation.
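A low fat SAN setup per the table above might be configured as follows in Data ONTAP 7-Mode. This is a sketch under assumptions: the names (vol_san, aggr1, lun0), sizes, and LUN type are illustrative, not prescribed by this document.

```shell
# Low fat SAN volume: guaranteed volume and LUN, snapshot space on demand
vol create vol_san aggr1 500g
vol options vol_san guarantee volume
vol options vol_san fractional_reserve 0      # snapshot space handled by autosize/autodelete
vol options vol_san try_first volume_grow     # grow the volume before deleting snapshots
vol autosize vol_san -m 600g -i 10g on
snap reserve vol_san 0                        # no snapshot reserve for SAN volumes
snap sched vol_san 0 0 0                      # no automatic snapshot schedule
snap autodelete vol_san on
snap autodelete vol_san trigger volume
snap autodelete vol_san delete_order oldest_first
lun create -s 400g -t linux /vol/vol_san/lun0 # space-reserved LUN (reservation enabled)
```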

ZERO FAT PROVISIONING

Full and low fat provisioning use fully allocated volumes and LUNs. By default, space allocation happens within the boundaries of the LUN and the volume. Zero fat follows a 100% "allocate on demand" concept:

• Volumes are created without a space guarantee.
• LUNs are created without a space guarantee.
• The size of the volume follows the formula X − N + Δ, where X is the size of the primary data (the sum of all LUN capacities within the volume), Δ is the amount of space needed to hold Snapshot copy data, and N is the amount of unused blocks within a given LUN.

Table 6) Zero fat provisioning.

Option | Recommended Value | Notes
Volume Options
guarantee | none | No space reservation for the volume at all.
fractional_reserve | 0 | With Data ONTAP 7.3.3, fractional_reserve can be modified even for volumes without a space guarantee of type volume. Prior to Data ONTAP 7.3.3, the value was fixed at 100.
autosize | on | Turn autosize on.
autosize options | -m X -i Y | The business model drives the maximum value for the autosize configuration, because it offers additional disk space to the consumer under specific conditions. A reasonable resizing increment depends on various factors, such as the data growth rate in the particular volume, the volume size itself, and so on.
try_first | volume_grow |
Volume Snapshot Options
reserve | 0 | For NAS volumes, setting a Snapshot copy reserve area and configuring Snapshot copy schedules is a common setup. For SAN volumes, this needs to be switched off according to NetApp best practices (see the Fibre Channel and iSCSI Configuration Guide).
schedule | switched off |
autodelete | off | Deleting Snapshot copies might be an option when the volume can no longer be resized because the maximum configured size has been reached, or when the aggregate's free space becomes low.
LUN Options
reservation | disable | No preallocation of blocks for the LUN.
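The corresponding zero fat SAN setup removes the guarantees at both the volume and LUN level. Again a sketch with illustrative names and sizes (vol_san, aggr1, lun0), to be adapted and verified against the Data ONTAP 7-Mode release in use:

```shell
# Zero fat SAN volume: no guarantees at the volume or LUN level
vol create vol_san aggr1 500g
vol options vol_san guarantee none
vol options vol_san fractional_reserve 0
vol options vol_san try_first volume_grow
vol autosize vol_san -m 600g -i 10g on
snap reserve vol_san 0
snap sched vol_san 0 0 0
snap autodelete vol_san off
lun create -s 400g -t linux /vol/vol_san/lun0
lun set reservation /vol/vol_san/lun0 disable  # thin LUN: no preallocated blocks
```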

SUMMARY OF PROVISIONING METHODS

There are good reasons for using any of the provisioning methods described; however, full fat provisioning for SAN environments should be avoided wherever possible because of its storage efficiency ratio. Even with a 100% block usage ratio on primary data, zero fat provisioning has many advantages and is the preferred method:

• The aggregate's free space is a global pool that can serve space to volumes. This gives more flexibility than volumes with their own dedicated free space.

• For SAN volumes, block consumption can be easily monitored.
• Deduplication savings go directly into the global pool of free space, which is the aggregate or the resource pool to which it belongs.
• Monitoring is needed only at the aggregate level. Volumes grow on demand.

Table 7) Comparison of provisioning methods.

Characteristics | Full Fat | Low Fat | Zero Fat
Space consumption | 2X + Δ | X + Δ | X − N + Δ (2)
Space efficient | No | Partially, for Snapshot copies | Yes
Monitoring | Optional | Required on volume and aggregate level | Required on aggregate level
Notification/mitigation process required | No | Optional in most cases | Yes
Pool benefiting from dedupe savings | Volume fractional reserve area | Volume free space area | Aggregate free space area
Risk of an out-of-space condition on primary data | No | No, as long as autodelete is able to delete any Snapshot copies | Yes, when monitoring and notification processes are missing
Typical use cases | Small installations; none or few storage management skills (no monitoring infrastructure) | Large database environments | Shared storage infrastructure; test/dev environments; storage pools for virtualized servers

(2) N is the traditional thin provisioning impact: the amount of blocks logically allocated but not used.
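The space-consumption formulas in Table 7 can be compared numerically. The following Python sketch is illustrative; the example figures (500 GB of LUN capacity, 50 GB of Snapshot data, 200 GB of unused LUN blocks) are assumptions, not values from this document.

```python
def full_fat(x, delta):
    """Full fat SAN: primary data plus a 100% overwrite reserve plus snapshot space."""
    return 2 * x + delta

def low_fat(x, delta):
    """Low fat: preallocated primary data plus snapshot space."""
    return x + delta

def zero_fat(x, delta, n):
    """Zero fat: only blocks actually written; the N unused LUN blocks stay unallocated."""
    return x - n + delta

# Example: X = 500 GB of LUN capacity, delta = 50 GB of snapshot data,
# N = 200 GB of LUN capacity not yet written
x, delta, n = 500, 50, 200
print(full_fat(x, delta))     # 1050 GB
print(low_fat(x, delta))      # 550 GB
print(zero_fat(x, delta, n))  # 350 GB
```

The example makes the ordering in Table 7 concrete: zero fat consumes the least physical space, and the gap to full fat grows with the amount of unwritten LUN capacity.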

FULL/LOW/ZERO FAT PROVISIONING WITH PROVISIONING MANAGER

NetApp Provisioning Manager focuses on fast and efficient provisioning of storage resources in the NetApp storage infrastructure. It speeds up provisioning workflows and boosts capacity usage by using policy-based automation for the entire NetApp NAS and SAN infrastructure. These processes are:

• Faster than manually provisioning storage
• Easier to maintain than scripts
• Instrumental in minimizing the risk of data loss resulting from misconfigured storage

Provisioning Manager applies user-defined policies to consistently select the appropriate resources for each provisioning activity. This frees administrators from the headache of searching for available space to provision and allows more time for strategic issues. The use of a centralized management console allows administrators to monitor the status of their provisioned storage resources. Provisioning Manager can help improve business agility and capacity use, shrink provisioning time, and improve administrator productivity. Provisioning Manager's thin provisioning and deduplication capabilities provide a high level of storage efficiency from your NetApp storage investment.

A GUI allows you to implement the full/low/zero fat provisioning models within Provisioning Manager. See Figure 9 and Figure 10 for configuring storage efficiency in a provisioning policy for NAS and SAN. Whenever storage is provisioned using such a provisioning policy, the settings apply automatically. For more information, refer to TR-3710: Operations Manager, Provisioning Manager, and Protection Manager Best Practices Guide.

Provisioning Manager encapsulates the technical details of provisioning storage and supports easy integration with existing management tools and orchestration frameworks. Policies and their use in so-called datasets and storage services allow you to exploit NetApp storage efficiency technologies without exposing a high level of technical detail to a higher level of management software.

Note: Provisioning Manager up to version 4.0 does not allow you to specify autosize and autodelete individually; the policy template determines whether these features are used and which options are selected. In order to implement the provisioning methods outlined here, a customized provisioning script needs to be provided to set the autosize and autodelete parameters according to the recommendations for the full/low/zero fat methods. Post-provisioning scripts are standard with Provisioning Manager. Use caution when Provisioning Manager runs conformance checks; these revert individual settings.


Figure 9) Configuring a full/zero fat provisioning policy using Provisioning Manager for NAS. Select checkboxes as outlined. Provisioning Manager deviates from zero/full fat by first growing volumes with autosize and then allowing Snapshot autodelete.


Figure 10) Configuring a full/low/zero fat provisioning policy using Provisioning Manager for SAN storage. Select checkboxes as outlined. Provisioning Manager deviates by not turning on autosize for zero fat.

FULL/LOW/ZERO FAT PROVISIONING FOR STORAGE SERVICES

Storage services are an easy abstraction for provisioning storage in a utility-like fashion. A storage service describes all characteristic attributes of the storage needed in a certain scenario. A storage service catalog lists the available templates and allows you to provision storage with these attributes on demand. Technically, storage services or datasets consist of one or more resource pools, a protection policy, and a provisioning policy. Full/low/zero fat provisioning for storage services is configured in the configuration wizard of the provisioning policy. Figure 11 shows the provisioning policies closest to the full/low/zero fat configurations.

Figure 11) Full/low/zero fat provisioning policies for datasets and storage services.

Because this wizard is able to configure the deduplication feature, two policies are configured for the zero fat configuration: one with deduplication and one without.


HOW SHOULD A VOLUME BE SIZED?

Because the physical allocation of data within a zero-fat-provisioned volume is done on demand, theoretically the volume size can be set to a very high value that can easily hold all application data and Snapshot copies. Because the unallocated space in the volume is not exclusively reserved for the volume itself, all other applications can benefit from the shared pool of unallocated storage. However, NetApp recommends that you size the volume to the expected size of its containing objects and use the autogrow option to let it grow on demand. The important advantage is that the commitment rate then acts as a metric for data consolidation.

Note: The commitment rate reflects the amount of logical data consolidation. This metric is suitable for deciding when data should be left for organic growth.

Additionally, the volume size limits when using deduplication should be taken into account, because the maximum sizes depend on the storage controllers.
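The commitment rate mentioned above can be derived from the configured volume sizes and the aggregate capacity. A minimal Python sketch, assuming that commitment rate is the ratio of configured (logical) volume capacity to physical aggregate capacity; the function name and figures are illustrative:

```python
def commitment_rate(volume_sizes_gb, aggregate_capacity_gb):
    """Configured (logical) volume capacity relative to physical aggregate capacity.
    A value above 1.0 means the aggregate is overcommitted via thin provisioning."""
    return sum(volume_sizes_gb) / aggregate_capacity_gb

# Three 500 GB zero fat volumes on a 1000 GB aggregate: 150% committed
print(commitment_rate([500, 500, 500], 1000))  # 1.5
```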

APPLICATION RECOMMENDATIONS

Thin provisioning is most effective when applications use data that is committed to them step by step. When applications preformat data, the immediate effect of thin provisioning is lost, and only deduplication may reclaim sharable blocks. Because thin provisioning has no performance penalty, the general recommendation is to provision with the zero fat configuration.

For SAN-attached storage, NetApp recommends that you use file systems that support space reclamation technologies such as the SCSI UNMAP and SCSI WRITE SAME commands. These pass the information through the storage stack that a particular block is no longer used, allowing unused space to be reclaimed. On Windows® platforms, this can be configured in NetApp SnapDrive®.

For Oracle® database best practices, refer to WP-7084: Storage Efficiency in an Oracle Environment.

3.2 PROVISIONING FROM TEMPLATES: VOLUME AND DEDUPE-CENTRIC LAYOUTS

This section deals with provisioning storage for similar applications from a golden template. A valid use case is a hosting provider who offers and serves predefined application services in mass quantities. Instead of provisioning each application from scratch, the data of the application instance is provisioned by creating a copy of a preconfigured template, or golden copy, that is customized using a postprocessing procedure.

When applications are provisioned this way, NetApp cloning technologies generate virtual copies of the template data instantly and with efficient use of space. This achieves a high degree of data consolidation and cost savings.

The potential of NetApp cloning technologies also plays a central role in development and test environments, as well as in software maintenance scenarios. Testing and updates can be performed very easily because these cloning capabilities work instantly and with almost no overhead in terms of performance, CPU, and memory.

There are two ways to align application data with a NetApp shared storage infrastructure:

• Volume-centric storage layout
• Dedupe-centric storage layout

Depending on the data lifetime, suitability for deduplication, consistency, and tool constraints, one way of aligning application data is more appropriate than the other.

In both variants, the storage of the application template can be provisioned as either full fat, low fat, or zero fat. The cloning procedure inherits the attributes of the parent volume. To create space-efficient clones, the space guarantee must be set to none.

23 Storage Efficiency Every Day How to Achieve and Manage Best-in-Class Storage Use

CONSEQUENCES FOR MONITORING

When using one of the following layouts, very high data consolidation can be achieved. Because this effect depends on the usage characteristics of the corresponding applications, monitoring the aggregate is key. If a low fat volume acts as a template that is cloned while preserving the original space guarantees, monitoring is necessary for the cloned volumes as well.

VOLUME-CENTRIC STORAGE LAYOUT

In a volume-centric storage layout, an application instance is organized into one or a few volumes to benefit from Data ONTAP volume-centric management and maintenance operations, such as instant cloning and volume-consistent Snapshot copies.

In addition to the convenient ways to manage volumes, volume-centric storage layouts have storage efficiency advantages in two dimensions:

• High instant storage efficiency savings. Cloning the data of an application instance with FlexClone yields high instant savings; these savings might deteriorate over time.

• Long-term storage efficiency savings. Deduplicating application data yields medium long-term savings.

A volume-centric layout makes it easy to provision storage for another instance of an application by cloning a consistent volume representing the template of the intended application and attaching it to an instance where it is processed. This approach works for both NAS and SAN.

Figure 12 shows the data alignment of an application instance and its volume. An application instance organizes its data in one or more dedicated volumes. Note that the entire construct is created within one aggregate.

Because deduplication is performed at the volume level, long-term savings depend on the block-sharing rate within one instance of an application. Volume-centric layouts are preferred in the following cases:

• Simplicity of data management using volumes
• Individual control over the SLA of each application instance
• Application instances with a short duration
• No consideration of deduplication
• Management tools that require volume-centric layouts


Figure 12) Volume-centric storage provisioning. Application instances are aligned horizontally with their volumes.

[Figure 12 diagram: a template FlexVol volume containing LUNs/qtrees is cloned through FlexClone block sharing into instance FlexVol volumes 1 through n; deduplication block sharing operates within each FlexVol volume.]

Impact on commitment and storage utilization. The impact of using FlexClone to clone a volume-centric storage layout to implement template-based provisioning can be visualized schematically. At clone creation, Data ONTAP creates metadata for the new instance of the data; it allocates space for storing changes to the cloned copy or new data on request. Thus, the overcommitment of the aggregate containing the cloned data increases when the clone is created. However, this does not affect the space used in the aggregate. When data in the clone is changed and new data is added by the application, aggregate use grows.
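As a rough illustration of this behavior, the following sketch (plain arithmetic, not a Data ONTAP API) shows how clone creation raises the commitment rate of an aggregate while leaving used space unchanged; all sizes are made-up examples:

```python
# Sketch: how cloning affects aggregate overcommitment.
# Committed space is the sum of the sizes presented to applications;
# used space is the physical blocks actually consumed.

def commitment_rate(committed_gb, aggregate_gb):
    """Committed capacity relative to physical capacity (>1.0 = overcommitted)."""
    return committed_gb / aggregate_gb

aggregate_gb = 1000
volumes_gb = [200, 200, 200]     # three thin-provisioned volumes
used_gb = 150                    # physical blocks currently in use

before = commitment_rate(sum(volumes_gb), aggregate_gb)   # 0.6

# Creating a FlexClone of a 200 GB volume adds commitment ...
volumes_gb.append(200)
after = commitment_rate(sum(volumes_gb), aggregate_gb)    # 0.8

# ... but only metadata is written, so used space stays effectively unchanged.
print(before, after, used_gb)
```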

Best Practice

A volume-centric layout implicitly implements a consistency group. It is preferable to align within it all application data that should be recovered at a certain point in time. Cloning can achieve significant savings when a FlexClone volume is created to provision data for a new service instance.

Client-side data realignment, such as disk defragmentation or database tablespace reorganization, has a counterproductive effect on FlexClone savings. It also temporarily reduces deduplication savings until the deduplication process is executed again. If possible, the following actions on client data should be avoided:

• Reorganizing data, for example, database reorganization of tablespaces or defragmentation of virtual disks provisioned through cloning

• Preformatting data


DEDUPE-CENTRIC STORAGE LAYOUT

In a dedupe-centric storage layout, the goal is to achieve high storage efficiency returns from the deduplication feature. In contrast to the volume-centric storage layout, data of different application instances is grouped to achieve storage efficiency returns across a set of application instances. Figure 13 shows a sample dedupe-centric storage layout. Data of application instances is organized horizontally; the individual data of each application is grouped vertically in a volume to implement deduplication.

This layout makes sense in virtualization scenarios where the images of the guest machines can be grouped easily. Grouping partitions containing boot images and commonly used programs is very effective because they share much of the same data. To implement template-based provisioning with such a layout, cloning of template data must be performed with the file/LUN FlexClone operation. File/LUN FlexClone allows storage objects to be cloned within a volume, providing finer granularity.

This storage layout provides the following storage efficiency advantages from a short- and long-term perspective:

• Very high long-term storage efficiency savings. Long-term savings are achieved because the storage layout is organized around deduplication and its returns.

• Short-term storage efficiency savings. Instant savings are provided when an application instance, for example template application data, is cloned through a file/LUN FlexClone operation.

In contrast to the volume-centric storage layout, application instances are bundled together in a matrix style because they share a volume. This implies that the applications share major operational tasks and are managed as a bundle. From an SLA perspective, a diversification of service levels within the application instances cannot be implemented as easily as with a volume-centric layout. Achieving application-consistent Snapshot copies requires the iterative application of file/LUN FlexClone functionality to all storage objects of the instance. This is slightly more difficult than cloning with a volume FlexClone operation.

TR-3505, NetApp Deduplication for FAS and V-Series Deployment and Implementation Guide, provides a deeper understanding of NetApp deduplication and its deployment.
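Assuming blocks can be identified by their content, a toy Python calculation (with invented block labels, not Data ONTAP internals) illustrates why grouping similar instances in one deduplicated volume pays off:

```python
# Toy illustration: deduplication keeps only one physical copy of each
# distinct block, so similar instances stored together save space.
# Block contents are simplified to short strings.

def dedupe_savings(instances):
    """Return (logical_blocks, physical_blocks, savings_ratio) for
    instances stored together in one deduplicated volume."""
    logical = sum(len(blocks) for blocks in instances)
    physical = len({b for blocks in instances for b in blocks})
    return logical, physical, 1 - physical / logical

# Three guest images sharing an OS image, differing only in app data:
os_blocks = [f"os{i}" for i in range(8)]
guests = [os_blocks + ["app-a"], os_blocks + ["app-b"], os_blocks + ["app-c"]]
print(dedupe_savings(guests))   # 27 logical blocks, 11 physical blocks
```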

Figure 13 illustrates dedupe-centric storage provisioning. Volumes are shared among several application instances to achieve cross-instance deduplication returns. Note that this construct is created within an aggregate; volumes can be assigned to different aggregates.


Figure 13) Dedupe-centric storage provisioning. Application instances are aligned horizontally; volumes are aligned vertically.

[Figure 13 diagram: the template and instances 1 through n are aligned as rows; each FlexVol volume is a column containing one LUN/qtree per instance, with deduplication block sharing within each FlexVol volume.]

Impact on commitment and aggregate usage. When the FlexVol volumes for this layout are created, their individual sizes contribute to the commitment rate. Aggregate use grows with the provisioning and use of objects within the FlexVol volumes. Provisioning a new instance in this layout through a file/LUN FlexClone operation has no effect on the overcommitment rate; it affects the deduplication savings of the volume itself. Thus, NetApp recommends using the zero fat configuration for the volume and having autogrow enabled.

Best Practice

This layout is very attractive for applications using multiple but similar storage objects among service instances (for example, virtual disks in virtual machine hypervisors). These instances usually use similar operating systems and applications in dedicated virtual disks. Thus, grouping these storage objects leads to a very high degree of consolidation due to deduplication.

Quickly changing data, such as pages and swap files, should not be considered for inclusion in deduplicated volumes on primary storage. Deduplication savings are limited due to the high change rate of this data and do not justify running the deduplication process. NetApp recommends that this type of data not be placed in the same volume as data that dedupes well.

We further recommend not performing client data realignments such as Windows disk defragmentation or database tablespace reorganizations. Because of the way NetApp storage controllers work, client data is served without performance penalties even when fragmented, so defragmentation is unnecessary.


3.3 SETTLED/NOMAD PROVISIONING FOR NETAPP DATA MOTION

Settled/nomad provisioning is a provisioning pattern that helps increase the utilization of NetApp shared storage. When the online migration features of the storage are exploited, response times to mitigate data growth scenarios are independent of application-specific planned downtime windows. Storage is classified into potential migration candidates that can be migrated away from a tight aggregate on one storage controller to another while remaining accessible. Thus, it is an elegant technique to relax the use of an aggregate outside the planned downtime windows of the served applications. NetApp MultiStore technology implements this feature using the vFiler abstraction, which NetApp recommends you consider in the provisioning process. Secure multi-tenancy environments implemented using MultiStore technology harmonize well with this provisioning approach.

The settled/nomad provisioning pattern is a fitting metaphor for reacting to data growth in an aggregate. Figure 14 illustrates the concept of settled/nomad provisioning in the aggregates of the storage controllers and the migration of a nomad out of its aggregate. The settled part describes data that does not move during its lifetime. It might use vFiler units to simplify operation and hardware maintenance of the storage controller, but there is no direct need. The nomad parts are considered moving parts and thus must use vFiler units. The ratio between the sizes of the settled and nomad parts depends on the growth rate and lifetime of the data in the settled part. Assuming that the aggregate size is constant over this period, the aggregate is filled with settled and nomad data. Over the data lifetime, more and more nomads are migrated away; at the end of the lifetime, only the settled data is left.

It is irrelevant whether the data growth happens in the settled or the nomad part: when a nomad is migrated away, the resource situation on the aggregate is relaxed. It is preferable to provision several nomads of different sizes. This allows you to:

• React to different growth scenarios of the data
• Quickly migrate smaller nomads when time or the inter-storage-controller network is considered a limited resource
• Operate the aggregate in its operational sweet spot corridor over a long time frame; by slicing the migratable entities in the right way, you can be sure that the aggregate operates in a predefined use interval
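The selection logic behind the last point can be sketched in a few lines of Python. This is a hypothetical illustration (the greedy strategy, sizes, and the 80% target corridor are assumptions, not a NetApp algorithm):

```python
# Hypothetical sketch: choosing which nomads to migrate so that aggregate
# block use returns to its operational corridor.

def nomads_to_migrate(used_gb, capacity_gb, nomads_gb, target_use=0.80):
    """Greedily pick the smallest nomads whose migration brings
    aggregate block use down to at most target_use."""
    selected = []
    for size in sorted(nomads_gb):      # smallest first: cheapest to move
        if used_gb / capacity_gb <= target_use:
            break
        selected.append(size)
        used_gb -= size                 # nomad data leaves the aggregate
    return selected

# 900 GB used of 1000 GB; nomads of 50, 100, and 200 GB are provisioned.
print(nomads_to_migrate(900, 1000, [200, 50, 100]))   # → [50, 100]
```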

Figure 14) Settled/nomad provisioning into an aggregate. In case of aggregate tightness, a nomad is migrated to a separate aggregate.

[Figure 14 diagram: an aggregate containing a settled part and two nomads; one nomad is migrated out of the aggregate.]

To summarize, the settled/nomad provisioning pattern is an elegant method to adjust the block use of an aggregate. The use of an aggregate can be controlled and kept in a desired corridor.


SLA-BASED ASSESSMENT FOR SETTLED/NOMAD

The goals of an SLA-based assessment are to optimize SLA fulfillment and to avoid or minimize penalty costs. The accessibility of the applications, described by their individual service levels, is used to assess instances as settled or nomad.

We use the previously introduced SLA metric of service disruption and map it to the stickiness of the settled/nomad instances. The vFiler entities allow online migration of NFS- and iSCSI-attached nomad instances without any changes on the client side. Fibre Channel-attached storage cannot be migrated online at the time of writing. Refer to TR-3881 for an understanding of DataMotion in Oracle database and Microsoft Exchange environments.

Alignment by technical impact. For data belonging to applications with SLAs that fit perfectly into what is provided, a direct assignment can be made. For example, application instances with the lowest acceptable service disruption should be the last candidates to be migrated (settled); applications with the highest acceptable service disruption should be considered nomads. However, there might be data of application instances that will likely be migrated during the application lifetime. You must take into account the business impact of migrating these instances.

Figure 15) Alignment by technical impact (sorted by negative impact in descending order)

[Figure 15 diagram: instances Inst1 through InstN sorted by negative impact, from high (outside SLA, for example all FC-attached) through medium to low (inside SLA); the highest-impact instances are settled, the rest nomads.]

Alignment by business impact. An assessment of penalty costs is made for the data of the remaining applications. For vFiler migration, a very short negative impact on the performance of the service must be taken into account during the migration. Thus, application data with the highest negative impact is considered the stickiest.

Figure 16) Alignment by business impact (sorted by negative impact in descending order)

[Figure 16 diagram: instances sorted by penalty cost from $$ down to $; the highest-impact instances are settled, medium ones semi-settled, the rest nomads.]

PERFORMANCE AND THROUGHPUT IMPACT OF MIGRATION

Migration of a nomad might be triggered by heavy storage consumption in an aggregate. It might also be triggered by performance limitations of the corresponding storage controller. Because the migration itself consumes additional resources on the network and the participating storage controllers,


this consumption must be taken into account to avoid further intensifying the situation. Refer to TR-3881 for a quantitative evaluation of DataMotion.

NetApp recommends adjusting the use of storage controllers in a high-availability configuration in such a way that the remaining controller can handle the load in the case of a failover. Doing so also leaves enough resources to perform migrations.

Migrating vFiler entities relies mainly on SnapMirror® and MultiStore technology; thus, it requires these licenses on all participating storage controllers. TR-3814, NetApp Data Motion, provides a thorough presentation of migration using the NetApp Data Motion solution. It focuses on implementing and triggering the migration of vFiler entities using NetApp Provisioning Manager. Furthermore, vFiler units can be managed manually to allow for convenient offline migration with a very short interruption of storage accessibility.

ENABLE SETTLED/NOMAD FOR ALREADY-PROVISIONED STORAGE

While NetApp recommends that you consider the settled/nomad setting initially, taking the sizing and lifetime of storage into account, it is possible to implement it later in a planned downtime window. If NFS-attached storage is to be migrated, existing volumes can be adopted by a vFiler entity. Because the vFiler entity has its own IP address, the clients attaching the storage need to remount it.

SETTLED/NOMAD-LIKE SETTING WITH SHORT/LONG-TERM DATA PAIRING

In the previous section, the settled/nomad pattern was described as a way to mitigate organic data growth. The same effect can be achieved when storage is identified as belonging to instances that are going to be deprovisioned at their end of life. Taking into account the expected lifetime of provisioned storage allows you to plan deprovisioning situations in advance. This relaxes the load on aggregates outside planned downtime windows, without the technical requirements of a settled/nomad setting.

ONLINE MIGRATION IN VIRTUALIZED ENVIRONMENTS

Online migration features in a virtualization hypervisor provide a further alternative for implementing a responsive scheme to react to data growth scenarios. For example, VMware® Storage VMotion™ is capable of transferring a virtual machine, including its storage, when it is attached using a datastore. Storage of virtual machines served by a NetApp datastore can be migrated to another NetApp-served datastore by migrating each virtual machine. In such cases, a nomad can simply be implemented by a NAS- or SAN-attached datastore.

In contrast to a data transfer based on SnapMirror directly between NetApp storage controllers, the migration traffic flows through the hypervisor. This might have consequences for the execution of the virtual machines. Also, the NetApp storage efficiency savings cannot be exploited during the transfer; deduplication savings are regained by executing the deduplication process on the destination storage controller.


4 OPERATION

This section focuses on the operation and management of overcommitted storage. The goal of management is to fulfill the SLAs of the stored data while achieving a high level of efficiency. It addresses questions of how to detect situations that need manual assistance, how to raise the awareness of the operational staff, and how to resolve situations that arise.

We first consider situations that put SLA fulfillment at risk. Then we focus on actions that can be taken to avoid further aggravation by making the situation evident and presenting mitigation alternatives.

Consider the operational process as a loop that monitors and evaluates the current situation and triggers the transition of a storage resource among the following phases:

• Provision storage.
• Leave room for organic growth. It might be desirable to still allow extending the storage of previously provisioned applications.
• Reduce storage use with mitigation alternatives such as deletion, data motion, and so on.

These transitions must occur within a specified time frame to optimize operational flexibility and to prevent endangering the SLAs. The point is to detect situations that will violate the SLAs in the future.

SITUATIONS PUTTING SLA FULFILLMENT AT RISK

Over time, more and more data is stored and processed by the provided applications. NetApp storage efficiency technologies compensate for this growth. To prevent running out of physical resources, usage must be managed within safe boundaries. This makes sure the operations team has enough time to react with the appropriate mitigation strategy.

The following list summarizes situations that are critical for service delivery:

• Running out of time. Some mitigation alternatives must be triggered in advance, and a passage of time might be needed for their effect to become evident. This time determines which mitigation alternatives can still be considered at a certain point.

• Running out of mitigation alternatives. Several mitigation alternatives exist to control usage. However, some alternatives are one-time activities and some must be performed within a certain time frame. Depending on the situation, not all alternatives might be available for use.

• Running too tight on storage. Over time, applications use more and more of the blocks committed to them. This forces Data ONTAP to allocate from a pool of free blocks. Assuming data growth, the size of the free block pool directly translates into available time to react.

• Running out of storage completely. This must be prevented because it has a high impact on the availability of the service. Furthermore, data integrity can be at risk. Consider the following scenarios:
− An application wants to write to committed storage but fails (NAS and SAN). To the application, this looks like a storage failure and implies a service disruption. Data integrity can be at risk.
− An application wants to allocate new storage but fails (NAS). The application is confronted with a "No space left on device" exception. Verify the application behavior on this exception; most applications can deal with this situation, and data integrity is not at risk.

Two cases need to be differentiated when mitigation is necessary to resolve a situation of tight storage. Storage for an object such as a LUN or a share can be tight because of:
− Insufficient space within the volume in which the storage object is contained
− Insufficient free space within the aggregate in which the storage object and its volume are contained


The following sections focus on how to detect that a change is necessary and that a storage resource should be transitioned to another phase. We describe the different phases, how monitoring can support their detection, and how this information can be made known to operational groups.

4.1 PHASES AND TRANSITIONS

This section outlines the phases of a storage resource. Starting with an empty aggregate, storage is provisioned up to certain thresholds. After that, storage is left for organic growth. After certain thresholds are exceeded, further inspection or activities must be performed to mitigate storage tightness.

• Provisioning storage. When certain thresholds are within a defined range, storage is provisioned to the aggregates. Monitoring should support making a decision to transition to the next phase.

• Leave storage for organic growth. When certain thresholds are exceeded, provisioned storage is left for organic growth. Depending on the environment, storage of existing applications might still be extended, and a second threshold might signal that extensions are no longer possible. Monitoring should support making a decision to transition to the next or prior phase.

• Mitigate storage use. When certain thresholds are exceeded, this phase must make sure that committed storage can be delivered to store application data. The effect of a mitigation activity should be to put the storage resource back into the preferred operational corridor. Monitoring should support making a decision to transition back to the organic growth phase.

4.2 MONITORING

NetApp Operations Manager delivers comprehensive monitoring and management for NetApp shared storage. It provides alerts, reports, performance monitoring, and configuration tools to keep the NetApp storage infrastructure in line with business requirements for maximum availability and efficiency. NetApp Operations Manager provides a single human interface and an application programming interface (API) for integration with third-party management and orchestration software.

Operations Manager monitors the NetApp shared storage infrastructure and can raise awareness of certain situations. Events can be set to trigger an action when operational parameters are within a certain range and indicate a relevant situation. When the event triggers, an alarm notification can be sent by e-mail, pager, Simple Network Management Protocol (SNMP), or customized scripts. To raise awareness of a certain situation, the event must be characterized using the metrics provided by Operations Manager. To communicate the event, an alarm must be set.

THRESHOLDS

Operations Manager monitors relevant parameters that indicate the presence of specified situations. Thresholds can be set to trigger actions, for example, to notify the operational team that an alarm situation exists. The thresholds can be set to notify in advance. Operations Manager also performs trending on operational parameters to express the urgency of a certain situation. This supports decision making on how to react.

Within your Operations Manager instance, the thresholds can be verified and set by navigating to the Default Thresholds page, following Setup → Options → Default Thresholds, or using the link http://opsmgrserver:port/dfm/edit/options. Figure 17 shows a sample configuration page.


Figure 17) Operations Manager screen to configure thresholds on operational metrics

For aggregates, Operations Manager provides the set of thresholds described in the following list. They represent absolute limits. Operations Manager alarms can be used to notify operational staff and raise awareness of a specific situation.

Monitoring the aggregates is very important. They are the physical containers of preallocated and growable storage objects that host application data. If an aggregate of a storage controller runs at uncontrolled usage, there can be direct consequences for the applications for which it provides data.

The concrete settings for these thresholds depend on the time necessary to relax aggregate block usage. When no mitigation alternatives should or can be taken over the lifetime of the data, mitigation actions must be performed in scheduled downtime windows. In that case, threshold settings and actions tend to be more conservative to avoid SLA-endangering situations.

• Aggregate full threshold. This threshold on the aggregate block use metric allows triggering an alarm that notifies a person in charge.

• Aggregate nearly full threshold. This threshold is the counterpart of the aggregate full threshold but provides an earlier notification.

• Aggregate overcommitted threshold. This threshold on the committed storage metric allows triggering an alarm that notifies a person in charge. The metric refers to the amount of storage that is assigned to applications; it represents the level of consolidation as well as the width and growth of the block use corridor.

• Aggregate nearly overcommitted threshold. This threshold is the counterpart of the aggregate overcommitted threshold but provides an earlier notification.
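The interplay of these four thresholds can be sketched as a small classifier. This is an illustration only; the threshold values and event names are assumptions chosen to mirror the document's terminology, not Operations Manager defaults:

```python
# Illustrative sketch (not Operations Manager code): classifying an
# aggregate against full and overcommitment thresholds. Values and event
# names are assumptions; mirror your own Operations Manager settings.

def aggregate_events(used_gb, committed_gb, capacity_gb,
                     nearly_full=0.85, full=0.95,
                     nearly_overcommitted=0.95, overcommitted=1.0):
    events = []
    use = used_gb / capacity_gb
    commit = committed_gb / capacity_gb
    if use >= full:
        events.append("aggregate-full")
    elif use >= nearly_full:
        events.append("aggregate-almost-full")
    if commit >= overcommitted:
        events.append("aggregate-overcommitted")
    elif commit >= nearly_overcommitted:
        events.append("aggregate-almost-overcommitted")
    return events

# 880 GB used and 1500 GB committed on a 1000 GB aggregate:
print(aggregate_events(used_gb=880, committed_gb=1500, capacity_gb=1000))
# → ['aggregate-almost-full', 'aggregate-overcommitted']
```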

Operations Manager also provides thresholds that can be used to alert operational staff when volumes are in a certain state:

• Volume full threshold. This event notifies a person in charge that the preset threshold on the volume use metric has been reached.

• Volume almost full threshold. This event is the counterpart of the volume full threshold but provides an earlier notification.

• Volume autosized. This event notifies a person in charge that a volume was extended using the autogrow functionality.


TRENDING

Operations Manager 4.0 supports a variety of trending features for certain storage objects. This is an important feature for all storage objects with a fixed size. It allows you to estimate the time frame within which a certain situation needs to be mitigated. The trend is calculated as a linear regression over up to 90 days in the past. For aggregates, Operations Manager calculates a trend on the daily growth rate. In your Operations Manager instance, use the link http://opsmgrserver:port/dfm/report/view/aggregates-growth-rates for trending of aggregate growth rates and the estimated remaining time until the storage object is full. Each aggregate can be drilled down into, and you can select trending based on an interval of one day, one week, one month, three months, or one year. To see the effect of recent data activities, set the interval of a trend calculation to enclose this activity. Investigate whether growth rates calculated over different intervals deviate significantly.

Figure 18) Trending of data growth and days-to-full prediction in Operations Manager

Note: The calculation basis for time-to-full is the usable aggregate capacity. This value is not calculated based on the aggregate full threshold setting.
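The days-to-full estimate described above can be sketched as a least-squares fit over daily capacity samples. This mimics, but is not, the Operations Manager implementation; sample values are invented:

```python
# Sketch: linear-regression trend over daily "used capacity" samples,
# extrapolated to the usable aggregate capacity (days-to-full).

def days_to_full(used_gb_per_day, capacity_gb):
    """Fit used = slope*day + intercept by least squares and
    extrapolate to capacity_gb; return days remaining, or None."""
    n = len(used_gb_per_day)
    days = range(n)
    mean_x = sum(days) / n
    mean_y = sum(used_gb_per_day) / n
    slope = (sum((x - mean_x) * (y - mean_y)
                 for x, y in zip(days, used_gb_per_day))
             / sum((x - mean_x) ** 2 for x in days))
    intercept = mean_y - slope * mean_x
    if slope <= 0:
        return None                     # no growth: never full at this trend
    full_day = (capacity_gb - intercept) / slope
    return full_day - (n - 1)           # days remaining after the last sample

# 10 GB/day growth toward a 1000 GB aggregate:
samples = [700, 710, 720, 730, 740]
print(days_to_full(samples, 1000))      # → 26.0
```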

Trending at the volume level is analogous to trending at the aggregate level. In your Operations Manager instance, access the link http://opsmgrserver:port/dfm/report/view/volumes-growth-rates for trending of volume growth rates. NetApp recommends that you order the view by growth rate descending or time-to-full ascending in order to focus on the relevant candidates.

At the volume level, you can set an alarm to fire when volume growth is outside the usual boundary:

Abnormal volume growth. This event notifies a person in charge when the growth rate of a volume exceeds a preset limit. It is helpful for signaling unusual behavior concerning storage consumption and for pointing the operational staff to the right storage object.


INDIVIDUAL THRESHOLDS PER AGGREGATE OR VOLUME

For each aggregate or volume, the general default settings can be overridden and made more specific. To do so, select your aggregate or volume of choice, for example, using the links already provided in this technical report. When a concrete aggregate is selected, it can be configured using the Edit Settings link and dialog. When a concrete volume is selected, its configuration can be adapted using the Edit Quota Settings link and dialog.

MONITORING STORAGE EFFICIENCY RETURNS

NetApp Operations Manager provides a dashboard to visualize storage efficiency returns in the NetApp shared storage infrastructure. This report lists important parameters, broken down by utilization capacity, unused reserve capacity, storage efficiency, and efficiency return breakdown. It allows you to judge the effectiveness of the NetApp storage efficiency technologies. Figure 19 provides a sample screenshot of the storage efficiency dashboard in NetApp Operations Manager. Consult the NetApp Operations Manager Efficiency Dashboard Installation and User Guide for further information on the data provided by this dashboard.

Figure 19) Storage efficiency dashboard in Operations Manager


4.3 NOTIFICATION

Operational staff must be notified when situations occur that require a transition between phases, especially situations with negative consequences for SLA fulfillment. Operations Manager provides alarms for notification. Alarms are bound to the metrics and thresholds explained in section 4.2 and notify operational staff, storage administrators, or storage capacity planners. Alarms are the instrument used to keep the management effort of the NetApp storage infrastructure low.

After being notified, the responsible person can evaluate the situation and decide which actions to take. Furthermore, the trends on operational parameters provided by Operations Manager simplify the decision-making process.

Depending on the organizational structure, the responsibilities to operate, plan, and administer the storage infrastructure can be separated into different groups, persons, or roles. Thus, we characterize the mitigation activities by required skill set and time to act. This allows easy alignment to a given organizational structure.

Operations Manager supports different methods of sending a notification. The notification methods can be used in combination; for example, a notification can be sent by both e-mail and SNMP.

NOTIFY BY E-MAIL

An alarm can be sent to multiple destinations by e-mail. Repeated notifications can be sent while the situation is not resolved. To set an alarm, access the alarm configuration page by following Setup → Alarms from the default Operations Manager dashboard. Clicking Advanced Version accesses an advanced version of this page; the direct link for the advanced version is http://opsmgrserver:port/dfm/edit/alarms-advanced. Figure 20 shows how to configure an alarm. Adjust the threshold as described in section 4.2.

NetApp recommends using distribution lists or aliases with meaningful names rather than the addresses of individual persons. If you follow this recommendation, changes in responsibilities and roles do not require corresponding changes to Operations Manager.

NOTIFY BY SNMP

Operations Manager supports the notification of alarms using SNMP, a widely used standard that is supported by most orchestration frameworks and ticketing systems. Using SNMP, Operations Manager can be integrated into existing ticketing systems. Figure 20 shows setting up an alarm that fires based on the "aggregate almost full" threshold. The SNMP trap host is configured using a hostname or IP address and the port on which the SNMP agent is listening. The alarm can be saved and tested.

36 Storage Efficiency Every Day How to Achieve and Manage Best-in-Class Storage Use

Figure 20) Configuring an alarm based on the threshold aggregate almost full

Note: The SNMP event must be routed to the responsible groups or persons in the ticketing system. Thus, the mapping between the detected situation and the responsible operational group must be implemented there.

NOTIFY BY SCRIPT

Operations Manager supports notifications in highly customized integration scenarios. A user-defined adapter can be executed, which delivers the information to the infrastructure or system of choice. A script can be used to implement such an adapter and act as the glue between Operations Manager and the customer infrastructure. To set an alarm on the event "aggregate almost full" that starts a script, instrument Operations Manager on the command line:

dfm alarm create -s script_to_execute -h aggregate-almost-full
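Such an adapter script can be kept very small. The following Python sketch is illustrative only: the environment variable names (the DFM_* keys) and the ticket queue name are assumptions of this example, not documented Operations Manager behavior; consult the Operations Manager documentation for the exact information passed to alarm scripts.

```python
import json
import os


def build_ticket(env):
    """Map alarm details (assumed here to arrive via environment variables;
    the variable names are illustrative) to a ticketing-system payload."""
    return {
        "summary": "Storage alarm: %s" % env.get("DFM_EVENT_NAME", "unknown-event"),
        "source": env.get("DFM_SOURCE_NAME", "unknown-aggregate"),
        "severity": env.get("DFM_SEVERITY", "warning"),
        # Route to the group responsible for aggregate mitigation activities.
        "queue": "storage-operations",
    }


if __name__ == "__main__":
    ticket = build_ticket(os.environ)
    # A real adapter would POST this payload to the ticketing system's API;
    # for illustration we only print it.
    print(json.dumps(ticket, indent=2))
```

The script itself then implements the mapping between the detected situation and the responsible operational group that the SNMP note above asks for.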


4.4 MITIGATE STORAGE USE

Uncontrolled storage use can limit operational flexibility and might put SLA fulfillment at risk. This section focuses on mitigation activities that preserve flexibility by controlling use within its defined corridor. The effect of a mitigation activity should be to return usage to its defined corridor.

Storage tightness might occur in aggregates or volumes, depending on their configuration. When all volumes in an aggregate are thin provisioned with the zero fat configuration, they use the shared pool of free blocks of the aggregate to deal with data growth. To solve this situation, a mitigation activity on the aggregate level is necessary.

When storage objects in a fixed-size volume cannot grow to the committed space, a mitigation activity on the volume level is necessary to resolve upcoming volume tightness.

MITIGATION ACTIVITIES FOR AGGREGATES

Aggregates are the coarsest storage object within a NetApp storage controller. Finer-grained storage objects, such as FlexVol volumes and their content, are usually thin provisioned using the zero fat configuration. They might grow on demand; however, because they live within an aggregate of physically limited size, the growth of the storage object itself is also limited. As described in the following list, providing usable space in the aggregate automatically allows the contained storage objects to grow.

1. Increase the aggregate. You can add drives to aggregates during operation, and you can repeat this mitigation activity. The maximum aggregate size depends on the Data ONTAP version, the type of aggregate, and the type of storage controller. 64-bit aggregates, supported with Data ONTAP 8, have very high limits. Additional drives can be used immediately; however, their procurement needs to be taken into account. Rebalancing data between existing and new drives results in a uniformly distributed use of the drives.

2. Decrease the aggregate Snapshot copy reserve. This reserve is needed in MetroCluster and SyncMirror® configurations. In other configurations, you can decrease this reserve or set it to zero.

3. Shrink preallocated volumes. Volumes with preallocated space reserve available aggregate free space. When possible, these volumes can be shrunk, returning the freed space to the aggregate so that others can make use of the previously preallocated space.

4. Enable deduplication and shrink the volume.

5. If available, migrate a nomad online to a different storage controller. Doing this on the NetApp storage controller level requires storage provisioning based on vFiler units and a MultiStore and SnapMirror license. Adequate free space on the aggregates of the target storage controller is required. This mitigation activity is not limited in its applicability.

6. A volume can be migrated from one aggregate to another within the same or another storage controller. SnapMirror replicates the data while it is still served. To switch over to the replicated data, the client needs to detach from the source and reattach to the replica. After completion, the replica is considered the new source. This operation has an impact on client downtime. Typically, inter-data center bandwidth allows you to synchronize the source and the replica within the range of a few minutes.

7. If none of the listed activities can be used, the application must be stopped to achieve a consistent state before its data is migrated.

The mitigation activities for aggregate tightness are summarized in Table 8. Note that Provisioning Manager performs mitigation alternatives 3 to 6 for secondary storage online.


Table 8) Mitigation alternatives to control use within aggregates.

| No. | Mitigation Activity | Repeatability | SLA Impact | Preparation Time | Time to Show Effect |
|-----|---------------------|---------------|------------|------------------|---------------------|
| 1 | Increase aggregate capacity by adding disks (Data ONTAP 7.x: low limits; Data ONTAP 8: high limits) | Repeatable | None | HW procurement | Immediate (+ rebalancing) |
| 2 | Decrease the aggregate's Snapshot copy reserve area, if possible | One time | None | None | Immediate |
| 3 | Shrink other volumes in the aggregate if they have enough free space | One time | Low | None | Immediate |
| 4 | Run deduplication and shrink volumes | Repeatable | Low | Time to execute dedupe | Immediate |
| 5 | Migrate nomads (online) | Repeatable | Low | None | Minutes (vFiler migration time) |
| 6 | Migrate volumes to a different aggregate (offline) | Repeatable | Med–high | Next planned downtime window | Minutes (volume switch-over time) |
| 7 | Prevent application data loss and stop the application, then migrate (offline) | Repeatable | Low–high | Coordinate with app owner | Minutes (migration time) |

MITIGATION ACTIVITIES FOR VOLUME TIGHTNESS

Mitigation activities for volume tightness are relevant for volumes that are at risk because their storage objects cannot grow to the committed size. When it is not possible to enable growth for storage objects contained in volumes, you need to perform a mitigation activity at the aggregate level.

Note: Some of these mitigation alternatives depend on, and affect, used capacity in the aggregate.

Table 9) Mitigation activities for resource tightness within volumes.

| No. | Mitigation Activity | Repeatability | SLA Impact | Preparation Time | Time to Show Effect |
|-----|---------------------|---------------|------------|------------------|---------------------|
| 1 | Reduce the volume's Snapshot copy reserve (if configured and not used) | One time | Low | None | Immediate |
| 2 | Increase the volume if there is free space in the aggregate (see Table 8) | One time | Low | None | Immediate |
| 3 | Delete Snapshot copies not needed, or those skipped by the autodelete function | Limited | Low | None | Immediate |
| 4 | Activate FAS deduplication for the volume (requires proper space guarantees) | One time | Low / possible performance impact | Wait for schedule | Hours |
| 5 | If the volume contains more than a single LUN, migrate those objects to another volume or aggregate | Repeatable | High | Next planned downtime window | Minutes (volume migration time) |
| 6 | Stop application and migrate data | Repeatable | High | Coordinate with app owner | Minutes (migration time) |


5 REAL-LIFE SETTINGS

This section summarizes two different operational settings. The first does not make use of online data migration or the settled/nomad provisioning pattern; the second implements a settled/nomad provisioning pattern to maintain the flexibility for online data migrations.

The concrete threshold settings and approaches might be very customer and application specific. To exploit NetApp storage efficiency features in your own data center, NetApp recommends that you start conservatively. After you are familiar with the process, work toward the customer-specific optimum.

5.1 SAMPLE SETTING 1: REAL-LIFE SETTING

This section describes a real-life setting that a customer started with. It makes use of a limited set of mitigation alternatives. This is especially beneficial when the installed storage capacity should remain constant over a long time frame, or when the physical systems are already fully equipped. A settled/nomad setting is not considered. Thus, the thresholds that signal a transition of the phases are set lower and more conservatively for this customer. Because online data migration and aggregate extension are not available as mitigation alternatives, sufficient available space is required to safely reach the next planned downtime window, as shown in Figure 21. In practice, refer to the aggregate "days to full" trend value to get an idea of the available days to full based on past data growth.

• All storage is provisioned using the zero fat option with growable FlexVol volumes. Only aggregate monitoring is used.

• Aggregate extension is not a mitigation alternative.

• Online migration is not a mitigation alternative.

Figure 21) Storage to enable organic data growth between planned downtime windows

[Figure: data growth within an aggregate over a period of months, between two planned downtime windows.]

Note: Several months might fall between planned downtime windows in which to perform major mitigation alternatives.

The primary concern is preventing the critical situation in which aggregates reach a utilization level that is too high to enable organic growth during the period between agreed planned downtime windows. To prevent this situation, sufficient space must be reserved to enable data growth. Second, the level of data consolidation is monitored to manage accumulated growth rates safely.

Provisioning new data is stopped when one or both thresholds on the first and second metrics are reached. The operational teams are notified using an alarm on the Operations Manager event "aggregate nearly full" (event configured when the metric exceeds 50%) and the event "aggregate nearly overcommitted" (event configured when the metric exceeds 110%). These alarms stop the responsible entities from provisioning new storage; the aggregate is left for organic growth.

An assessment of the storage situation might then be performed. Depending on experience and knowledge of the application growth rates seen in the past, the thresholds may be adapted. After the upper threshold of the operational sweet spot corridor is crossed, an alarm based on the "aggregate full" threshold (set initially to 65%) is sent to the storage administrators to make the decision to migrate data in the next planned downtime window. In the meantime, organic growth can take place in the yellow-marked area shown in Figure 22. The metrics used are:

• First metric: aggregate capacity used

• Second metric: aggregate space committed

Because all storage is provisioned using the zero fat option, no artificially limited storage container exists. Thus there is no need to consider a volume-based metric. Figure 22 shows the behavior depending on the metrics aggregate capacity used and aggregate committed space.

Figure 22) Transition of phases depending on the metrics aggregate capacity used and aggregate committed space.

[Figure: decision matrix over the two metrics. While aggregate capacity used is at 0–50% and aggregate space committed at 0–110%, provisioning new storage is allowed. Above these values, provisioning stops and a capacity assessment is performed, with thresholds adapted if appropriate. Above 65% capacity used or 120% committed space, mitigation is triggered.]
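The transition logic of this sample setting can be sketched as a small decision function. This is a minimal illustration using the thresholds named above (50% and 65% capacity used, 110% and 120% committed space); the function name and return values are our own.

```python
def phase(capacity_used_pct, committed_pct):
    """Return the operational phase for sample setting 1 based on the two
    aggregate metrics: capacity used and space committed (both in percent)."""
    if capacity_used_pct > 65 or committed_pct > 120:
        # Upper threshold crossed: plan a data migration for the next
        # planned downtime window.
        return "mitigate"
    if capacity_used_pct > 50 or committed_pct > 110:
        # Sweet spot corridor left: stop provisioning, assess capacity,
        # and possibly adapt thresholds.
        return "assess"
    # Inside the operational sweet spot corridor: provisioning is allowed.
    return "provision"
```

Note that either metric alone is enough to leave the provisioning phase, matching the rule that provisioning stops when one or both thresholds are reached.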


5.2 SAMPLE SETTING 2: SETTLED/NOMAD

This section describes a setting that takes the settled/nomad provisioning pattern into account and allows migrating nomad data flexibly and in a timely manner thanks to vFiler technology. This setting requires storage space at alternative locations to which nomads might be migrated. It is seen more often in larger environments with an emphasis on NFS-attached storage. It allows operating the NetApp storage infrastructure at very high use and within narrower operational sweet spot corridors. Figure 23 visualizes the effect of a mitigation alternative that can be performed online.

Figure 23) Narrower corridors due to the ability to perform mitigation alternatives in hours instead of months

[Figure: settled and nomad data within an aggregate; the need to act is detected, and the effect of a mitigation (e.g., a nomad migration) shows within hours.]

In this sample setting, as in sample setting 1, the critical situation to prevent is aggregates becoming too full. However, thanks to the flexibility gained with online data migration, it is not necessary to take a further metric into account, for example, storage overcommitment.

• All storage is provisioned using the zero fat option with growable FlexVol volumes. Only aggregate monitoring is used.

• Storage is provisioned using the settled/nomad pattern, with the ability to perform online migration.

• The "days to full" aggregate trend was more than 200 days on average. Note that this value depends on the individual situation and is calculated against 100%.

The sole metric in this setting is aggregate capacity used. Table 10 contains the thresholds describing the transition of phases.

Table 10) Phase transitions with the settled/nomad provisioning pattern and the online migration mitigation alternative.

| Detection Threshold | Notify | Mitigation |
|---------------------|--------|------------|
| > 70% | Storage operations | Stop provisioning of new storage |
| > 85% | Storage operations | Stop extending provisioned storage |
| > 90% | Storage operations | Relax the resource situation and migrate a nomad |


Figure 24) Visualization of phase transitions depending on metric aggregate capacity used

[Figure: phases over the metric aggregate capacity used. At 0–70%, provisioning new storage is allowed; at 70–85%, provisioning stops but already provisioned storage may still be extended; above 90%, utilization is relaxed by migrating a nomad with NetApp Data Motion.]

You can achieve a very high level of data consolidation in this setting by using NetApp storage controllers. The amount of logical data served exceeds the physically usable capacity by factors.


6 STORAGE EFFICIENCY COOKBOOK

To increase consolidation, we propose the following steps to exploit the advantages of NetApp storage efficiency technologies.

Figure 25) Typical picture of aggregate capacity metrics while turning to zero fat configurations and dedupe

[Figure: capacity over elapsed time, showing committed capacity, capacity used, the overall trend, and the last 3-month trend; markers 1 to 3 correspond to the steps below, at roughly 1 month and 3 months of elapsed time.]

As a general rule, we do not introduce artificially limited container types; they increase the monitoring effort and might prevent pooling of unused space. For an existing landscape, proceed as follows:

1. Install and configure Operations Manager, the earlier the better. From day one, Operations Manager collects data; the more information it collects, the better the predictions and trending. The diagrams provided by Operations Manager give a good idea of growth rates and their steadiness. Make sure all NetApp storage controllers are monitored. Wait for one month. Define which mitigation alternatives your operational team is comfortable with. Check the boxes accompanying the provided list and identify the time your team needs to perform the actions. If you can perform online migrations for nomads, define the time to negotiate and approve the migration. For all other data, define the time to the next planned downtime window.

2. Change all volumes to the zero fat configuration with the autogrow feature set to on. Because there is no artificial space limitation for an autogrow volume, monitoring is restricted to aggregate monitoring. When using deduplication, set the volume to autogrow. During this period, the capacity used diminishes, as shown in Figure 25. Usually each change in the volume configuration can be detected. So far only metadata has changed, and unused space in the volumes is now available in a common shared pool. The aggregated free space is available to the same applications storing the data. We recommend monitoring for three months to understand the growth rate of your environment.

3. Derive the growth trend of the aggregates. Note that the overall trend might still be negative. Use Operations Manager to help determine the trend. Make sure that it excludes the time frame when the volume configuration was changed to zero fat, and that it includes relevant operations of your applications, such as month- and year-end closing of business applications or regular software maintenance updates (for example, in virtualized environments).


Work backward to determine the thresholds of the phases:

a. Define the aggregate use at a level where your operational team is comfortable. At first, do not exceed 80%. Add an attention area (yellow) depending on the mitigation alternatives and their time to show effect.

b. Determine the maximum distance between the planned downtimes, or the time to perform the intended mitigation alternatives.

c. Determine the growth rate. Operations Manager provides help in determining the trend of data growth.

d. Determine the minimum space required to comfortably allow organic growth in the period between agreed planned downtimes of the services provided. Operations Manager helps you to understand the growth rate of the past.
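Steps a through d amount to a small back-of-the-envelope calculation. The following Python sketch illustrates it under a linear-growth assumption; the function names and the sample figures in the comments are our own, not values from Operations Manager.

```python
def daily_growth_rate(samples):
    """Estimate the daily growth rate (GB/day) from (day, used_gb) samples
    with a least-squares fit, mirroring the trending Operations Manager does."""
    n = float(len(samples))
    sx = sum(d for d, _ in samples)
    sy = sum(u for _, u in samples)
    sxx = sum(d * d for d, _ in samples)
    sxy = sum(d * u for d, u in samples)
    return (n * sxy - sx * sy) / (n * sxx - sx * sx)


def max_used_threshold(capacity_gb, growth_gb_per_day, days_between_downtimes):
    """Work backward: the highest 'capacity used' level (in percent) at which
    organic growth still fits until the next planned downtime window."""
    reserve_gb = growth_gb_per_day * days_between_downtimes
    return 100.0 * (capacity_gb - reserve_gb) / capacity_gb

# Example: a 10 TB (10240 GB) aggregate growing 8 GB/day, with downtime
# windows 90 days apart, must keep 720 GB free, so the alarm threshold
# should not exceed roughly 93% capacity used.
```

The computed threshold then bounds the operational sweet spot corridor defined in step a.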

To provision storage, follow these steps:

1. Create big aggregates to enable shared storage in your data center. We recommend sizing them in such a way that each aggregate can be extended once for an eventual aggregate mitigation. Very few situations exist where a silo-centric approach with dedicated aggregates for applications makes sense. Free space and performance in an aggregate can be shared, and a few big aggregates reduce the monitoring effort. Also, build aggregates in a limited number of standardized configurations and sizes.

2. Create volumes in the zero fat configuration with the autogrow feature set to on. Because there is no artificial space limitation for an autogrow volume, monitoring is restricted to aggregate monitoring. When using deduplication, set the volume to autogrow. Whenever possible, use Provisioning Manager for convenience and for repeating configurations.

a. Classify your data and provision for flexibility. Give NFS a preference and make use of vFiler entities.

b. Turn on deduplication. Even in situations where deduplication rates are expected to be low, there is sometimes a big surprise. If you prefer to try deduplication on the storage controller first, then create a clone of the intended volume and deduplicate it to estimate the effect. Use Performance Advisor to identify a repeating time frame of low activity in which to schedule the deduplication job. Also consider scheduling deduplication by change rate. Mind the maximum volume sizes, which depend on the storage controller.

c. Initially size volumes to the expected size of the data you are going to store. Thus the aggregate overcommitment metric in Operations Manager represents the data consolidation more precisely.

d. Trim existing volumes provisioned in full or low fat to the zero fat configuration. Use the following commands on the console of the storage controller to configure zero fat without Snapshot autodelete for NAS environments:

vol options <volume> guarantee none
vol options <volume> try_first volume_grow
vol autosize <volume> -m <maximum size> -i <increment size> on
snap autodelete <volume> off

Use the following command sequence to configure zero fat with Snapshot autodelete for NAS environments:

vol options <volume> guarantee none
vol options <volume> try_first volume_grow
vol autosize <volume> -m <maximum size> -i <increment size> on
snap autodelete <volume> trigger volume
snap autodelete <volume> delete_order oldest_first
snap autodelete <volume> on


Use the following commands to configure zero fat without Snapshot autodelete for SAN environments:

vol options <volume> guarantee none
vol options <volume> try_first volume_grow
vol autosize <volume> -m <maximum size> -i <increment size> on
snap reserve -V <volume> 0
snap autodelete <volume> off
lun set reservation <lun> disable

Use the following command sequence to configure zero fat for SAN environments with autodelete set to on:

vol options <volume> guarantee none
vol options <volume> try_first volume_grow
vol autosize <volume> -m <maximum size> -i <increment size> on
snap reserve -V <volume> 0
snap autodelete <volume> trigger volume
snap autodelete <volume> delete_order oldest_first
snap autodelete <volume> on
lun set reservation <lun> disable

e. Identify storage holding inactive data. Storage keeping inactive data is most often perfectly suited to act as a nomad candidate that could be migrated.

f. Identify storage that is close to deprovisioning. Deprovisioning of storage relaxes use and can act as a mitigation alternative.

g. Turn already provisioned volumes into the zero fat configuration.

3. Let Operations Manager monitor the landscape. Use the aggregate daily growth rates and the "days to full" trending reported by Operations Manager to adapt the thresholds. Remember that "days to full" trending reports against 100% capacity used of the aggregate.
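Because "days to full" is calculated against 100% capacity used, it can be rescaled to your own alarm threshold under the same linear-growth assumption. A small sketch (the function name is ours):

```python
def days_to_threshold(days_to_full, used_pct, threshold_pct):
    """Rescale Operations Manager's 'days to full' figure (which targets
    100% capacity used) to the days remaining until a custom threshold is
    reached, assuming linear growth."""
    if used_pct >= threshold_pct:
        return 0.0  # threshold already reached
    return days_to_full * (threshold_pct - used_pct) / (100.0 - used_pct)

# Example: 200 days to full at 60% used leaves 100 days until an 80% threshold.
```

This conversion makes the reported trend directly comparable to the phase thresholds chosen in the previous steps.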


7 REFERENCES

• TR-3505: "NetApp Deduplication for FAS and V-Series Deployment and Implementation Guide," www.netapp.com/us/library/technical-reports/tr-3505.html

• TR-3563: "NetApp Thin Provisioning: Improving Storage Utilization and Reducing TCO," www.netapp.com/us/library/technical-reports/tr-3563.html

• TR-3710: "Operations Manager, Provisioning Manager, and Protection Manager Best Practices Guide," www.netapp.com/us/library/technical-reports/tr-3710.html

• TR-3786: "A Thorough Introduction to 64-Bit Aggregates," www.netapp.com/us/library/technical-reports/tr-3786.html

• TR-3814: "NetApp Data Motion," www.netapp.com/us/library/technical-reports/tr-3814.html

• TR-3827: "If You're Doing This, Then Your Storage Could Be Underutilized," www.netapp.com/us/library/technical-reports/tr-3827.html

• TR-3881: "DataMotion for Volumes for Enterprise Applications," www.netapp.com/us/library/technical-reports/tr-3881.html

• "NetApp Operations Manager Efficiency Dashboard Installation and User Guide," now.netapp.com/NOW/download/tools/omsed_plugin/InstallUserGuide.pdf


8 ACKNOWLEDGMENTS

This report was developed in concert with the Field Centers for Innovation and covers field best practices and product group expertise. It would not have been possible without the input of many experts. Significant contributions were made by Matthew Agoni, Carlos Alvarez, Jeff Berks, Manfred Buchmann, Hans Deuerlein, Erik Dybwad, Niels Reker, Oliver Dziuba, Larry Freeman, Gary Garcia, Pretoom Goswami, Naveen Harsani, George John, Nigel Maddock, Andreas Martinovsky, Holger Niermann, Cesar Orosco, Christian Ott, Shiva Raja, Michael Reusch, Maurice Skubski, John Tyrrell, Oliver Walsdorf, and Allen Wang.

NetApp provides no representations or warranties regarding the accuracy, reliability, or serviceability of any information or recommendations provided in this publication, or with respect to any results that may be obtained by the use of the information or observance of any recommendations provided herein. The information in this document is distributed AS IS, and the use of this information or the implementation of any recommendations or techniques herein is a customer's responsibility and depends on the customer's ability to evaluate and integrate them into the customer's operational environment. This document and the information contained herein may be used solely in connection with the NetApp products discussed in this document.

© Copyright 2010 NetApp, Inc. All rights reserved. No portions of this document may be reproduced without prior written consent of NetApp, Inc. Specifications are subject to change without notice. NetApp, the NetApp logo, Go further, faster, Data ONTAP, FlexClone, FlexVol, MultiStore, RAID-DP, SnapDrive, SnapMirror, Snapshot, SyncMirror, and vFiler are trademarks or registered trademarks of NetApp, Inc. in the United States and/or other countries. Windows is a registered trademark of Microsoft Corporation. Oracle is a registered trademark of Oracle Corporation. VMware is a registered trademark and VMotion is a trademark of VMware, Inc. All other brands or products are trademarks or registered trademarks of their respective holders and should be treated as such. RA-0007-1010



2 INTRODUCTION

Exponential data growth generates a serious challenge for IT managers. Gartner predicts that between 2008 and 2013, enterprises will purchase and install 20 times more terabytes of storage than they did in 2008 (www.gartner.com/technology/media-products/newsletters/netapp/issue24/gartner3.html). Until recently, continuous improvements in cost for performance and storage space made it easy and affordable to solve storage concerns by adding more disks to existing storage systems. However, IT executives are discovering that there are limits to that solution: floor space, weight loads, rack space, network drops, power connections, cooling infrastructure, and even power itself are finite resources. Hitting any one of these limits significantly jeopardizes the ability of IT to meet business demands.

NetApp's solution to rapid resource consumption is to remove storage controllers and disks from the resource equation by using storage more efficiently. Key benefits of this strategy are:

• Less management involvement

• Reduced complexity, support, and service costs

• Improved performance and network efficiency

NetApp storage efficiency technologies are key to achieving data consolidation and managing future data growth. They allow storing and managing several times more data on NetApp storage controllers than would fit on their physically attached disks, and they allow deferring IT investments into the future.

In this document, we describe techniques and guidelines you can use to find the operational sweet spot for NetApp storage efficiency technologies in your environment. By adhering to the best practices outlined, you can increase storage consolidation and agility as well as decrease operational risk.

The document is organized as follows:

• Chapter 3 describes storage provisioning.

• Chapter 4 describes the monitoring process and supporting tools for daily operation.

• Chapter 5 describes concrete operational setups used in daily life.

• Chapter 6 concludes with a step-by-step cookbook to provision and manage storage efficiently and to adapt individual thresholds.

2.1 TERMINOLOGY

We use the following terminology to describe resource use, both at the level of exposing storage to applications and at the level of physical resource allocation within the aggregates in the storage controllers. Also refer to the NetApp Operations Manager Efficiency Dashboard Installation and User Guide for further explanation of this terminology.

• Logical storage refers to storage that is visible at the application layer. Logical storage does not necessarily require the allocation of usable capacity.

• Usable capacity refers to storage that is usable for the applications, as provided by NetApp storage controllers.

• Used capacity is a value that represents the amount of physical capacity that holds application or user data. In Operations Manager1 terminology, this is represented by "capacity used."

• Storage utilization refers to the ratio of used capacity to usable capacity, without accounting for efficiency returns.

1 NetApp Operations Manager is a central console that delivers comprehensive monitoring and management for NetApp storage, with alerts, reports, and performance and configuration tools.


• Commitment rate is an Operations Manager term that refers to the percentage of aggregate space committed to volumes.

• Deduplication rate is an Operations Manager term that measures the efficiency of the deduplication functionality. This rate is measured at the volume and the aggregate level, as a percentage.
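As a small illustration of the definitions above (the function names and the sample figures are invented for this example), the two central ratios can be computed as:

```python
def storage_utilization_pct(used_gb, usable_gb):
    """Storage utilization: the share of usable capacity that holds data."""
    return 100.0 * used_gb / usable_gb


def commitment_rate_pct(committed_gb, aggregate_gb):
    """Commitment rate: percentage of aggregate space committed to volumes.
    With thin provisioning this is routinely above 100%."""
    return 100.0 * committed_gb / aggregate_gb

# Example: a 10 TB (10240 GB) aggregate holding 4 TB of data with 14 TB
# committed to volumes runs at 40% utilization and a 140% commitment rate.
```

The example also shows why the two metrics must be monitored separately: utilization describes physical fill level, while the commitment rate describes the promised logical storage.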

For the aggregate, we define different operational windows, each characterized by an interval of storage utilization. We define the operational sweet spot corridor (green) as the interval in which the aggregate should be operated for optimal utilization and service availability. We define a tolerance interval (yellow) in which actions are taken to get back into the operational sweet spot corridor. And we define a no-go area (red) in which we do not intend to operate the aggregate; this area might act as a last buffer of time, or can be considered an area where operational staff has less experience.

Figure 1 explains the terminology in the context of storage objects on a NetApp storage controller. The aggregate is a physically limited storage object. Aggregates are treated as fairly static containers and thus need proper size management.

Figure 1) Terminology in the context of the storage objects of volumes and aggregates.

[Figure: an aggregate's usable capacity containing volumes with LUNs/NAS; the committed logical storage exceeds the usable capacity of the aggregate, the used capacity grows with data growth, and the operational sweet spot corridor marks the target utilization band.]

In practice, commitment rates far above 100% are common in customer environments. This document describes how to manage this.

2.2 GOAL OF THIS DOCUMENT

The goal of this document is to achieve best-in-class storage efficiency and costs by consolidating the highest possible amount of application data while meeting the required service-level agreements. The idea is to enable thin provisioning while controlling the use of physical resources in the NetApp shared storage infrastructure within a desired corridor. NetApp storage efficiency technologies can save a significant amount of the IT budget; on the other hand, running at an uncontrolled use level can reduce flexibility and cause headaches in managing data growth and service-level fulfillment.

The difference in managing thin-provisioned storage compared to traditional storage is that, due to the dense consolidation of application data, accumulated application data growth rates might vary in a broader corridor than they would traditionally.


To control the level of physical resources, we outline methods for increasing storage utilization by provisioning storage with NetApp storage efficiency technologies. We also outline how to react to organic data growth and how to keep storage use within a corridor of high efficiency by using NetApp technologies for data center flexibility.

To summarize, this document introduces three phases for managing the storage on NetApp storage controllers: provisioning, organic growth, and mitigation of storage tightness. These phases play a vital role for aggregates, which are the coarsest storage abstraction of a NetApp storage controller.

• Provisioning phase. In this phase, storage is provisioned by the NetApp shared storage infrastructure, which increases the utilization of aggregates. The goal is to operate the aggregates with a high level of data consolidation in an efficient utilization corridor. Figure 2 visualizes this corridor.

• Organic growth phase. In this phase, no further storage is provisioned, to slow down the growth of aggregate utilization. The goal is to operate in a corridor of high utilization but to safely reach the next planned downtime or administration window of the served applications. Thus, the organic growth phase is sized depending on the growth rate of the used capacity and the length of the time frame to the planned downtime windows. Figure 3 visualizes slowed growth.

• Mitigation of storage tightness phase. This phase prevents an uncontrolled level of utilization and provides activities to lower that level. Several alternatives are presented to mitigate storage tightness and to shift aggregate utilization back into the desired operational corridor. Figure 4 visualizes this mitigation.

Figure 2) Storage consolidation and growing utilization using thin provisioning.

[Figure labels: data; data growth; operational sweet spot corridor; aggregate capacity.]

Figure 3) Controlled and slowed data growth within the operational sweet spot corridor in the aggregate.

[Figure labels: data; data growth; aggregate capacity.]


Figure 4) Mitigate to prevent uncontrolled utilization.

[Figure labels: aggregate capacity; data.]

This document addresses best practices and tools for managing the NetApp storage infrastructure and for supporting decision making during the transitions between phases.

2.3 AUDIENCE

This document addresses two audiences:

• Decision makers. It provides decision makers with an understanding of how to align storage efficiency best practices and processes with their existing operations organization.

• Operational teams. It allows operational teams to understand monitoring and management of the storage infrastructure while mastering data growth. It allows them to implement a baseline configuration and to position their usage goals. We refer to existing NetApp best practices to increase the level of data consolidation and to achieve overcommitment for major applications.


2.4 SCENARIO

As a scenario, we consider a service provider who delivers a set of IT services. This service provider might serve internal or external customers at different service levels. The service levels provided are characterized by unplanned downtime, as exemplified by Figure 5. This characterization is useful for aligning service data with physical resources. In our example, the highest level of service availability is delivered by Platinum services. It is further assumed that the provided services have different lifetimes and dates of creation.

The service provider's major goal is to deliver the services within the specified service level and to achieve maximum data consolidation through NetApp storage efficiency technologies. This directly translates into cost savings related to capital investments, floor space, cooling, maintenance, and operational expenses. However, storing data in a consolidated way using storage efficiency technologies needs to take into account the aggregated data growth rates of the applications.

Predicting data growth rates depends on several parameters, which are usually outside the control and knowledge of the service provider. These parameters include usage characteristics, the number of users, and the functionality used. To compensate for the difficulty of precisely predicting data growth over a specific time frame, we propose a scheme that the service provider can use to achieve operational flexibility and adaptability in handling unpredictable growth rates.

Figure 5) Sample service levels ordered by service disruption and recovery time.

[Figure content, reconstructed; service levels ordered from lowest to best-effort disruption and recovery time:
• Platinum: production, premium customers (lowest disruption and recovery time)
• Gold: production (low)
• Silver: production, low budget
• Bronze: production
• Best-effort services: dev/test, cold/fill-up data, dynamic/short-term data]

In this document, the focus is on the operational aspects of storage efficiency technologies used to achieve data center consolidation and agility. Thus, we take a seat next to the operational staff of our sample service provider to understand their questions regarding the technologies that make up NetApp storage efficiency.

We address the questions posed by the operational staff, such as: How do we set it up? How do we run this? How do we integrate the necessary procedures into our daily routine?


A set of questions pertaining to the lifetime of a service instance and its storage arises. It starts with provisioning storage in a NetApp shared storage infrastructure and continues with detecting and monitoring situations that endanger the service level, the necessary response procedures, and promoting a continuous and smooth delivery of services.

The questions are structured around a cycle that starts at provisioning storage and finishes at deprovisioning storage. Figure 6 shows important questions regarding storage efficiency from an operational point of view.

• Provisioning deals with the provisioning of storage. In this document, provisioning models are shown that achieve a high level of storage consolidation while preserving operational flexibility. For individual applications, NetApp provides a rich library of technical reports on how best to provision.

• Monitoring deals with defining the goal of the monitoring process and which metrics to use for deciding when to stop (for example, the provisioning of storage). Relevant metrics provided by NetApp Operations Manager are described.

• Notifying deals with how to notify the people in charge when certain actions need to be performed. The notification mechanisms within NetApp Operations Manager that deliver information about certain events are described.

• Mitigation deals with mechanisms to prevent uncontrolled utilization from affecting operational flexibility and service fulfillment.

Figure 6) Questions regarding storage efficiency from an operational point of view.

[Figure content, reconstructed:
• Provision: How to provision best for storage efficiency (provisioning models; NetApp Data Motion awareness; from scratch or template/clone)? Where to provision to? Which SLA? What are the defaults?
• Monitor: Which tools? What to monitor?
• Notification: Who is in charge to react? How to notify?
• Mitigate: What is critical (when to stop provisioning; when to stop extending; when to relax tightness; how to detect)? Available options? Implications on SLAs? When to act?]

Before discussing the details of this cycle, it is important to understand the NetApp technologies that achieve storage efficiency and flexibility, and their relevance in the provisioning and operational phases.


2.5 NETAPP TECHNOLOGIES FOR STORAGE EFFICIENCY AND FLEXIBILITY

The NetApp technologies listed in Table 1 are characterized by when they bring their most significant advantage and value. For example, FlexClone® technology provides significant time and space advantages during provisioning, but the space advantage might diminish over time. In contrast, deduplication can achieve space savings over the entire storage lifetime.

Table 1) NetApp technologies for storage efficiency and flexibility.

• FlexClone: Instantly creates thin-provisioned and space-efficient writable clones. (Benefit during provisioning)
• FlexVol®: Implements thin provisioning and consumes only the needed space rather than the requested space. (Benefit during provisioning and operation)
• Deduplication: Increases data consolidation by detecting and optimizing repeating patterns in primary and secondary storage. (Benefit during provisioning and operation)
• NetApp Data Motion: Provides flexibility for management and optimal load/capacity rebalancing in growing cloud environments without downtime. (Benefit during operation)
• Aggregate extensibility in Data ONTAP®: Data ONTAP is the foundation for all features listed in this table and provides flexibility in handling physical resources; it allows extending physical aggregates during operation. (Benefit during provisioning and operation)

Furthermore, NetApp RAID-DP®, SATA, and NetApp Flash Cache (formerly PAM II) are technologies that help to reduce the total cost of storage tremendously. It is assumed that these technologies are deployed according to the requirements of the use case.

NETAPP SHARED STORAGE INFRASTRUCTURE

To implement the practices outlined in this document, some prerequisites must be met. We assume a NetApp shared storage infrastructure implemented using large aggregates. This acts as a utility for delivering storage in a flexible manner to applications with different needs. It scales with the demands and serves a variety of different service levels at the same time. NetApp Operations Manager monitors the NetApp shared storage infrastructure. This software acts as a central management station and consolidates information about the current status of all NetApp storage controllers. Based on this information, Operations Manager indicates the necessity to change phases and behavior in the data center.

The NetApp shared storage infrastructure provides different ways for clients to consume its resources. It can provide a traditional view, where storage resources are located at a specific controller. Using NetApp Provisioning Manager, the infrastructure can also provide a service-oriented resource view that abstracts resource consumption and management from the physical controllers. The abstractions of a storage service catalog, resource pools, and datasets provide easy manageability in the face of massive scale. If multi-tenancy is not required, this is the abstraction of choice.

Supported by the NetApp technologies MultiStore® (vFiler™) and NetApp Data Motion, storage can be provided in a utility-like fashion, independent of physical hardware. This enables high operational flexibility in the data center and allows building virtualized environments for multiple tenants with competing interests.


3 PROVISIONING

Some features, such as data deduplication, can be turned on or off at any time. However, to achieve maximum consolidation and storage efficiency, more strategies must be considered during the data-provisioning phase.
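As a concrete illustration of toggling deduplication at any time, the following is a minimal sketch using the Data ONTAP 7-Mode `sis` commands; the volume name /vol/app_data is a hypothetical example.

```
# Sketch: enabling/disabling deduplication on demand (7-Mode CLI).
sis on /vol/app_data        # enable deduplication for the volume
sis start -s /vol/app_data  # scan and deduplicate data already in the volume
sis status /vol/app_data    # show progress and state
sis off /vol/app_data       # disable again; savings achieved so far remain
```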

In addition, provisioning should take the flexibility of storage (for example, the ability to migrate data) into account. This allows administrators to easily move data off aggregates that approach their capacity limit, without downtime. This is also an important aspect when planning to deliver services 24x7.

Thus, the design goals for storage layouts are storage efficiency and operational flexibility. In the following sections, we discuss three orthogonal dimensions of provisioning storage. Two of them focus on achieving data consolidation; the third focuses on achieving operational flexibility. All dimensions can be combined independently. Note that the achievable level of consolidation depends on the applications and their data.

TR-3827, If You Are Doing This, Then Your Storage Could Be Underutilized, provides further understanding of storage efficiency and operational flexibility.

3.1 PROVISIONING FROM SCRATCH: FULL FAT TO ZERO FAT PROVISIONING

This section deals with the way data is provisioned and the consequences for storage efficiency. We recommend applying the so-called zero fat configurations.

We consider the storage setup for a single application instance. The presented configurations can be applied while provisioning storage from scratch and to already provisioned storage. When the technical dimensions of storage provisioning are categorized into primary data space and its Snapshot™ copy space, there are four theoretical combinations, for both network-attached storage (NAS) and storage area network (SAN) environments. In practice, only two variants are relevant for NAS and three for SAN storage:

• Full fat
• Low fat
• Zero fat

According to NetApp best practices, we do not mix block and file data within a single volume, which allows us to consider NAS and SAN environments separately.

NAS

For NAS, two options are recommended: full fat and zero fat.

• Full fat. The primary data and Snapshot copy space are preallocated.
• Zero fat. Primary data and its Snapshot copy space are allocated on demand. This variant achieves the best ratio of storage efficiency when provisioning applications from scratch.


Figure 7) Provisioning model for NAS storage from scratch. Technically, only two out of four combinations are possible.

                                 Primary Data (Files & Directories) Space Allocation
                                 Fat                 Thin
Snapshot Copy       Fat          Full fat option     No option
Space Allocation    Thin         No option           Zero fat option

Note: Full fat is characterized slightly differently in NAS and SAN due to their technical properties.

FULL FAT PROVISIONING

Full fat provisioning for NAS is the traditional (default) way to implement NFS/CIFS shares. Volumes in a full fat configuration are characterized as follows:

• Volumes are created with a space guarantee.
• The size of the volume follows the formula X + Δ, where X is the size of the primary data (the sum of all user data, files and directories, within the volume) and Δ is the amount of space needed to hold Snapshot data.

• Because the space used for Snapshot copies might grow unexpectedly, the autosize function can be used to make space available when a certain volume threshold is reached. This also happens when the space reserved for user data gets low.

• Space reservation for Snapshot copies is used to hide the capacity used for Snapshot copies from the consumers (NAS clients).

• For volumes with deduplication enabled, volume autogrow is a mandatory option.
• Normally, using autodelete is not recommended in NAS environments. Keeping a certain number of Snapshot copies for file versioning/restores might be part of the SLAs defined for file services.

Note: Deleting Snapshot copies may be a reasonable approach when no other option for freeing up space is available, but this is a specific and individual decision.

Table 2) Full fat provisioning.

Option              Recommended Value  Notes

Volume options:
guarantee           volume
fractional_reserve  100                Leave at default; mostly relevant for SAN environments. The default value up to Data ONTAP 7.3.3 is 100; for later releases, 0 is the default.
autosize            on                 Turn autosize on. There is then no artificially limited volume that needs to be monitored. Autosize makes sense to allow growth of user data beyond the guaranteed space limit.
autosize options    -m X -i Y          The business model drives the maximum value for the autosize configuration, because it offers additional disk space to the consumer under specific conditions. A reasonable resizing increment depends on various factors, such as the data growth rate in the particular volume and the volume size itself.

Volume Snapshot options:
reserve             yes                The value depends on the number of Snapshot copies and the change rate within the volume.
schedule            switched on        Automatic Snapshot technology schedules.
autodelete          off                Deleting Snapshot copies is not recommended in most NAS environments.
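The settings in Table 2 can be applied with the Data ONTAP 7-Mode CLI. The following is a minimal, hedged sketch; the aggregate name (aggr1), volume name (nas_vol), and all sizes are hypothetical examples, not recommendations.

```
# Full fat NAS volume: space guarantee, Snapshot reserve, no autodelete.
vol create nas_vol -s volume aggr1 500g   # guarantee = volume
vol options nas_vol fractional_reserve 100
vol autosize nas_vol -m 750g -i 10g on    # -m X (maximum), -i Y (increment)
snap reserve nas_vol 20                   # hide Snapshot space from NAS clients
snap sched nas_vol 0 2 6                  # weekly/nightly/hourly schedule
snap autodelete nas_vol off
```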

ZERO FAT PROVISIONING

The zero fat method is the most efficient way to provision NAS volumes:

• Volumes are created without a space guarantee.
• The size of the volume still follows the formula X + Δ, where X is the size of the primary data (the sum of all user data, files and directories, within the volume) and Δ is the amount of space needed to hold Snapshot data. Sizing the volume defines a container with a virtual size for the consumers; NAS users are familiar with fixed-size file shares.

• The space used for Snapshot copies can grow unexpectedly. You can use the autosize function to make space available when a certain volume threshold is reached, and also when the space reserved for user data gets low.

• The space reserved for Snapshot copies is used to hide from the consumers (NAS clients) the capacity taken up by Snapshot copies.

• For volumes with deduplication enabled, volume autogrow is a mandatory option.
• Using autodelete is normally not recommended in NAS environments. Keeping a certain amount of space for Snapshot copies for file versioning/restores is part of the SLAs defined for file services.

Note: Deleting Snapshot copies may be a reasonable approach when no other option for freeing up space is available, but this is a specific and individual decision.

Table 3) Zero fat provisioning.

Option              Recommended Value  Notes

Volume options:
guarantee           none
fractional_reserve  100                Leave at default; mostly relevant for SAN environments. The default value up to Data ONTAP 7.3.3 is 100; for later releases, 0 is the default.
autosize            on                 Turn autosize on. There is then no artificially limited volume that needs to be monitored. Autosize makes sense to allow growth of user data beyond the guaranteed space limit.
autosize options    -m X -i Y          The business model drives the maximum value for the autosize configuration, because it offers additional disk space to the consumer under specific conditions. A reasonable resizing increment depends on various factors, such as the data growth rate in the particular volume and the volume size itself.
try_first           -                  Autodelete is not recommended in most environments.

Volume Snapshot options:
reserve             yes/no             The value depends on the number of Snapshot copies and the change rate within the volume. Displaying only the committed usable space per the SLA is the preferred way to provision NAS storage; however, there might be situations in which the Snapshot reserve area is omitted (no).
schedule            switched on        Automatic Snapshot technology schedules.
autodelete          off                Deleting Snapshot copies is not recommended in most NAS environments.
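The zero fat NAS settings of Table 3 map to 7-Mode commands roughly as follows; names and sizes are again hypothetical. The only structural change compared to full fat is the missing space guarantee.

```
# Zero fat NAS volume: no space guarantee, autosize on, no autodelete.
vol create nas_vol -s none aggr1 500g     # guarantee = none
vol options nas_vol fractional_reserve 100
vol autosize nas_vol -m 750g -i 10g on
snap reserve nas_vol 20                   # or 0, depending on the SLA display preference
snap sched nas_vol 0 2 6
snap autodelete nas_vol off
```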

SAN

For SAN, we consider three options:

• Full fat. Both primary data and its Snapshot copy space are preallocated.
• Low fat. The primary data is preallocated; the Snapshot copy space is allocated on demand.
• Zero fat. Primary data and its Snapshot copy space are allocated on demand. This variant achieves the best ratio of storage efficiency when provisioning applications from scratch.

Figure 8) Provisioning model for SAN storage from scratch.

                                 Primary Data (LUN) Space Allocation
                                 Fat                 Thin
Snapshot Copy       Fat          Full fat option     No option
Space Allocation    Thin         Low fat option      Zero fat option

FULL FAT PROVISIONING

This method can be treated as the historical way of provisioning block storage with Data ONTAP:

• Volumes are created with a space guarantee.
• A fractional (overwrite) reserve is used to guarantee that the primary data can be overwritten completely with Snapshot copies in place. If this space is not available, Snapshot copy creation will fail.
• The size of the volume follows the formula 2X + Δ, where X is the size of the primary data (the sum of all LUN capacities within the volume) and Δ is the amount of space needed to hold Snapshot copy data.


The enhancements to the volume autosize capabilities (such as volume-size-dependent thresholds) and the robustness of the Snapshot autodelete implementation have made full fat provisioning more or less obsolete. As of today, the default settings for creating volumes/LUNs in Data ONTAP still correspond to these settings. See Provisioning from Scratch: Full Fat to Zero Fat Provisioning for a discussion of using tools such as Provisioning Manager.

Table 4) Full fat provisioning.

Option              Recommended Value  Notes

Volume options:
guarantee           volume
fractional_reserve  100                Even though technically possible, a fractional reserve below 100 incorporates the risk of running out of Snapshot copy overwrite space. This situation should be avoided.
autosize            off                Autosize could be used as an option to create the free space needed for Snapshot copy creation.

Volume Snapshot options:
reserve             0
schedule            switched off
autodelete          off

LUN options:
reservation         enable

LOW FAT PROVISIONING

With low fat provisioning, we use a more space-efficient way to provision volumes:

• Volumes are created with a space guarantee.
• LUNs are created with a space guarantee as well. This setup does not benefit from unused blocks within a LUN. (During the lifetime of a LUN, the amount of free, unused blocks typically decreases. Without space reclamation techniques, allocated blocks on the storage system stay allocated.)
• The size of the volume follows the formula X + Δ, where X is the size of the primary data (the sum of all LUN capacities within the volume) and Δ is the amount of space needed to hold Snapshot copy data.

• Because the space used for Snapshot copies might grow unexpectedly, the autosize and autodelete policies are used to make space available when a preset volume threshold is reached.

Table 5) Low fat provisioning.

Option              Recommended Value  Notes

Volume options:
guarantee           volume
fractional_reserve  0                  Snapshot space is controlled by the autodelete and autosize options.
autosize            on                 Turn autosize on.
autosize options    -m X -i Y          The business model drives the maximum value for the autosize configuration, because it offers additional disk space to the consumer under specific conditions. A reasonable resizing increment depends on various factors, such as the data growth rate in the particular volume and the volume size itself.
try_first           volume_grow        Increasing the size of the volume does not destroy any data or information, so there is no reason not to increase it; it can be reverted afterward if the volume's free space increases again. There might be configurations where automatic volume growth is not desired.

Volume Snapshot options:
reserve             0                  For NAS volumes, setting a Snapshot copy reserve area and configuring Snapshot copy schedules is a common setup. For SAN volumes, this needs to be switched off according to NetApp best practices (see the Fibre Channel and iSCSI Configuration Guide).
schedule            switched off
autodelete          on                 There might be Snapshot copies that are needed to fulfill certain SLAs, such as backup SLAs. Setting this policy needs to be negotiated with the business requirements; in the worst case, deleting Snapshot copies is not an option.
autodelete options  volume, oldest_first  Defines the precedence for which Snapshot copies are candidates for deletion; oldest_first is the current default.

LUN options:
reservation         enable             Reserves space for the LUN during creation.
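A hedged sketch of the low fat SAN settings from Table 5 on a 7-Mode controller; the aggregate, volume, and LUN names and all sizes are hypothetical, and the exact `snap autodelete` subcommands should be checked against the release in use.

```
# Low fat SAN: guaranteed volume and LUN, Snapshot space on demand.
vol create san_vol -s volume aggr1 550g
vol options san_vol fractional_reserve 0
vol autosize san_vol -m 800g -i 10g on
vol options san_vol try_first volume_grow
snap reserve san_vol 0                    # no Snapshot reserve for SAN volumes
snap sched san_vol 0 0 0                  # no automatic Snapshot schedule
snap autodelete san_vol trigger volume
snap autodelete san_vol delete_order oldest_first
snap autodelete san_vol on
lun create -s 500g -t linux /vol/san_vol/lun0   # space-reserved LUN (default)
```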

ZERO FAT PROVISIONING

Full and low fat provisioning use fully allocated volumes and LUNs; by default, space allocation happens within the boundaries of the LUN and the volume. Zero fat follows a 100% "allocate on demand" concept:

• Volumes are created without a space guarantee.
• LUNs are created without a space guarantee.
• The size of the volume follows the formula X – N + Δ, where X is the size of the primary data (the sum of all LUN capacities within the volume), Δ is the amount of space needed to hold Snapshot copy data, and N is the amount of unused blocks within a given LUN.

Table 6) Zero fat provisioning.

Option              Recommended Value  Notes

Volume options:
guarantee           none               No space reservation for the volume at all.
fractional_reserve  0                  With Data ONTAP 7.3.3, fractional_reserve can be modified even for volumes without a space guarantee of type volume. Prior to Data ONTAP 7.3.3, the value was fixed at 100.
autosize            on                 Turn autosize on.
autosize options    -m X -i Y          The business model drives the maximum value for the autosize configuration, because it offers additional disk space to the consumer under specific conditions. A reasonable resizing increment depends on various factors, such as the data growth rate in the particular volume and the volume size itself.
try_first           volume_grow

Volume Snapshot options:
reserve             0                  For NAS volumes, setting a Snapshot copy reserve area and configuring Snapshot copy schedules is a common setup. For SAN volumes, this needs to be switched off according to NetApp best practices (see the Fibre Channel and iSCSI Configuration Guide).
schedule            switched off
autodelete          off                Deleting Snapshot copies might be an option when the volume can no longer be resized because the maximum configured size has been reached, or when the aggregate's free space becomes low.

LUN options:
reservation         disable            No preallocation of blocks for the LUN.
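The zero fat SAN settings of Table 6 differ from low fat mainly in the two missing guarantees. A hedged 7-Mode sketch with hypothetical names and sizes:

```
# Zero fat SAN: no guarantees at volume or LUN level.
vol create san_vol -s none aggr1 550g
vol options san_vol fractional_reserve 0
vol autosize san_vol -m 800g -i 10g on
vol options san_vol try_first volume_grow
snap reserve san_vol 0
snap sched san_vol 0 0 0
snap autodelete san_vol off
lun create -s 500g -t linux -o noreserve /vol/san_vol/lun0  # thin LUN
```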

SUMMARY OF PROVISIONING METHODS

There are good reasons for using any of the provisioning methods described above; however, full fat provisioning for SAN environments should be avoided wherever possible because of its storage efficiency ratio. Even with a 100% block usage ratio on primary data, zero fat provisioning has many advantages and is the preferred method:

• The aggregate's free space is a global pool that can serve space to volumes. This gives more flexibility than volumes with their own dedicated free space.

• For SAN volumes, block consumption can be easily monitored.
• Deduplication savings go directly into the global pool of free space, which is the aggregate or the resource pool to which it belongs.
• Monitoring is needed only on the aggregate level; volumes grow on demand.
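Since monitoring shifts to the aggregate level with zero fat provisioning, the 7-Mode CLI offers a quick manual check alongside Operations Manager; a sketch, assuming a hypothetical aggregate named aggr1:

```
# Aggregate-level capacity check (7-Mode CLI).
df -A aggr1               # used/available capacity of the aggregate
aggr show_space -h aggr1  # how volumes consume space within the aggregate
```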

Table 7) Comparison of provisioning methods.

Characteristic: Full Fat | Low Fat | Zero Fat
Space consumption: 2X + Δ | X + Δ | X – N + Δ (2)
Space efficient: No | Partially, for Snapshot copies | Yes
Monitoring: Optional | Required on volume and aggregate level | Required on aggregate level
Notification/mitigation process required: No | Optional in most cases | Yes
Pool benefiting from dedupe savings: Volume fractional reserve area | Volume free space area | Aggregate free space area
Risk of an out-of-space condition on primary data: No | No, as long as autodelete is able to delete Snapshot copies | Yes, when monitoring and notification processes are missing
Typical use cases: Small installations; none or few storage management skills (no monitoring infrastructure) | Large database environments | Shared storage infrastructure; test/dev environments; storage pools for virtualized servers

(2) N is the traditional thin provisioning impact: the amount of blocks logically allocated but not used.

FULL/LOW/ZERO FAT PROVISIONING WITH PROVISIONING MANAGER

NetApp Provisioning Manager focuses on fast and efficient provisioning of storage resources in the NetApp storage infrastructure. It speeds up provisioning workflows and boosts capacity usage by using policy-based automation for the entire NetApp NAS and SAN infrastructure. These processes are:

• Faster than manually provisioning storage
• Easier to maintain than scripts
• Instrumental in minimizing the risk of data loss resulting from misconfigured storage

Provisioning Manager applies user-defined policies to consistently select the appropriate resources for each provisioning activity. This frees administrators from the headache of searching for available space to provision and allows more time for strategic issues. The use of a centralized management console allows administrators to monitor the status of their provisioned storage resources. Provisioning Manager can help improve business agility and capacity use, shrink provisioning time, and improve administrator productivity. Provisioning Manager's thin provisioning and deduplication capabilities provide a high level of storage efficiency from your NetApp storage investment.

A GUI allows you to implement the full/low/zero fat provisioning models within Provisioning Manager. See Figure 9 and Figure 10 for configuring storage efficiency in a provisioning policy for NAS and SAN. Whenever storage is provisioned using such a provisioning policy, the settings apply automatically. For more information, refer to TR-3710, Operations Manager, Provisioning Manager, and Protection Manager Best Practices Guide.

Provisioning Manager encapsulates technical details when provisioning storage and supports an easy integration with existing management tools and orchestration frameworks. Policies and their use in so-called datasets and storage services allow you to exploit NetApp storage efficiency technologies without exposing a high level of technical detail to a higher level of management software.

Note: Provisioning Manager up to version 4.0 does not allow you to specify autosize and autodelete individually; the policy template determines whether these features are used and which options are selected. To implement the provisioning methods outlined here, a customized provisioning script needs to be provided that sets the autosize and autodelete parameters according to the recommendations for the full/low/zero fat methods. Post-provisioning scripts are standard with Provisioning Manager. Use caution when Provisioning Manager runs conformance checks; these revert individual settings.


Figure 9) Configuring a full/zero fat provisioning policy using Provisioning Manager for NAS. Select checkboxes as outlined. Provisioning Manager deviates from zero/full fat by first growing volumes with autosize and then allowing Snapshot autodelete.


Figure 10) Configuring a full/low/zero fat provisioning policy using Provisioning Manager for SAN storage. Select checkboxes as outlined. Provisioning Manager deviates by not turning on autosize for zero fat.

FULL/LOW/ZERO FAT PROVISIONING FOR STORAGE SERVICES

Storage services are an easy abstraction for provisioning storage in a utility-like fashion. A storage service describes all characteristic attributes of the storage needed in a certain scenario. A storage service catalog lists the available templates and allows you to provision storage with these attributes on demand. Technically, storage services or datasets consist of one or more resource pools, a protection policy, and a provisioning policy. Full/low/zero fat provisioning for storage services is configured in the configuration wizard of the provisioning policy. Figure 11 shows the provisioning policies closest to the full/low/zero fat configurations.

Figure 11) Full/low/zero fat provisioning policies for datasets and storage services.

Because this wizard is able to configure the deduplication feature, two policies are configured for the zero fat configuration: one with deduplication and one without.


HOW SHOULD A VOLUME BE SIZED

Because the physical allocation of data within a zero fat-provisioned volume is done on demand, the volume size could theoretically be set to a very high value that can easily hold all application data and Snapshot copies. Because the unallocated space in the volume is not exclusively reserved for the volume itself, all other applications can benefit from the shared pool of unallocated storage. However, NetApp recommends that you size the volume to the expected size of its contained objects and use the autogrow option to let it grow on demand. The important advantage is that the commitment rate then acts as a metric for data consolidation.

Note: The commitment rate reflects the amount of logical data consolidation. This metric is suitable for deciding when data should be left for organic growth.
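As a back-of-the-envelope illustration of this metric (the figures and function are hypothetical sketches, not a NetApp API), the commitment rate of an aggregate can be understood as the sum of the provisioned volume sizes relative to the usable aggregate capacity:

```python
# Sketch: commitment rate of an aggregate (hypothetical numbers, not a NetApp API).
def commitment_rate(volume_sizes_gb, aggregate_usable_gb):
    """Ratio of logically provisioned storage to physically usable storage.
    A value above 1.0 means the aggregate is overcommitted (thin provisioned)."""
    return sum(volume_sizes_gb) / aggregate_usable_gb

# Three zero fat volumes sized to their expected data, on a 10 TB aggregate.
rate = commitment_rate([4000, 6000, 5000], 10000)
print(f"commitment rate: {rate:.2f}")  # 1.50, i.e. 150% committed
```

Sizing volumes to their expected content keeps this ratio meaningful: an oversized volume inflates the commitment rate without reflecting real consolidation.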

Additionally, the volume size limits when using deduplication should be taken into account, because the maximum volume sizes depend on the storage controller models.

APPLICATION RECOMMENDATIONS

Thin provisioning is most effective when applications use data that is committed to them step by step. When applications preformat data, the immediate effect of thin provisioning is lost, and only deduplication can reclaim sharable blocks. Because thin provisioning has no performance penalty, the general recommendation is to provision with the zero fat configuration.

For SAN-attached storage, NetApp recommends that you use file systems supporting space reclamation technologies such as the SCSI UNMAP and SCSI WRITE SAME commands. These pass the information through the storage stack that a particular block is no longer used and allow unused space to be reclaimed. On Windows® platforms, this can be configured in NetApp SnapDrive®.

For Oracle® database best practices, refer to WP-7084, Storage Efficiency in an Oracle Environment.

3.2 PROVISIONING FROM TEMPLATES: VOLUME AND DEDUPE-CENTRIC LAYOUTS

This section deals with provisioning storage for similar applications from a golden template. A valid use case is a hosting provider who offers and serves predefined application services in mass quantities. Instead of provisioning each application from scratch, the data of the application instance is provisioned by creating a copy of a preconfigured template, or golden copy, that is customized using a postprocessing procedure.

When applications are provisioned this way, NetApp cloning technologies generate virtual copies of the template data instantly and with efficient use of space. This achieves a high degree of data consolidation and cost savings.

The potential of NetApp cloning technologies also plays a central role in development and test environments, as well as in software maintenance scenarios. Testing and updates can be performed very easily because these cloning capabilities work instantly and with almost no overhead in terms of performance, CPU, and memory.

There are two ways to align application data to a NetApp shared storage infrastructure:

• Volume-centric storage layout
• Dedupe-centric storage layout

Depending on the data lifetime, suitability for deduplication, consistency, and tool constraints, one way of aligning application data is more appropriate than the other.

In both variants, the storage of the application template can be provisioned as full, low, or zero fat. The cloning procedure inherits the attributes of the parent volume. To create space-efficient clones, the space guarantee must be set to none.


CONSEQUENCES FOR MONITORING

When using one of the following layouts, very high data consolidation can be achieved. Because this effect depends on the usage characteristics of the corresponding applications, monitoring the aggregate is key. In case a low fat volume acts as a template that is cloned while preserving the original space guarantees, monitoring is necessary for the cloned volumes as well.

VOLUME-CENTRIC STORAGE LAYOUT

In a volume-centric storage layout, an application instance is organized into one or a few volumes to benefit from Data ONTAP volume-centric management and maintenance operations, such as instant cloning and volume-consistent Snapshot copies.

In addition to the convenient ways to manage volumes, volume-centric storage layouts have storage efficiency advantages in two dimensions:

• High instant storage efficiency savings. Cloning the data of an application instance with FlexClone yields high instant savings; these savings might deteriorate over time.

• Long-term storage efficiency savings. Deduplicating application data yields medium long-term savings.

A volume-centric layout makes it easy to provision storage for another instance of an application by cloning a consistent volume representing the template of the intended application and attaching it to an instance where it is processed. This approach works for both NAS and SAN.

Figure 12 shows the data alignment of an application instance and its volume. An application instance organizes its data in one or more dedicated volumes. Note that the entire construct is created within one aggregate.

Because deduplication is performed at the volume level, long-term savings depend on the block-sharing rate within one instance of an application. Volume-centric layouts are preferred in the following cases:

• Simplicity of data management using volumes
• Individual control over the SLA of each application instance
• Application instances with a short duration
• No consideration of deduplication
• Management tools that require volume-centric layouts


Figure 12) Volume-centric storage provisioning. Application instances are aligned horizontally with their volumes.


Impact on commitment and storage utilization. The impact of using FlexClone to clone a volume-centric storage layout to implement storage template-based provisioning is visualized schematically. At clone creation, Data ONTAP creates metadata for the new instance of the data and allocates space for storing changes to the cloned copy or new data on request. Thus, the overcommitment of the aggregate containing the cloned data increases when the clone is created. However, this does not affect the space used in the aggregate. When data in the clone is changed and new data is added by the application, the aggregate use grows.
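This behavior can be made concrete with a small model (the class, numbers, and names below are hypothetical illustrations, not a NetApp API): creating a space-efficient clone raises the aggregate's committed figure immediately, while the used figure grows only as the clone diverges from its parent.

```python
# Sketch: effect of cloning on aggregate commitment vs. use (hypothetical model).
class Aggregate:
    def __init__(self, usable_gb):
        self.usable_gb = usable_gb
        self.committed_gb = 0  # sum of volume sizes provisioned against the aggregate
        self.used_gb = 0       # physically allocated blocks

    def provision_volume(self, size_gb, initial_data_gb=0):
        self.committed_gb += size_gb
        self.used_gb += initial_data_gb

    def clone_volume(self, size_gb):
        # A clone shares all blocks with its parent: commitment rises, use does not.
        self.committed_gb += size_gb

    def clone_diverges(self, changed_gb):
        # Changed or new blocks in the clone consume real space on demand.
        self.used_gb += changed_gb

aggr = Aggregate(usable_gb=10000)
aggr.provision_volume(2000, initial_data_gb=1500)  # template volume
aggr.clone_volume(2000)                            # instant, space-efficient clone
print(aggr.committed_gb, aggr.used_gb)             # 4000 1500
aggr.clone_diverges(300)                           # application writes into the clone
print(aggr.committed_gb, aggr.used_gb)             # 4000 1800
```

The gap between committed and used space is exactly what makes monitoring the aggregate essential in this layout.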

Best Practice

A volume-centric layout implicitly implements a consistency group. It is preferable to align within it all application data that should be recovered at a certain point in time. Cloning can achieve significant savings when a FlexClone volume is created to provision data for a new service instance.

Client-side data realignment, such as disk defragmentation or database table space reorganization, has a counterproductive effect on the FlexClone savings. This realignment also has a temporarily counterproductive effect on the deduplication savings until the deduplication process is executed again. If possible, the following actions on client data should be avoided:

• Reorganizing data, for example, database reorganization of table spaces or defragmentation of virtual disks provisioned through cloning

• Preformatting data


DEDUPE-CENTRIC STORAGE LAYOUT

In a dedupe-centric storage layout, the goal is to achieve high storage efficiency returns from the deduplication feature. In contrast to the volume-centric storage layout, the data of different application instances is grouped to achieve storage efficiency returns across a set of application instances. Figure 13 shows a sample dedupe-centric storage layout. Data of application instances is organized horizontally; the individual data of each application is grouped vertically in a volume to implement deduplication.

This layout makes sense in virtualization scenarios where the images of the guest machines can be grouped easily. Grouping partitions containing boot images and commonly used programs is very effective because they share much of the same data. To implement template-based provisioning with such a layout, cloning of template data must be performed with the file/LUN FlexClone operation. File/LUN FlexClone allows storage objects to be cloned within a volume, providing finer granularity.

This storage layout provides the following storage efficiency advantages from a short- and long-term perspective:

• Very high long-term storage efficiency savings. Long-term savings are achieved due to the deduplication-centric storage layout and deduplication returns.

• Short-term storage efficiency savings. Instant savings are provided when cloning an application instance, for example template application data, through a file/LUN FlexClone operation.

In contrast to the volume-centric storage layout, application instances are bundled together in a matrix style because of their participation in a volume. This implies that the applications share major operational tasks and are managed as a bundle. From an SLA perspective, a diversification of service levels within the application instances cannot be implemented as easily as with a volume-centric layout. Achieving application-consistent Snapshot copies requires the iterative application of file/LUN FlexClone functionality to all storage objects of the instance. This is slightly more difficult than cloning with a volume FlexClone operation.

TR-3505, NetApp Deduplication for FAS and V-Series Deployment and Implementation Guide, provides a deeper understanding of NetApp deduplication and its deployment.

Figure 13 illustrates dedupe-centric storage provisioning. Volumes are shared among several application instances to achieve cross-instance deduplication returns. Note that each construct is created within one aggregate; the volumes can be assigned to different aggregates.


Figure 13) Dedupe-centric storage provisioning. Application instances are aligned horizontally; volumes are aligned vertically.


Impact on commitment and aggregate usage. When the FlexVol volumes for this layout are created, their individual sizes contribute to the commitment rate. The aggregate use grows with the provisioning and object use within the FlexVol volumes. Provisioning a new instance in this layout through a file/LUN FlexClone operation has no effect on the overcommitment rate; it affects the deduplication ratio of the volume itself. Thus, NetApp recommends using the zero fat configuration for the volume so that autogrow is enabled.

Best Practice

This layout is very attractive for applications using multiple but similar storage objects among service instances (for example, virtual disks in virtual machine hypervisors). Such instances usually use similar operating systems and applications in dedicated virtual disks. Thus, grouping these storage objects leads to a very high degree of consolidation due to deduplication.

Quickly changing data, such as pages and swap files, should not be considered for inclusion in deduplicated volumes on primary storage. Deduplication savings are limited due to their high change rate and do not justify running the deduplication process. NetApp recommends that this type of data not be placed in the same volume as data that dedupes well.

We further recommend not performing client data realignments such as Windows disk defragmentation or database table space reorganizations. Because of the way that NetApp storage controllers work, fragmented client data is served without a performance penalty.


3.3 SETTLED/NOMAD PROVISIONING FOR NETAPP DATA MOTION

Settled/nomad provisioning is a provisioning pattern that helps increase the utilization of NetApp shared storage. When the online migratability features of the storage are exploited, response times to mitigate data growth scenarios are independent of application-specific planned downtime windows. Storage is classified into potential migration candidates that can be migrated away from a tight aggregate on one storage controller to another controller while remaining accessible. Thus, it is an elegant technique to relax the use of an aggregate outside the planned downtime windows of the served applications. NetApp MultiStore technology implements this capability using the vFiler abstraction, which NetApp recommends you consider in the provisioning process. Secure multi-tenancy environments implemented using MultiStore technology harmonize well with this provisioning approach.

The settled/nomad provisioning pattern is a perfect metaphor for reacting to data growth in an aggregate. Figure 14 illustrates the concept of settled/nomad provisioning in the aggregates of the storage controllers and the migration of a nomad out of its aggregate. The settled part describes data that does not move during its lifetime. It might use vFiler units to simplify operation and hardware maintenance of the storage controller, but there is no direct need. The nomad parts are considered moving parts and thus must make use of vFiler units. The ratio between the sizes of the settled and nomad parts depends on the growth rate and lifetime of the data in the settled part. Assuming that the aggregate size is constant over this period, the aggregate is filled with settled and nomad data. Over the data lifetime, more and more nomads are migrated away; at the end of the lifetime, only the settled data is left.
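The settled/nomad ratio described above can be estimated with simple arithmetic. The sketch below (figures and function are hypothetical illustrations, not a NetApp sizing tool) derives how much of a fixed-size aggregate should be provisioned as nomads so that migrating them away absorbs the settled data's growth over its lifetime:

```python
# Sketch: sizing the nomad share of an aggregate (hypothetical figures).
def nomad_share_gb(growth_gb_per_month, lifetime_months):
    """Capacity that must be migratable (nomad) so that, at constant aggregate
    size, migrating nomads away absorbs the settled data's growth."""
    return growth_gb_per_month * lifetime_months

settled = 6000   # settled data at provisioning time (GB)
growth = 100     # expected growth of the settled data (GB/month)
lifetime = 36    # lifetime of the settled data (months)

nomads = nomad_share_gb(growth, lifetime)
aggregate_size = settled + nomads
print(f"provision {nomads} GB as nomads in a {aggregate_size} GB aggregate")
# Slicing this into several smaller nomads (e.g., 4 x 900 GB) allows
# finer-grained reactions to different growth scenarios.
```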

It is irrelevant whether the data growth happens in the settled or nomad part: when a nomad is migrated away, the resource situation on the aggregate is relaxed. It is preferable to provision several nomads of different sizes. This allows you to:

• React to different growth scenarios of the data
• Quickly migrate smaller nomads when time or the inter-storage-controller network is considered to be a limited resource
• Operate the aggregate in its operational sweet spot corridor over a long time frame; by slicing the migratable entities in the right way, you can be sure that the aggregate operates in a predefined use interval

Figure 14) Settled/nomad provisioning into an aggregate. In case of aggregate tightness, a nomad is migrated to a separate aggregate.


To summarize, the settled/nomad provisioning pattern is an elegant method to adjust the block use of an aggregate. The use of an aggregate can be controlled and kept in a desired corridor.


SLA-BASED ASSESSMENT FOR SETTLED/NOMAD

The goals of an SLA-based assessment are to optimize SLA fulfillment and to avoid or minimize penalty costs. The accessibility of the applications, described by their individual service levels, is used to assess them as settled or nomad instances.

We use the introduced SLA metric of service disruption and map it to the stickiness of the settled/nomad instances. The vFiler entities allow online migration of NFS- and iSCSI-attached nomad instances without any changes on the client side. Fibre Channel-attached storage cannot be migrated online at the time of writing. Refer to TR-3881 for an understanding of DataMotion in an Oracle database and Microsoft Exchange environment.

Alignment by technical impact. For data belonging to applications with SLAs that fit perfectly into what is provided, a direct assignment can be made. For example, application instances with the lowest acceptable service disruption should be the last candidates to be migrated (settled); applications with the highest acceptable service disruption should be considered nomads. However, there might be data of application instances that will likely be migrated during the application lifetime. You must take into account the business impact of migrating these instances.

Figure 15) Alignment by technical impact (sorted by negative impact in descending order)


Alignment by business impact. An assessment of penalty costs is made for the data of the remaining applications. For vFiler migration, a very short negative impact on the performance of the service must be taken into account during the migration. Thus, application data with the highest negative impact is considered the stickiest.

Figure 16) Alignment by business impact (sorted by negative impact in descending order)


PERFORMANCE AND THROUGHPUT IMPACT OF MIGRATION

Migration of a nomad might be triggered by heavy storage consumption in an aggregate. It might also be triggered by performance limitations of the corresponding storage controller. Because the migration consumes additional resources on the network and the participating storage controllers,


this consumption must be taken into account to avoid further intensifying the situation. Refer to TR-3881 for a quantitative evaluation of DataMotion.

NetApp recommends that the use of storage controllers in a high-availability configuration be adjusted in such a way that the remaining controller can handle the load in case of a failover. Doing so leaves enough resources to perform migrations.

Migrating vFiler entities relies mainly on SnapMirror® and MultiStore technology and thus requires these licenses on all participating storage controllers. TR-3814, NetApp Data Motion, provides a thorough presentation of migration using the NetApp Data Motion solution. It focuses on implementing and triggering the migration of vFiler entities using NetApp Provisioning Manager. Furthermore, vFiler units can be managed manually to allow for handy offline migration with a very short interruption of storage accessibility.

ENABLE SETTLED/NOMAD FOR ALREADY-PROVISIONED STORAGE

While NetApp recommends that you consider the settled/nomad setting initially, taking the sizing and lifetime of storage into account, it is also possible to introduce it later in a planned downtime window. If NFS-attached storage is to be migrated, existing volumes can be adopted by a vFiler entity. Because the vFiler entity has its own IP address, the clients attaching the storage need to remount it.

SETTLED/NOMAD-LIKE SETTING WITH SHORT/LONG-TERM DATA PAIRING

In the previous section, the settled/nomad pattern was described as a way to mitigate organic data growth. The same effect can be achieved when storage is identified as belonging to instances that are going to be deprovisioned at their end of life. Taking into account the expected lifetime of provisioned storage allows you to plan deprovisioning situations in advance. This relaxes the dependence on aggregates outside planned downtime windows, and without the technical requirements of a settled/nomad setting.

ONLINE MIGRATION IN VIRTUALIZED ENVIRONMENTS

Online migration features in a virtualization hypervisor provide a further alternative for implementing a responsive scheme to react to data growth scenarios. For example, VMware® Storage VMotion™ is capable of transferring a virtual machine, including its storage, when it is attached using a datastore. Storage of virtual machines served by a NetApp datastore can be migrated to another NetApp-served datastore by migrating each virtual machine. In such cases, a nomad can simply be implemented by a NAS- or SAN-attached datastore.

In contrast to a data transfer based on SnapMirror directly between NetApp storage controllers, the migration traffic flows through the hypervisor. This might have consequences for the execution of the virtual machines. Also, the NetApp storage efficiency savings cannot be exploited during the transfer; deduplication savings are regained by executing the deduplication process on the destination storage controller.


4 OPERATION

This section focuses on the operation and management of overcommitted storage. The goal of management is to fulfill the SLAs of the stored data while achieving a high level of efficiency. It addresses questions of how to detect situations that need manual assistance, how to raise the awareness of the operational staff, and how to resolve situations that arise.

We first consider situations that put SLA fulfillment at risk. Then we focus on actions that can be taken to avoid further aggravation by making the situation evident and presenting mitigation alternatives.

Consider the operational process as a loop that monitors and evaluates the current situation and triggers the transition of a storage resource among the following phases:

• Provision storage
• Leave storage for organic growth; it might be desirable to still allow extending the storage of previously provisioned applications
• Reduce storage use with mitigation alternatives such as deletion, data motion, and so on

These transitions must occur within a specified time frame to preserve operational flexibility and to prevent endangering the SLAs. The point is to detect situations that would violate the SLAs in the future.

SITUATIONS PUTTING SLA FULFILLMENT AT RISK

Over time, more and more data is stored and processed by the provided applications. NetApp storage efficiency technologies compensate for this growth. To prevent running out of physical resources, usage must be managed within safe boundaries. This makes sure the operations team has enough time to react with the appropriate mitigation strategy.

The following list summarizes situations that are critical for service delivery:

• Running out of time. Some mitigation alternatives must be triggered in advance, and a passage of time might be needed for their effect to become evident. This time determines which mitigation alternatives can still be considered at a certain point.

• Running out of mitigation alternatives. Several mitigation alternatives exist to control usage. However, some alternatives are one-time activities and some must be performed within a certain time frame. Depending on the situation, not all alternatives might be available for use.

• Running too tight on storage. Over time, applications use more and more of the blocks that were committed to them. This forces Data ONTAP to allocate from the pool of free blocks. Assuming data growth, the size of the free block pool directly translates into the available time to react.

• Running out of storage completely. This must be prevented because it has a high impact on the availability of the service. Furthermore, data integrity can be at risk. Consider the following scenarios:
− Application wants to write to committed storage but fails (NAS/SAN). For applications, this looks like a storage failure and implies service disruption. Data integrity can be at risk.
− Application wants to allocate new storage but fails (NAS). An application is confronted with a "No space left on device" exception. Verify the application behavior on this exception; most applications can deal with this situation, and data integrity is not at risk.

Two cases need to be differentiated when mitigation is necessary to resolve a situation of tight storage. Storage for an object such as a LUN or a share can be tight because of:
− Insufficient free space within the volume in which the storage object is contained
− Insufficient free space within the aggregate in which the storage object and its volume are contained
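The "running too tight" situation above can be quantified with a back-of-the-envelope calculation (the values and function are hypothetical, not an Operations Manager metric): the free block pool divided by the daily growth rate gives the remaining time to react.

```python
# Sketch: remaining time to react before an aggregate runs full
# (hypothetical values, assuming roughly linear growth).
def days_to_react(usable_gb, used_gb, daily_growth_gb):
    """Days until the free block pool is exhausted."""
    if daily_growth_gb <= 0:
        return float("inf")  # no growth: no deadline
    return (usable_gb - used_gb) / daily_growth_gb

# 10 TB aggregate, 8.5 TB used, growing 50 GB/day -> 30 days to act.
print(days_to_react(10000, 8500, 50))
```

The result bounds which mitigation alternatives are still viable: anything that takes longer than the remaining days to show an effect is effectively off the table.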


The following sections focus on how to detect that a change is necessary and that a storage resource should be transitioned to another phase. We describe the different phases, how monitoring can support their detection, and how this information can be made known to operational groups.

4.1 PHASES AND TRANSITIONS

This section outlines the phases of a storage resource. Starting with an empty aggregate, storage is provisioned up to certain thresholds. After that, storage is left for organic growth. Once certain thresholds are exceeded, further inspection or activities must be performed to mitigate storage tightness.

• Provisioning storage. While certain thresholds are within a defined range, storage is provisioned to the aggregates. Monitoring should support making a decision to transition to the next phase.

• Leave storage for organic growth. When certain thresholds are exceeded, provisioned storage is left for organic growth. Depending on the environment, storage of existing applications might still be extended, and a second threshold might signal that extensions are no longer possible. Monitoring should support making a decision to transition to the next or prior phase.

• Mitigate storage use. When certain thresholds are exceeded, this phase must make sure that committed storage can be delivered to store application data. The effect of a mitigation activity should be to put the storage resource back in the preferred operational corridor. Monitoring should support making a decision to transition back to the organic growth phase.
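The three phases above can be sketched as a simple classification over aggregate block use. The threshold values below are illustrative assumptions, not NetApp defaults; in practice they come from the Operations Manager threshold settings discussed in the next section.

```python
# Sketch: mapping aggregate block use to an operational phase.
# The 70%/85% boundaries are illustrative assumptions, not NetApp defaults.
def phase(block_use_pct, provision_limit=70.0, mitigate_limit=85.0):
    if block_use_pct < provision_limit:
        return "provision"       # room to place new storage objects
    if block_use_pct < mitigate_limit:
        return "organic-growth"  # stop provisioning, let existing data grow
    return "mitigate"            # trigger deletion, data motion, and so on

for use in (55, 78, 91):
    print(use, phase(use))
```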

4.2 MONITORING

NetApp Operations Manager delivers comprehensive monitoring and management for NetApp shared storage. It provides alerts, reports, performance monitoring, and configuration tools to keep the NetApp storage infrastructure in line with business requirements for maximum availability and efficiency. NetApp Operations Manager provides a single human interface and an application programming interface (API) for integration with third-party management and orchestration software vendors.

Operations Manager monitors the NetApp shared storage infrastructure and is able to raise awareness of certain situations. Events can be set to trigger an action when operational parameters are within a certain range and indicate a relevant situation. When an event triggers, an alarm notification can be sent by e-mail, pager, Simple Network Management Protocol (SNMP), or customized scripts. To raise awareness of a certain situation, the event must be characterized using the metrics provided by Operations Manager. To communicate the event, an alarm must be set.

THRESHOLDS

Operations Manager monitors relevant parameters that indicate the presence of specified situations. Thresholds can be set to trigger actions, for example, to notify the operational team that an alarm situation exists. The thresholds can be set to notify in advance. Operations Manager also performs trending on operational parameters to express the urgency of a certain situation. This supports decision making on how to react to the situation.

Within your Operations Manager instance, the thresholds can be verified and set by navigating to the Default Thresholds page via Setup → Options → Default Thresholds, or via the link http://opsmgrserver:port/dfm/edit/options. Figure 17 shows a sample configuration page.


Figure 17) Operations Manager screen to configure thresholds on operational metrics

For aggregates, Operations Manager provides the set of thresholds described in the following list. They represent absolute limits. Operations Manager alarms can be used to notify operational staff and raise awareness of a specific situation.

Monitoring the aggregates is very important. They are the physical containers of preallocated and growable storage objects that host application data. If an aggregate of a storage controller runs at uncontrolled usage, there can be direct consequences for the applications for which it provides data.

The concrete settings for these thresholds depend on the time needed to relax aggregate block usage. When no mitigation alternatives can or should be taken during the lifetime of the data, mitigation actions must be performed in scheduled downtime windows. In that case, threshold settings and actions tend to be more conservative to avoid SLA-endangering situations.

• Aggregate full threshold. This threshold on the aggregate block use metric allows triggering an alarm that notifies a person in charge.

• Aggregate nearly full threshold. This threshold is the counterpart of the aggregate full threshold but provides an earlier notification.

• Aggregate overcommitted threshold. This threshold on the committed storage metric allows triggering an alarm that notifies a person in charge. The metric refers to the amount of storage that is committed to applications. It represents the level of consolidation and also the width and growth of the block use corridor.

• Aggregate nearly overcommitted threshold. This threshold is the counterpart of the aggregate overcommitted threshold but provides an earlier notification.

Operations Manager also provides thresholds that can be used to alert operational staff when volumes are in a certain state:

• Volume full threshold. This event notifies a person in charge that the preset threshold on the volume use metric has been reached.

• Volume almost full threshold. This event is the counterpart of the volume full threshold but provides an earlier notification.

• Volume autosized. This event notifies a person in charge when a volume has been extended using the autogrow functionality.


TRENDING

Operations Manager 4.0 supports a variety of trending features for certain storage objects. This is an important feature for all storage objects with a fixed size. It allows you to estimate the time frame within which a certain situation needs to be mitigated. The trend is calculated as a linear regression over up to 90 days in the past. For aggregates, Operations Manager calculates a trend on the daily growth rate. In your Operations Manager instance, use the link http://opsmgrserver:port/dfm/report/view/aggregates-growth-rates for trending of aggregate growth rates and the estimated remaining time until the storage object is full. Each aggregate can be drilled down, and you can select trending based on an interval of one day, one week, one month, three months, or one year. To see the effect of recent data activities, set the interval of a trend calculation to enclose this activity. Investigate whether growth rates calculated over different intervals deviate significantly.

Figure 18) Trending of data growth and days-to-full prediction in Operations Manager

Note: The calculation basis of time to full is the usable aggregate capacity. This value is not calculated based on the aggregate full threshold setting.
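The trend calculation can be illustrated with a least-squares fit over daily usage samples. This is a sketch of the idea only; Operations Manager's actual algorithm may differ, and the sample values are hypothetical.

```python
# Sketch: linear-regression trend and days-to-full estimate (illustrative only;
# Operations Manager's actual calculation may differ).
def days_to_full(daily_used_gb, usable_gb):
    n = len(daily_used_gb)
    xs = range(n)
    mean_x = sum(xs) / n
    mean_y = sum(daily_used_gb) / n
    slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, daily_used_gb)) \
        / sum((x - mean_x) ** 2 for x in xs)
    intercept = mean_y - slope * mean_x
    if slope <= 0:
        return float("inf")  # no growth: never full
    # Solve usable = intercept + slope * day, relative to the last sampled day.
    return (usable_gb - intercept) / slope - (n - 1)

samples = [8000, 8050, 8100, 8150, 8200]    # GB used on five consecutive days
print(round(days_to_full(samples, 10000)))  # 36 days of headroom left
```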

The trending at the volume level is analogous to the trending at the aggregate level. In your Operations Manager instance, access the link http://opsmgrserver:port/dfm/report/view/volumes-growth-rates for trending of volume growth rates. NetApp recommends that you order the view by growth rate descending or time to full ascending in order to focus on the relevant candidates.

At the volume level, you can set an alarm to fire when the volume growth is outside the usual boundary:

Abnormal volume growth. This event notifies when the growth rate of a volume exceeds a preset limit. It is helpful for signaling unusual behavior concerning storage consumption and pointing the operational staff to the right storage object.


INDIVIDUAL THRESHOLDS PER AGGREGATE OR VOLUME

For each aggregate or volume, the general default settings can be overwritten and made more specific. To do so, select the aggregate or volume of choice, for example using the links already provided in this technical report. When a concrete aggregate is selected, it can be configured using the Edit Settings link and dialog. When a concrete volume is selected, its configuration can be adapted using the Edit Quota Settings link and dialog.

MONITORING STORAGE EFFICIENCY RETURNS

NetApp Operations Manager provides a dashboard to visualize storage efficiency returns in the NetApp shared storage infrastructure. This report lists important parameters, drilled down by utilization capacity, unused reserve capacity, storage efficiency, and efficiency return breakdown. It allows you to judge the effectiveness of the NetApp storage efficiency technologies. Figure 19 provides a sample screenshot of the storage efficiency dashboard in NetApp Operations Manager. Consult the NetApp Operations Manager Efficiency Dashboard Installation and User Guide for further details on the information provided by this dashboard.

Figure 19) Storage efficiency dashboard in Operations Manager.


43 NOTIFICATION

Operational staff must be notified when situations occur that require a transition of phases, especially situations with negative consequences for SLA fulfillment. Operations Manager provides alarms for notification. Alarms are bound to the metrics and thresholds explained in section 4.2 and notify operational staff, storage administrators, or storage capacity planners. Alarms are the instrument used to keep the management effort of the NetApp storage infrastructure low.

After being notified, the responsible person can evaluate the situation and decide which actions to take. Further, the trends on operational parameters provided by Operations Manager simplify the decision-making process.

Depending on the organizational structure, the responsibilities to operate, plan, and administer the storage infrastructure can be separated into different groups, persons, or roles. Thus, we characterize the mitigation activities by required skill set and time to act. This allows easy alignment to a given organizational structure.

Operations Manager supports different methods to send a notification. The notification methods can be used in combination; for example, a notification can be sent by both e-mail and SNMP.

NOTIFY BY E-MAIL

An alarm can be sent to multiple destinations by e-mail. Repeated notifications can be sent while the situation is not resolved. To set an alarm, access the alarm configuration page by following Setup → Alarms from the default Operations Manager dashboard. Clicking Advanced Version accesses an advanced version of this page. The direct link for the advanced version is http://opsmgrserver:port/dfm/edit/alarms-advanced. Figure 20 shows how to configure an alarm. Adjust the threshold as described in section 4.2.

NetApp recommends using distribution lists or aliases with meaningful names rather than addresses of individual persons. If you follow this recommendation, changing responsibilities and roles does not require you to make corresponding changes to Operations Manager.

NOTIFY BY SNMP

Operations Manager supports the notification of alarms using SNMP, a widely used standard that is supported by most orchestration frameworks and ticketing systems. Using SNMP, Operations Manager can be integrated into existing ticketing systems. Figure 20 shows setting up an alarm that fires based on the "aggregate almost full" threshold. The SNMP trap host is configured using a hostname or IP address and the port on which the SNMP agent is listening. The alarm can be saved and tested.


Figure 20) Configuring an alarm based on the threshold "aggregate almost full".

Note: The SNMP event must be routed to the responsible groups or persons in the ticketing system. Thus, the mapping between the detected situation and the responsible operational group must be implemented there.

NOTIFY BY SCRIPT

Operations Manager supports notifications in highly customized integration scenarios. A user-defined adapter can be executed, which delivers the information to the infrastructure or system of choice. A script can be used to implement such an adapter and act as the glue between Operations Manager and the customer infrastructure. To set an alarm on the event "aggregate almost full" that starts a script, instrument Operations Manager on the command line:

dfm alarm create -s script_to_execute -h aggregate-almost-full
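Such an adapter script can be very small. The sketch below is a hypothetical example only: how Operations Manager hands event details to the script (command-line arguments versus environment variables) depends on the product version, so here we simply assume the event name and source object arrive as arguments, and we stand in for the ticketing-system call with a log line:

```python
#!/usr/bin/env python
# Hypothetical notification adapter for use with "dfm alarm create -s ..."
# (sketch only). The argument convention below is an assumption, not the
# documented Operations Manager interface.

import sys
from datetime import datetime

def format_ticket(event_name, source):
    """Build a one-line ticket entry for the customer's ticketing system."""
    stamp = datetime.now().isoformat(timespec="seconds")
    return "%s ALERT event=%s source=%s" % (stamp, event_name, source)

if __name__ == "__main__" and len(sys.argv) >= 3:
    event, source = sys.argv[1], sys.argv[2]
    # Replace this print with the call into your ticketing/orchestration API.
    print(format_ticket(event, source))
```

The script itself carries no routing logic; as the note above says, mapping the event to the responsible operational group belongs in the ticketing system.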


44 MITIGATE STORAGE USE

Uncontrolled storage use can limit operational flexibility and might put SLA fulfillment at risk. This section focuses on mitigation activities that preserve flexibility by controlling use within its defined corridor. The effect of a mitigation activity should return the usage to its defined corridor.

Storage tightness might occur in aggregates or volumes, depending on their configuration. When all volumes in an aggregate are thin provisioned with the zero fat configuration, they use the shared pool of free blocks of the aggregate to deal with data growth. To solve this situation, a mitigation activity at the aggregate level is necessary.

When storage objects in a fixed-size volume cannot grow to the committed space, a mitigation activity at the volume level is necessary to solve upcoming volume tightness.

MITIGATION ACTIVITIES FOR AGGREGATES

Aggregates are the coarsest storage objects within a NetApp storage controller. Finer-grained storage objects, such as FlexVol volumes and their content, are usually thin provisioned using the zero fat configuration. They might grow on demand; however, because they live within an aggregate of physically limited size, the growth of the storage object itself is also limited. As described in the following list, providing usable space in the aggregate automatically allows contained storage objects to grow.

1. Increase the aggregate. You can add drives to aggregates during operation, and you can repeat this mitigation activity. The maximum aggregate size depends on the Data ONTAP version, the type of aggregate, and the type of storage controller. 64-bit aggregates, supported with Data ONTAP 8, have very high limits. Additional drives can be used immediately; however, their procurement needs to be taken into account. Rebalancing data between existing and new drives results in uniformly distributed use of the drives.

2. Decrease the aggregate Snapshot copy reserve. This reserve is needed in MetroCluster and SyncMirror® configurations. In other configurations, you can decrease this reserve or set it to zero.

3. Shrink preallocated volumes. Volumes with preallocated space reserve available aggregate free space. When possible, these volumes can be shrunk, returning the freed space to the aggregate so that others can make use of the previously preallocated space.

4. Enable deduplication and shrink the volume.

5. If available, migrate a nomad online to a different storage controller. Doing this at the NetApp storage controller level requires storage provisioning based on vFiler units and MultiStore and SnapMirror licenses. Adequate free space on the aggregates of the target storage controller is required. This mitigation activity is not limited in its applicability.

6. A volume can be migrated from one aggregate to another within the same or another storage controller. SnapMirror replicates the data while it is still being served. To switch over to the replicated data, the client needs to detach from the source and reattach to the replica. After completion, the replica is considered the new source. This operation has an impact on client downtime. Typically, inter-data center bandwidth allows you to synchronize the source and the replica within a few minutes.

7. If none of the listed activities can be used, the application must be stopped to achieve a consistent state before migrating.

The mitigation activities for aggregate tightness are summarized in Table 8. Note that Provisioning Manager performs mitigation alternatives 3 to 6 for secondary storage online.


Table 8) Mitigation alternatives to control use within aggregates.

| No. | Mitigation Activity | Repeatability | SLA Impact | Preparation Time | Time to Show Effect |
|-----|---------------------|---------------|------------|------------------|---------------------|
| 1 | Increase aggregate capacity by adding disks | Repeatable up to limits (Data ONTAP 7.x: low limits; Data ONTAP 8: high limits) | None | HW procurement | Immediate (+ rebalancing) |
| 2 | Decrease the aggregate's Snapshot copy reserve area, if possible | One time | None | None | Immediate |
| 3 | Shrink other volumes in the aggregate if they have enough free space | One time | Low | None | Immediate |
| 4 | Run deduplication and shrink volumes | Repeatable | Low | Time to execute dedupe | Immediate |
| 5 | Migrate nomads (online) | Repeatable | Low | None | Minutes; vFiler migration time |
| 6 | Migrate volumes to a different aggregate (offline) | Repeatable | Med-high | Next planned downtime window | Minutes; volume switch-over time |
| 7 | Prevent application data loss and stop the application, then migrate (offline) | Repeatable | Low-high | Coordinate with app owner | Minutes; migration time |

MITIGATION ACTIVITIES FOR VOLUME TIGHTNESS

Mitigation activities for volume tightness are relevant for volumes that are at risk because their storage objects cannot grow to the committed size. When it is not possible to enable growth for storage objects contained in volumes, you need to perform an aggregate mitigation activity.

Note: Some of these mitigation alternatives depend on and affect used capacity (in the aggregate).

Table 9) Mitigation activities for resource tightness within volumes.

| No. | Mitigation Activity | Repeatability | SLA Impact | Prep Time | Time to Show Effect |
|-----|---------------------|---------------|------------|-----------|---------------------|
| 1 | Reduce the volume's Snapshot copy reserve (if configured and not used) | One time | Low | None | Immediate |
| 2 | Increase the volume if there is free space in the aggregate (see Table 8) | One time | Low | None | Immediate |
| 3 | Delete Snapshot copies not needed or those skipped by the autodelete function | Limited | Low | None | Immediate |
| 4 | Activate FAS deduplication for the volume (requires proper space guarantees) | One time | Low; possible performance impact | Wait for schedule | Hours |
| 5 | If the volume contains more than a single LUN, migrate those objects to another volume or aggregate | Repeatable | High | Next planned downtime window | Minutes; volume migration time |
| 6 | Stop application and migrate data | Repeatable | High | Coordinate with app owner | Minutes; migration time |


5 REAL-LIFE SETTINGS

This section summarizes two different operational settings. The first does not make use of online data migration and the settled/nomad provisioning pattern; the second implements a settled/nomad provisioning pattern to maintain the flexibility for online data migrations.

The concrete threshold settings and approaches might be very customer and application specific. To exploit NetApp storage efficiency features in your own data center, NetApp recommends that you start conservatively. After you are familiar with the process, work toward the customer-specific optimum.

5.1 SAMPLE SETTING 1: REAL-LIFE SETTING

This section describes a real-life setting that a customer started with. It makes use of a limited set of mitigation alternatives. This is especially beneficial when the installed storage capacity should be constant over a long time frame or physical systems are already fully equipped. A settled/nomad setting is not considered. Thus, the thresholds that signal a transition of phases are set lower and more conservatively for this customer. Because online data migration and aggregate extension are not available as mitigation alternatives, sufficient available space is required to safely reach the next planned downtime window, as shown in Figure 21. In practice, refer to the aggregate days-to-full trend value to get an idea of available days to full based on past data growth.

• All storage is provisioned using the zero fat option with growable FlexVol volumes. Only aggregate monitoring is used.

• Aggregate extension is not a mitigation alternative.

• Online migration is not a mitigation alternative.

Figure 21) Storage to enable organic data growth between planned downtime windows. (The figure plots data growth over months, with planned downtime windows marked on the time axis.)

Note: Several months might fall between planned downtime windows to perform major mitigation alternatives.

The primary concern is preventing the critical situation where aggregates reach a utilization level that is too high to enable organic growth during the period between agreed planned downtime windows. To prevent this situation, sufficient space must be reserved to enable data growth. Second, the level of data consolidation is monitored to manage accumulated growth rates safely.

Provisioning new data is stopped when one or both thresholds on the first and second metrics are reached. The operational teams are notified using an alarm on the Operations Manager event "aggregate nearly full" (configured to fire when the metric exceeds 50%) and the event "aggregate nearly overcommitted" (configured to fire when the metric exceeds 110%). These alarms stop the responsible entities from provisioning new storage; the aggregate is left for organic growth.

An assessment of the storage situation might then be performed. Depending on experience and knowledge of the application growth rates seen in the past, the thresholds may be adapted. After the upper threshold of the operational sweet spot corridor is crossed, an alarm based on the "aggregate full" threshold (set initially to 65%) is sent to the storage administrators to make the decision for migrating data in the next planned downtime window. In the meantime, organic growth can take place in the yellow-marked area shown in Figure 22. The metrics used are:

• First metric: aggregate capacity used

• Second metric: aggregate space committed

Because all storage is provisioned using the zero fat option, no artificially limited storage container exists. Thus, there is no need to consider a volume-based metric. Figure 22 shows the behavior depending on the metrics aggregate capacity used and aggregate space committed.

Figure 22) Transition of phases depending on the metrics aggregate capacity used and aggregate space committed. (The figure maps the ranges 0–50% and >65% of aggregate capacity used, and 0–110% and >120% of aggregate space committed, to the actions: provision new storage, assess capacity and adapt thresholds, and mitigate.)


5.2 SAMPLE SETTING 2: SETTLED/NOMAD

This section describes a setting that takes the settled/nomad provisioning pattern into account and allows migrating nomad data flexibly and in a timely manner thanks to vFiler technology. This setting requires storage space at alternative locations to which nomads might be migrated. It is seen more often in larger environments with an emphasis on NFS-attached storage. It allows operating the NetApp storage infrastructure at very high use and in narrower operational sweet spot corridors. Figure 23 visualizes the effect of a mitigation alternative that can be performed online.

Figure 23) Narrower corridors due to the ability to perform mitigation alternatives in hours instead of months. (The figure shows settled and nomad data in an aggregate over a time axis measured in hours, from detecting the need to act to the effect of a mitigation such as migration.)

In this sample setting, as in sample setting 1, the critical situation to prevent is aggregates becoming too full. However, the flexibility gained with online data migration makes it unnecessary to take a further metric into account, for example, storage overcommitment.

• All storage is provisioned using the zero fat option with growable FlexVol volumes. Only aggregate monitoring is used.

• Storage is provisioned using the settled/nomad pattern with the ability to perform online migration.

• The days-to-full aggregate trend was more than 200 days on average. Note that this value depends on the individual situation and is calculated against 100% capacity used.

The sole metric in this setting is aggregate capacity used. Table 10 contains the thresholds describing the transition of phases.

Table 10) Phase transitions with the settled/nomad provisioning pattern and online migration as the mitigation alternative.

| Detection Threshold | Notify | Mitigation |
|---------------------|--------|------------|
| > 70% | Storage operations | Stop provisioning of storage |
| > 85% | Storage operations | Stop extending provisioned storage |
| > 90% | Storage operations | Relax resource situation and migrate a nomad |
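The single-metric thresholds of Table 10 amount to a simple classification. The following sketch expresses them as code; the threshold values come from the table, while the function name and action strings are our own illustrative choices:

```python
# Sketch of the phase transitions in Table 10: map the single metric
# "aggregate capacity used" (percent) to the corresponding action.
# Thresholds are from the table; action strings are illustrative.

def setting2_action(capacity_used_pct):
    if capacity_used_pct > 90:
        return "relax resource situation and migrate a nomad"
    if capacity_used_pct > 85:
        return "stop extending provisioned storage"
    if capacity_used_pct > 70:
        return "stop provisioning new storage"
    return "normal operation: provision and extend freely"

for used in (60, 75, 88, 93):
    print(used, "->", setting2_action(used))
```

Because only one metric drives the decision, the classification stays trivial to monitor even at the very high utilization levels this setting targets.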


Figure 24) Visualization of phase transitions depending on the metric aggregate capacity used. (The figure maps the ranges 0–70%, 70–85%, and >90% of aggregate capacity used to the actions: provision new storage, extend already provisioned storage, and relax utilization by migrating a nomad with NetApp Data Motion.)

You can achieve very high data consolidation in this setting by using NetApp storage controllers. The served amount of logical data exceeds the physically usable capacity by factors.


6 STORAGE EFFICIENCY COOKBOOK

To increase consolidation, we propose the following steps to exploit the advantages of NetApp storage efficiency technologies.

Figure 25) Typical picture of aggregate capacity metrics while turning to zero fat configurations and dedupe. (The figure plots committed capacity and capacity used against elapsed time, with the overall trend and the last 3-month trend, across the three numbered steps.)

As a general rule, we don't introduce artificially limited container types. They increase monitoring effort and might prevent pooling unused space. For an existing landscape, proceed as follows:

1. Install and configure Operations Manager; the earlier, the better. From day one, Operations Manager collects data, and the more information it collects, the better the predictions and trending become. The diagrams provided by Operations Manager give a good idea of growth rates and their steadiness. Make sure all NetApp storage controllers are monitored. Wait for one month. Define which mitigation alternatives your operational team is comfortable with. Check the boxes accompanying the provided list and identify the time your team needs to perform the actions. If you can perform online migrations for nomads, define the time to negotiate and approve the migration. For all other data, define the time to the next planned downtime window.

2. Change all volumes to the zero fat configuration with the autogrow feature set to on. Because there is no artificial space limitation for an autogrow volume, monitoring is restricted to aggregate monitoring. When using deduplication, set the volume to autogrow. During this period, the capacity used diminishes, as shown in Figure 25. Usually each change in the volume configuration can be detected. So far, only metadata has changed, and unused space in the volumes is now available from a common shared pool. The aggregated free space is available for the same applications storing the data. We recommend monitoring for three months to understand the growth rate of your environment.

3. Derive the growth trend of the aggregates. Note that the overall trend might still be negative. Use Operations Manager to help determine the trend. Make sure that it excludes the time frame when changing the volume configuration to zero fat and that it includes relevant operations of your applications, such as month- and year-end closing of business applications or regular software maintenance updates (for example, in virtualized environments).


Work backward to determine the thresholds of the phases:

a. Define the aggregate use at a level where your operational team is comfortable. At first, do not exceed 80%. Add an attention area (yellow) depending on the mitigation alternatives and their time to show effect.

b. Determine the maximum distance between the planned downtimes or the time to perform the intended mitigation alternatives.

c. Determine the growth rate. Operations Manager provides help in determining the trend of data growth.

d. Determine the minimum space required to comfortably allow organic growth in the period between agreed planned downtimes of the services provided. Operations Manager helps you to understand the growth rate of the past.
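Steps a through d can be combined into a back-of-the-envelope calculation. The sketch below is our own illustration, assuming linear growth; the function name, the 80% comfort cap, and the sample figures are assumptions, not values from the report:

```python
# Back-of-the-envelope sketch of steps a-d: given a measured growth rate and
# the distance between planned downtime windows, work backward to the upper
# threshold of the operational sweet spot corridor. Assumes linear growth.

def corridor_upper_threshold_pct(usable_gb, growth_gb_per_day,
                                 days_between_downtimes, comfort_pct=80.0):
    """Highest fill level that still leaves room for organic growth
    until the next planned downtime window."""
    reserve_gb = growth_gb_per_day * days_between_downtimes   # step d
    threshold = 100.0 * (usable_gb - reserve_gb) / usable_gb  # steps b, c
    return min(threshold, comfort_pct)                        # step a: cap at 80%

# 10 TB aggregate, 20 GB/day growth, downtime windows 90 days apart:
print(corridor_upper_threshold_pct(10000.0, 20.0, 90))  # 80.0
```

In this example the growth reserve alone would allow 82%, but the team-comfort cap from step a keeps the corridor's upper bound at 80%.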

To provision storage, follow these steps:

1. Create big aggregates to enable shared storage in your data center. We recommend sizing them in such a way that the aggregate can be extended once for eventual aggregate mitigation. Very few situations exist where a silo-centric approach with dedicated aggregates for applications makes sense. Free space and performance in an aggregate can be shared. A few big aggregates reduce the monitoring effort. Also, build aggregates in a limited number of standardized configurations and sizes.

2. Create volumes in the zero fat configuration with the autogrow feature set to on. Because there is no artificial space limitation for an autogrow volume, monitoring is restricted to aggregate monitoring. When using deduplication, set the volume to autogrow. Whenever possible, use Provisioning Manager for convenience and for repeating configurations.

a. Classify your data and provision for flexibility. Give NFS a preference and make use of vFiler entities.

b. Turn on deduplication. Even in situations where deduplication rates are expected to be low, there is sometimes a big surprise. If you prefer to try deduplication on the storage controller, then create a clone of the intended volume and deduplicate it to estimate the effect. Use Performance Advisor to identify a repeating time frame of low activity to schedule the deduplication job. Also use deduplication scheduled by change rate. Mind the maximum sizes, which depend on the storage controller.

c. Initially size volumes to the expected size of the data you are going to store. Thus, the aggregate overcommitment metric in Operations Manager represents the data consolidation more precisely.

d. Trim existing volumes provisioned in full or low fat to the zero fat configuration. Use the following commands on the console of the storage controller to configure zero fat without Snapshot autodelete for NAS environments:

vol options <volume> guarantee none
vol options <volume> try_first volume_grow
vol autosize <volume> -m <maximum size> -i <increment size> on
snap autodelete <volume> off

Use the following command sequence to configure zero fat with Snapshot autodelete for NAS environments:

vol options <volume> guarantee none
vol options <volume> try_first volume_grow
vol autosize <volume> -m <maximum size> -i <increment size> on
snap autodelete <volume> trigger volume
snap autodelete <volume> delete_order oldest_first
snap autodelete <volume> on


Use the following commands to configure zero fat without Snapshot autodelete for SAN environments:

vol options <volume> guarantee none
vol options <volume> try_first volume_grow
vol autosize <volume> -m <maximum size> -i <increment size> on
snap reserve -V <volume> 0
snap autodelete <volume> off
lun set reservation <lun> disable

Use the following command sequence to configure zero fat for SAN environments with autodelete set to on:

vol options <volume> guarantee none
vol options <volume> try_first volume_grow
vol autosize <volume> -m <maximum size> -i <increment size> on
snap reserve -V <volume> 0
snap autodelete <volume> trigger volume
snap autodelete <volume> delete_order oldest_first
snap autodelete <volume> on
lun set reservation <lun> disable

e. Identify storage of inactive data. Storage keeping inactive data is most often perfectly suited to act as nomad candidates that could be migrated.

f. Identify storage that is close to deprovisioning. Deprovisioning of storage relaxes use and can act as a mitigation alternative.

g. Turn already provisioned volumes into the zero fat configuration.

3. Let Operations Manager monitor the landscape. Use the aggregate daily growth rates and days-to-full trending reported by Operations Manager to adapt the thresholds. Remember that days-to-full trending reports against 100% capacity used of the aggregate.


7 REFERENCES

• TR-3505, "NetApp Deduplication for FAS and V-Series Deployment and Implementation Guide": www.netapp.com/us/library/technical-reports/tr-3505.html

• TR-3563, "NetApp Thin Provisioning Improving Storage Utilization and Reducing TCO": www.netapp.com/us/library/technical-reports/tr-3563.html

• TR-3710, "Operations Manager, Provisioning Manager, and Protection Manager Best Practices Guide": www.netapp.com/us/library/technical-reports/tr-3710.html

• TR-3786, "A Thorough Introduction to 64-Bit Aggregates": www.netapp.com/us/library/technical-reports/tr-3786.html

• TR-3814, "NetApp Data Motion": www.netapp.com/us/library/technical-reports/tr-3814.html

• TR-3827, "If You're Doing This, Then Your Storage Could Be Underutilized": www.netapp.com/us/library/technical-reports/tr-3827.html

• TR-3881, "DataMotion for Volumes for Enterprise Applications": www.netapp.com/us/library/technical-reports/tr-3881.html

• NetApp Operations Manager Efficiency Dashboard Installation and User Guide: now.netapp.com/NOW/download/tools/omsed_plugin/InstallUserGuide.pdf


8 ACKNOWLEDGMENTS

This report was developed in concert with the Field Centers for Innovation and covers field best practices and product group expertise. It would not have been possible without the input of many experts. Significant contributions were made by Matthew Agoni, Carlos Alvarez, Jeff Berks, Manfred Buchmann, Hans Deuerlein, Erik Dybwad, Niels Reker, Oliver Dziuba, Larry Freeman, Gary Garcia, Pretoom Goswami, Naveen Harsani, George John, Nigel Maddock, Andreas Martinovsky, Holger Niermann, Cesar Orosco, Christian Ott, Shiva Raja, Michael Reusch, Maurice Skubski, John Tyrrell, Oliver Walsdorf, and Allen Wang.

NetApp provides no representations or warranties regarding the accuracy, reliability, or serviceability of any information or recommendations provided in this publication, or with respect to any results that may be obtained by the use of the information or observance of any recommendations provided herein. The information in this document is distributed AS IS, and the use of this information or the implementation of any recommendations or techniques herein is a customer's responsibility and depends on the customer's ability to evaluate and integrate them into the customer's operational environment. This document and the information contained herein may be used solely in connection with the NetApp products discussed in this document.

© Copyright 2010 NetApp, Inc. All rights reserved. No portions of this document may be reproduced without prior written consent of NetApp, Inc. Specifications are subject to change without notice. NetApp, the NetApp logo, Go further, faster, Data ONTAP, FlexClone, FlexVol, MultiStore, RAID-DP, SnapDrive, SnapMirror, Snapshot, SyncMirror, and vFiler are trademarks or registered trademarks of NetApp, Inc. in the United States and/or other countries. Windows is a registered trademark of Microsoft Corporation. Oracle is a registered trademark of Oracle Corporation. VMware is a registered trademark and VMotion is a trademark of VMware, Inc. All other brands or products are trademarks or registered trademarks of their respective holders and should be treated as such. RA-0007-1010



• Commitment rate is an Operations Manager term that refers to the percentage of aggregate space committed to volumes.

• Deduplication rate is an Operations Manager term that measures the efficiency of the deduplication functionality. This rate is measured at the volume and the aggregate level as a percentage.
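Both definitions translate directly into small formulas. The following sketch uses our own helper names (these are not Operations Manager APIs) to make the two percentages concrete:

```python
# Sketch of the two Operations Manager metrics defined above, expressed as
# plain formulas. Helper names and sample figures are illustrative only.

def commitment_rate_pct(committed_gb, usable_aggregate_gb):
    """Percentage of aggregate space committed to volumes; can exceed 100%
    on a thin-provisioned (overcommitted) aggregate."""
    return 100.0 * committed_gb / usable_aggregate_gb

def deduplication_rate_pct(logical_gb, physical_gb):
    """Space saved by deduplication as a percentage of the logical data."""
    return 100.0 * (logical_gb - physical_gb) / logical_gb

print(commitment_rate_pct(1300.0, 1000.0))   # overcommitted aggregate: 130.0
print(deduplication_rate_pct(500.0, 350.0))  # 30.0
```

A commitment rate above 100%, as in the first call, is exactly the overcommitment situation this document teaches you to operate safely.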

For the aggregate, we define different operational windows, each characterized by an interval of storage utilization. We define an interval as the operational sweet spot corridor (green), where the aggregate should be operated for optimal utilization and service availability. We define a tolerance interval (yellow), where actions are taken to get back into the operational sweet spot corridor. We define an interval as a no-go area (red), where we do not intend to operate the aggregate. This area might act as a last buffer of time or can be considered an area where operational staff has less experience.

Figure 1 explains the terminology in the context of storage objects on a NetApp storage controller. The aggregate is a physically limited storage object. Aggregates are treated as fairly static containers and thus need proper size management.

Figure 1) Terminology in the context of the storage objects volumes and aggregates. (The figure shows volumes with LUNs/NAS data growing inside the usable capacity of an aggregate, with the committed logical storage, the used capacity, and the operational sweet spot corridor marked.)

In practice, commitment rates far above 100% are common in customer environments. This document describes how to manage this.

2.2 GOAL OF THIS DOCUMENT

The goal of this document is to achieve best-in-class storage efficiency and costs by consolidating the highest possible amount of application data while meeting the required service-level agreements. The idea is to enable thin provisioning while controlling the use of physical resources in the NetApp shared storage infrastructure within a desired corridor. NetApp storage efficiency technologies can save a significant amount of the IT budget. On the other hand, running at an uncontrolled use level can reduce flexibility and cause headaches in managing data growth and service-level fulfillment.

The difference in managing thin-provisioned storage compared to traditional storage is that, due to dense consolidation of application data, accumulated application data growth rates might vary in a broader corridor than they would traditionally.


To control the level of physical resources, we outline methods for increasing storage utilization by provisioning storage with NetApp storage efficiency technologies. We also outline how to react to organic data growth and how to keep storage use within a corridor of high efficiency by using NetApp technologies for data center flexibility.

To summarize, this document introduces three phases to manage the storage on NetApp storage controllers: provisioning, organic growth, and mitigation of storage tightness. These phases play a vital role for aggregates, which are the coarsest storage abstraction of a NetApp storage controller.

• Provisioning phase: In this phase, storage is provisioned from the NetApp shared storage infrastructure, which increases the utilization of aggregates. The goal is to operate the aggregates with a high level of data consolidation in an efficient utilization corridor. Figure 2 visualizes this corridor.

• Organic growth phase: In this phase, no further storage is provisioned, to slow down the growth of aggregate utilization. The goal is to operate in a corridor of high utilization but safely reach the next planned downtime or administration window of the served applications. Thus, the organic growth phase is sized depending on the growth rate of capacity used and the length of the time frame to the planned downtime windows. Figure 3 visualizes slowed growth.

• Mitigation of storage tightness phase: This phase prevents an uncontrolled level of utilization and provides mitigation activities to lower this level. Several mitigation alternatives are presented to mitigate storage tightness and to shift the aggregate utilization back into the desired operational corridor. Figure 4 visualizes this mitigation.

Figure 2) Storage consolidation and growing utilization using thin provisioning (data growth over aggregate capacity, within an operational sweet spot corridor).

Figure 3) Controlled and slowed data growth within the operational sweet spot corridor in the aggregate.


Figure 4) Mitigation to prevent uncontrolled utilization (data growth over aggregate capacity).

This document addresses best practices and tools to manage the NetApp storage infrastructure and to support decision making during the transitions between phases.

2.3 AUDIENCE

This document addresses two audiences:

• Decision makers: It provides decision makers with an understanding of how to align storage efficiency best practices and processes with their existing operations organization.

• Operational teams: It allows operational teams to understand the monitoring and management of the storage infrastructure while mastering data growth. It allows the operational teams to implement a basic setting and to position their usage goals. We refer to existing NetApp best practices to increase the level of data consolidation and to achieve overcommitment for major applications.


2.4 SCENARIO

As a scenario, we consider a service provider who delivers a set of IT services. This service provider might serve internal or external customers at different service levels. The service levels provided are characterized by unplanned downtime, as exemplified by Figure 5. This characterization is useful for aligning service data with physical resources. In our example, the highest level of service availability is delivered for Platinum services. It is further assumed that the provided services have different lifetimes and dates of creation.

The service provider's major goal is to deliver the services within the specified service level and to achieve maximum data consolidation through NetApp storage efficiency technologies. This directly translates into cost savings related to capital investments, floor space, cooling, maintenance, and operational expenses. However, storing data in a consolidated way using storage efficiency technologies needs to take into account the aggregated data growth rates of the applications.

Predicting data growth rates depends on several parameters that are usually outside the control and knowledge of the service provider. These parameters include usage characteristics, the number of users, and the functionality used. To compensate for the deficiencies in precisely predicting data growth over a specific time frame, we propose a scheme that the service provider can use to achieve operational flexibility and adaptability to handle unpredictable growth rates.

Figure 5) Sample service levels ordered by service disruption and recovery time. From lowest to highest disruption and recovery time: Platinum (production, premium customers), Gold (production), Silver (production, low budget), Bronze (production), and Best Effort services (dev/test, cold/fill-up data, dynamic/short-term data).

In this document, the focus is on the operational aspects of storage efficiency technologies to achieve data center consolidation and agility. Thus, we take a seat next to the operational staff of our sample service provider to understand their questions regarding the technologies that make up NetApp storage efficiency.

We address the questions posed by the operational staff, such as: How do we set it up? How do we run this? How do we integrate the necessary procedures into our daily work?


A set of questions pertaining to the lifetime of a service instance and its storage arises. It starts with provisioning storage in a NetApp shared storage infrastructure and continues with the detection and monitoring of situations endangering the level of a service, the necessary response procedures, and promoting a continuous and smooth delivery of services.

The questions are structured around a cycle that starts at provisioning storage and finishes at deprovisioning storage. Figure 6 shows important questions regarding storage efficiency from an operational point of view.

• Provisioning deals with the provisioning of storage. In this document, provisioning models are shown that achieve a high level of storage consolidation while preserving operational flexibility. For individual applications, NetApp provides a rich library of technical reports on how best to provision.

• Monitoring deals with defining the goal of the monitoring process and which metrics to use to decide when to stop (for example, the provisioning of storage). The relevant metrics provided by NetApp Operations Manager are described.

• Notifying deals with how to notify the people in charge when certain actions must be performed. The notification mechanisms within NetApp Operations Manager are described to deliver information in case of certain events.

• Mitigation deals with mechanisms to prevent uncontrolled utilization from affecting operational flexibility and service fulfillment.

Figure 6) Questions regarding storage efficiency from an operational point of view. Provision: How to provision best for storage efficiency (provisioning models, NetApp Data Motion awareness, from scratch or template/clone)? Where to provision to? Which SLA? What are the defaults? Monitor: Which tools? What to monitor? What is critical? When to stop provisioning or extending? When to relax tightness? How to detect? Notification: Who is in charge to react? How to notify? Mitigate: What are the available options? What are the implications on SLAs? When to act?

Before discussing the details of this cycle, it is important to understand the NetApp technologies that achieve storage efficiency and flexibility and to understand their relevance in the provisioning and operational phases.


2.5 NETAPP TECHNOLOGIES FOR STORAGE EFFICIENCY AND FLEXIBILITY

The NetApp technologies listed in Table 1 are characterized by where they bring their most significant advantage and value. For example, FlexClone® technology provides significant time and space advantages during provisioning, but the space advantage might diminish over time. In contrast, deduplication technology can achieve space savings over the entire storage lifetime.

Table 1) NetApp technologies for storage efficiency and flexibility.

• FlexClone: Instantly creates thin-provisioned and space-efficient writable clones. (Benefit during provisioning)
• FlexVol®: Implements thin provisioning and consumes only the needed space rather than the requested space. (Benefit during provisioning and operation)
• Deduplication: Increases data consolidation by detecting and optimizing repeating patterns in primary and secondary storage. (Benefit during provisioning and operation)
• NetApp Data Motion: Provides flexibility for management and optimal load/capacity rebalancing in growing cloud environments without downtime. (Benefit during operation)
• Aggregate extensibility in Data ONTAP®: Data ONTAP is the foundation for all features listed in this table and provides flexibility in handling physical resources; it allows extending physical aggregates during operation. (Benefit during provisioning and operation)

Furthermore, NetApp RAID-DP®, SATA, and NetApp Flash Cache (formerly PAM II) are technologies that help to reduce the total cost of storage tremendously. It is assumed that these technologies are deployed according to the requirements of the use case.

NETAPP SHARED STORAGE INFRASTRUCTURE

To implement the practices outlined in this document, some prerequisites must be met. We assume a NetApp shared storage infrastructure implemented using large aggregates. This acts as a utility for delivering storage in a flexible manner to applications with different needs. It scales with demand and serves a variety of different service levels at the same time. NetApp Operations Manager monitors the NetApp shared storage infrastructure. This software acts as a central management station and consolidates information about the current status of all NetApp storage controllers. Based on this information, Operations Manager indicates the necessity to change the phases and behavior in the data center.

The NetApp shared storage infrastructure provides different ways for clients to consume its resources. It can provide a traditional view, where storage resources are located at a specific controller. Using NetApp Provisioning Manager, the infrastructure can also provide a service-oriented resource view that abstracts resource consumption and management from the physical controllers. The abstractions of a storage service catalog, resource pools, and datasets provide easy manageability in the face of massive scale. If multi-tenancy is not required, this is the abstraction of choice.

Supported by the NetApp technologies MultiStore® (vFiler™) and NetApp Data Motion, storage can be provided in a utility-like fashion, independent of the physical hardware. This makes possible high operational flexibility in the data center and allows building virtualized environments for multiple tenants with competing interests.


3 PROVISIONING

Some features, such as data deduplication, can be turned on or off at any time. However, to achieve maximum consolidation and storage efficiency, more strategies must be considered during the data-provisioning phase.

In addition, provisioning should take the flexibility of storage (for example, the ability to migrate data) into account. This allows administrators to easily move data off aggregates approaching capacity without downtime. This is also an important aspect when planning to deliver services 24x7.

Thus, the design goals for storage layouts are storage efficiency and operational flexibility. In the following sections, we discuss three orthogonal dimensions of provisioning storage. Two of them focus on achieving data consolidation; the third focuses on achieving operational flexibility. All dimensions can be combined independently. Note that the achievable level of consolidation depends on the applications and their data.

TR-3827, If You Are Doing This, Then Your Storage Could Be Underutilized, provides a further understanding of storage efficiency and operational flexibility.

3.1 PROVISIONING FROM SCRATCH: FULL FAT TO ZERO FAT PROVISIONING

This section deals with the way data is provisioned and the consequences for storage efficiency. We recommend applying the so-called zero fat configuration.

We consider the storage setup for a single application instance. The presented configurations can be applied while provisioning storage from scratch as well as to already provisioned storage. When the technical dimensions of storage provisioning are categorized by primary data space and Snapshot™ copy space, there are four theoretical combinations for both network-attached storage (NAS) and storage area network (SAN) environments. In practice, only two variants are relevant for NAS and three variants are relevant for SAN storage:

• Full fat
• Low fat
• Zero fat

According to NetApp best practices, we do not mix block and file data within a single volume, which allows us to consider NAS and SAN environments separately.

NAS

For NAS, two options are recommended: full fat and zero fat.

• Full fat: The primary data and Snapshot copy space are preallocated.
• Zero fat: Primary data and its Snapshot copy space are allocated on demand. This variant achieves the best ratio of storage efficiency when provisioning applications from scratch.


Figure 7) Provisioning model for NAS storage from scratch. Technically, only two out of four combinations are possible: fat primary data (files and directories) with fat Snapshot copy space is the full fat option, and thin primary data with thin Snapshot copy space is the zero fat option. The mixed fat/thin combinations are not options.

Note: Full fat is characterized slightly differently in NAS and SAN due to their technical properties.

FULL FAT PROVISIONING

Full fat provisioning for NAS is the traditional (default) way to implement NFS/CIFS shares. Volumes in a full fat configuration are characterized as follows:

• Volumes are created with a space guarantee.
• The size of the volume follows the formula X + Δ, where X is the size of the primary data (the sum of all user data, files and directories, within the volume) and Δ is the amount of space needed to hold Snapshot data.
• Because the space used for Snapshot copies might grow unexpectedly, the autosize function can be used to make space available when a certain volume threshold is reached. This also happens when the space reserved for user data gets low.
• Space reservation for Snapshot copies is used to hide the capacity used for Snapshot copies from the consumers (NAS clients).
• For volumes with deduplication enabled, volume autogrow is a mandatory option.
• Normally, using autodelete is not recommended in NAS environments. Keeping a certain number of Snapshot copies for file versioning/restores might be part of the SLAs defined for file services.

Note: Deleting Snapshot copies may be a reasonable approach when no other option for freeing up space is available, but this is a specific and individual decision.
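The X + Δ sizing rule above can be made concrete with a minimal sketch. This is an illustrative calculation only, not a NetApp tool; the 20% Snapshot reserve used in the example is an assumed value that should be replaced by one derived from the actual change rate.

```python
def full_fat_nas_volume_size(user_data_gb: float, snap_reserve_pct: float = 20.0) -> float:
    """Size a full fat NAS volume as X + delta, where X is the primary data
    (sum of all files and directories) and delta is the space set aside for
    Snapshot copies, here expressed as a percentage of X (assumed value)."""
    delta = user_data_gb * snap_reserve_pct / 100.0
    return user_data_gb + delta

# 500 GB of user data with an assumed 20% Snapshot reserve -> 600 GB volume
print(full_fat_nas_volume_size(500))  # 600.0
```

In practice, Δ is driven by the number of retained Snapshot copies and the data change rate, so the percentage must be sized per workload.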

Table 2) Full fat provisioning.

Volume options:
• guarantee = volume
• fractional_reserve = 100: Leave at the default; mostly relevant for SAN environments. The default value up to Data ONTAP 7.3.3 is 100; for later releases, 0 is the default.
• autosize = on: Turn autosize on. There is then no artificially limited volume that needs to be monitored. Autosize makes sense to allow growth of user data beyond the guaranteed space limit.
• autosize options = -m X -i Y: The business model drives the maximum value (-m) for the autosize configuration, because it offers additional disk space to the consumer under its specific conditions. A reasonable resizing increment (-i) depends on various factors, such as the data growth rate in the particular volume, the volume size itself, and so on.

Volume Snapshot options:
• reserve = yes: The value depends on the number of Snapshot copies and the change rate within the volume.
• schedule = switched on: Automatic Snapshot technology schedules.
• autodelete = off: Deleting Snapshot copies is not recommended in most NAS environments.

ZERO FAT PROVISIONING

The zero fat method is the most efficient way to provision NAS volumes.

• Volumes are created without a space guarantee.
• The size of the volume still follows the formula X + Δ, where X is the size of the primary data (the sum of all user data, files and directories, within the volume) and Δ is the amount of space needed to hold Snapshot data. Sizing the volume defines a container with a virtual size for the consumers; NAS users are familiar with fixed-size file shares.
• The space used for Snapshot copies can grow unexpectedly. You can use the autosize function to make space available when a certain volume threshold is reached. You can also use the autosize function when the space reserved for user data gets low.
• The space reserved for Snapshot copies is used to hide from the consumers (NAS clients) the capacity taken up by Snapshot copies.
• For volumes with deduplication enabled, volume autogrow is a mandatory option.
• Normally, using autodelete is not recommended in NAS environments. Keeping a certain amount of space for Snapshot copies for file versioning/restores is part of the SLAs defined for file services.

Note: Deleting Snapshot copies may be a reasonable approach when no other option for freeing up space is available, but this is a specific and individual decision.
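Because zero fat volumes start small and rely on autosize, the growth behavior driven by the -m (maximum) and -i (increment) options is worth visualizing. The sketch below is an illustrative model of that behavior, not Data ONTAP code; the numbers are assumptions.

```python
def autosize_grow(current_gb: int, max_gb: int, increment_gb: int) -> int:
    """One autosize step: grow the volume by the configured increment (-i)
    without exceeding the configured maximum (-m). Returns the unchanged
    size once the maximum has been reached."""
    if current_gb >= max_gb:
        return current_gb
    return min(current_gb + increment_gb, max_gb)

size = 100
for _ in range(5):  # five threshold crossings trigger five autosize attempts
    size = autosize_grow(size, max_gb=120, increment_gb=10)
print(size)  # 120 -> growth stops at the -m maximum
```

This is why only the aggregate needs close monitoring: volumes grow on demand, but each one is still capped by its configured maximum.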

Table 3) Zero fat provisioning.

Volume options:
• guarantee = none
• fractional_reserve = 100: Leave at the default; mostly relevant for SAN environments. The default value up to Data ONTAP 7.3.3 is 100; for later releases, 0 is the default.
• autosize = on: Turn autosize on. There is then no artificially limited volume that needs to be monitored. Autosize makes sense to allow growth of user data beyond the guaranteed space limit.
• autosize options = -m X -i Y: The business model drives the maximum value (-m) for the autosize configuration, because it offers additional disk space to the consumer under its specific conditions. A reasonable resizing increment (-i) depends on various factors, such as the data growth rate in the particular volume, the volume size itself, and so on.
• try_first = -: Autodelete is not recommended in most environments.

Volume Snapshot options:
• reserve = yes/no: The value depends on the number of Snapshot copies and the change rate within the volume. Displaying only the committed usable space defined by the SLA is the preferred way to provision NAS storage. However, there might be situations in which the Snapshot reserve area is omitted (no).
• schedule = switched on: Automatic Snapshot technology schedules.
• autodelete = off: Deleting Snapshot copies is not recommended in most NAS environments.

SAN

For SAN, we consider three options:

• Full fat: Both the primary data and its Snapshot copy space are preallocated.
• Low fat: The primary data is preallocated; the Snapshot copy space is allocated on demand.
• Zero fat: Primary data and its Snapshot copy space are allocated on demand. This variant achieves the best ratio of storage efficiency when provisioning applications from scratch.

Figure 8) Provisioning model for SAN storage from scratch. Fat primary data (LUN) with fat Snapshot copy space is the full fat option; fat primary data with thin Snapshot copy space is the low fat option; thin primary data with thin Snapshot copy space is the zero fat option. Thin primary data with fat Snapshot copy space is not an option.

FULL FAT PROVISIONING

This method can be treated as the historical way of provisioning block storage with Data ONTAP.

• Volumes are created with a space guarantee.
• A fractional (overwrite) reserve is used to guarantee that the primary data can be overwritten completely with Snapshot copies in place. If this space is not available, Snapshot copy creation fails.
• The size of the volume follows the formula 2X + Δ, where X is the size of the primary data (the sum of all LUN capacities within the volume) and Δ is the amount of space needed to hold Snapshot copy data.


The enhancements to the volume autosize capabilities (such as volume size-dependent thresholds) and the robustness of the Snapshot autodelete implementation have made full fat provisioning more or less obsolete. As of today, the default settings for creating volumes/LUNs in Data ONTAP still correspond to these settings. See Provisioning from Scratch: Full Fat to Zero Fat Provisioning for a discussion of using tools such as Provisioning Manager.

Table 4) Full fat provisioning.

Volume options:
• guarantee = volume
• fractional_reserve = 100: Even though it is technically possible, a fractional reserve below 100 incorporates a potential risk of running out of Snapshot copy overwrite space. This situation should be avoided.
• autosize = off: Autosize could be used as an option to create the free space needed for Snapshot copy creation.

Volume Snapshot options:
• reserve = 0
• schedule = switched off
• autodelete = off

LUN options:
• reservation = enable

LOW FAT PROVISIONING

With low fat provisioning, we use a more space-efficient way to provision volumes.

• Volumes are created with a space guarantee.
• LUNs are created with a space guarantee as well. This setup does not benefit from unused blocks within a LUN. (During the lifetime of a LUN, the number of free, unused blocks typically decreases. Without space reclamation techniques, allocated blocks on the storage system stay allocated.)
• The size of the volume follows the formula X + Δ, where X is the size of the primary data (the sum of all LUN capacities within the volume) and Δ is the amount of space needed to hold Snapshot copy data.
• Because the space used for Snapshot copies might grow unexpectedly, the autosize and autodelete policies are used to make space available when a preset volume threshold is reached.

Table 5) Low fat provisioning.

Volume options:
• guarantee = volume
• fractional_reserve = 0: Snapshot space is controlled by the autodelete and autosize options.
• autosize = on: Turn autosize on.
• autosize options = -m X -i Y: The business model drives the maximum value (-m) for the autosize configuration, because it offers additional disk space to the consumer under its specific conditions. A reasonable resizing increment (-i) depends on various factors, such as the data growth rate in the particular volume, the volume size itself, and so on.
• try_first = volume_grow: Increasing the size of the volume does not destroy any data or information, so there is no reason not to increase the size of the volume. It can be reverted afterward if the volume's free space increases again. There might be configurations where automatic volume growth is not desired.

Volume Snapshot options:
• reserve = 0: For NAS volumes, setting a Snapshot copy reserve area and configuring Snapshot copy schedules is a common setup. For SAN volumes, this needs to be switched off according to NetApp best practices (see the Fibre Channel and iSCSI Configuration Guide).
• schedule = switched off
• autodelete = on: There might be Snapshot copies that are needed to fulfill certain SLAs, such as backup SLAs. Setting this policy needs to be negotiated with the business requirements. In the worst-case scenario, deleting Snapshot copies is not an option.
• autodelete options = volume, oldest_first: There is a precedence for which Snapshot copies are candidates for deletion; oldest_first is the current default.

LUN options:
• reservation = enable: Reserves space for the LUN during creation.
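The interplay of try_first = volume_grow and autodelete in the low fat policy can be sketched as a simple escalation: grow the volume first, and fall back to deleting the oldest Snapshot copy only when the autosize maximum is reached. This is an illustrative model only; the function name and dictionary layout are hypothetical, not Data ONTAP APIs.

```python
def relieve_space_pressure(vol: dict) -> str:
    """Sketch of the low fat policy order: 'try first' is volume_grow, so
    attempt an autosize step; only when the volume is already at its
    autosize maximum, fall back to autodelete (oldest Snapshot copy first)."""
    if vol["size_gb"] < vol["autosize_max_gb"]:
        vol["size_gb"] = min(vol["size_gb"] + vol["autosize_increment_gb"],
                             vol["autosize_max_gb"])
        return "volume_grow"
    if vol["snapshots"]:
        vol["snapshots"].pop(0)  # list is kept oldest-first
        return "snap_autodelete"
    return "none"

vol = {"size_gb": 95, "autosize_max_gb": 100, "autosize_increment_gb": 10,
       "snapshots": ["nightly.0", "nightly.1"]}
print(relieve_space_pressure(vol))  # volume_grow (size capped at 100 GB)
print(relieve_space_pressure(vol))  # snap_autodelete (removes nightly.0)
```

The "none" branch is exactly the situation the mitigation phase of Section 2 is meant to prevent: no automatic relief is left, and an administrator must act.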

ZERO FAT PROVISIONING

Full and low fat provisioning use fully allocated volumes and LUNs; by default, space allocation happens within the boundaries of the LUN and the volume. Zero fat follows a 100% allocate-on-demand concept.

• Volumes are created without a space guarantee.
• LUNs are created without a space guarantee.
• The size of the volume follows the formula X – N + Δ, where X is the size of the primary data (the sum of all LUN capacities within the volume), Δ is the amount of space needed to hold Snapshot copy data, and N is the amount of unused blocks within a given LUN.

Table 6) Zero fat provisioning.

Volume options:
• guarantee = none: No space reservation for the volume at all.
• fractional_reserve = 0: With Data ONTAP 7.3.3, fractional_reserve can be modified even for volumes without a space guarantee of type volume. Prior to Data ONTAP 7.3.3, the value was fixed at 100.
• autosize = on: Turn autosize on.
• autosize options = -m X -i Y: The business model drives the maximum value (-m) for the autosize configuration, because it offers additional disk space to the consumer under specific conditions. A reasonable resizing increment (-i) depends on various factors, such as the data growth rate in the particular volume, the volume size itself, and so on.
• try_first = volume_grow

Volume Snapshot options:
• reserve = 0: For NAS volumes, setting a Snapshot copy reserve area and configuring Snapshot copy schedules is a common setup. For SAN volumes, this needs to be switched off according to NetApp best practices (see the Fibre Channel and iSCSI Configuration Guide).
• schedule = switched off
• autodelete = off: Deleting Snapshot copies might be an option when the volume can no longer be resized because the maximum configured size has been reached, or when the aggregate's free space becomes low.

LUN options:
• reservation = disable: No preallocation of blocks for the LUN.

SUMMARY OF PROVISIONING METHODS

There are good reasons for using any of the provisioning methods described; however, full fat for SAN environments should be avoided wherever possible because of its storage efficiency ratio. Even with a 100% block usage ratio on primary data, zero fat provisioning has many advantages and is the preferred method:

• The aggregate's free space is a global pool that can serve space to volumes. This gives more flexibility than volumes with their own dedicated free space.
• For SAN volumes, the block consumption can be easily monitored.
• Deduplication savings go directly into the global pool of free space, which is the aggregate or the resource pool to which it belongs.
• Monitoring is needed only on the aggregate level. Volumes grow on demand.

Table 7) Comparison of provisioning methods.

• Space consumption: full fat 2X + Δ; low fat X + Δ; zero fat X – N + Δ².
• Space efficient: full fat no; low fat partially, for Snapshot copies; zero fat yes.
• Monitoring: full fat optional; low fat required on the volume and aggregate level; zero fat required on the aggregate level.
• Notification/mitigation process required: full fat no; low fat optional in most cases; zero fat yes.
• Pool benefiting from dedupe savings: full fat the volume fractional reserve area; low fat the volume free space area; zero fat the aggregate free space area.
• Risk of an out-of-space condition on primary data: full fat no; low fat no, as long as autodelete is able to delete Snapshot copies; zero fat yes, when monitoring and notification processes are missing.
• Typical use cases: full fat small installations and environments with no or few storage management skills (no monitoring infrastructure); low fat large database environments; zero fat shared storage infrastructures, test/dev environments, and storage pools for virtualized servers.

² N is the traditional thin provisioning impact: the amount of blocks logically allocated but not used.
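The space consumption row of Table 7 can be worked through with example numbers. The sketch below simply evaluates the three formulas; the workload figures (1 TB of LUNs, 100 GB Snapshot delta, 400 GB of never-written blocks) are assumed for illustration.

```python
def space_consumption_gb(method: str, lun_gb: int, snap_gb: int, unused_gb: int = 0) -> int:
    """Physically committed space per provisioning method (Table 7):
    full fat = 2X + delta, low fat = X + delta, zero fat = X - N + delta,
    where X is the total LUN capacity, delta the Snapshot copy space, and
    N the blocks that are logically allocated but never written."""
    if method == "full":
        return 2 * lun_gb + snap_gb
    if method == "low":
        return lun_gb + snap_gb
    if method == "zero":
        return lun_gb - unused_gb + snap_gb
    raise ValueError(f"unknown method: {method}")

# 1 TB of LUNs, 100 GB Snapshot delta, 400 GB of the LUN blocks never used
for m in ("full", "low", "zero"):
    print(m, space_consumption_gb(m, 1000, 100, unused_gb=400))
# full 2100, low 1100, zero 700 (GB)
```

Under these assumptions, zero fat commits a third of the physical space that full fat does for the same workload, which is the storage efficiency ratio the summary refers to.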

FULL/LOW/ZERO FAT PROVISIONING WITH PROVISIONING MANAGER

NetApp Provisioning Manager focuses on fast and efficient provisioning of storage resources in the NetApp storage infrastructure. It speeds up provisioning workflows and boosts capacity usage by using policy-based automation for the entire NetApp NAS and SAN infrastructure. These processes are:

• Faster than manually provisioning storage
• Easier to maintain than scripts
• Instrumental in minimizing the risk of data loss resulting from misconfigured storage

Provisioning Manager applies user-defined policies to consistently select the appropriate resources for each provisioning activity. This frees administrators from the headache of searching for available space to provision and allows more time for strategic issues. The use of a centralized management console allows administrators to monitor the status of their provisioned storage resources. Provisioning Manager can help improve business agility and capacity use, shrink provisioning time, and improve administrator productivity. Provisioning Manager's thin provisioning and deduplication capabilities provide a high level of storage efficiency from your NetApp storage investment.

A GUI allows you to implement the full/low/zero fat provisioning models within Provisioning Manager. See Figure 9 and Figure 10 for configuring storage efficiency in a provisioning policy for NAS and SAN. Whenever storage is provisioned using such a provisioning policy, the settings apply automatically. For more information, refer to TR-3710, Operations Manager, Provisioning Manager, and Protection Manager Best Practices Guide.

Provisioning Manager encapsulates the technical details of provisioning storage and supports easy integration with existing management tools and orchestration frameworks. Policies and their use in so-called datasets and storage services allow you to exploit NetApp storage efficiency technologies without exposing a high level of technical detail to higher-level management software.

Note: Provisioning Manager up to version 4.0 does not allow you to specify autosize and autodelete individually; the policy template determines whether these features are used and which options are selected. To implement the provisioning methods outlined here, a customized provisioning script needs to be provided to set the autosize and autodelete parameters according to the recommendations for the full/low/zero fat methods. Post-provisioning scripts are standard with Provisioning Manager. Use caution: when Provisioning Manager runs conformance checks, it reverts individual settings.


Figure 9) Configuring a full/zero fat provisioning policy using Provisioning Manager for NAS. Select the checkboxes as outlined. Provisioning Manager deviates from zero/full fat by first growing volumes with autosize and then allowing Snapshot autodelete.


Figure 10) Configuring a full/low/zero fat provisioning policy using Provisioning Manager for SAN storage. Select the checkboxes as outlined. Provisioning Manager deviates by not turning on autosize for zero fat.

FULL/LOW/ZERO FAT PROVISIONING FOR STORAGE SERVICES

Storage services are an easy abstraction for provisioning storage in a utility-like fashion. A storage service describes all the characteristic attributes of the storage needed in a certain scenario. A storage service catalog lists the available templates and allows you to provision storage with these attributes on demand. Technically, storage services or datasets consist of one or more resource pools, a protection policy, and a provisioning policy. Full/low/zero fat provisioning for storage services is configured in the configuration wizard of the provisioning policy. Figure 11 shows the provisioning policies closest to the full/low/zero fat configurations.

Figure 11) Full/low/zero fat provisioning policies for datasets and storage services.

Because this wizard is able to configure the deduplication feature, two policies are configured for the zero fat configuration: one with deduplication and one without.


HOW SHOULD A VOLUME BE SIZED?

Because the physical allocation of data within a zero fat-provisioned volume is done on demand, theoretically the volume size can be set to a very high value that can easily hold all application data and Snapshot copies. Because the unallocated space in the volume is not exclusively reserved for the volume itself, all other applications can benefit from the shared pool of unallocated storage. However, NetApp recommends that you size the volume to the expected size of its contained objects and use the autogrow option to let it grow on demand. The important advantage is that the commitment rate acts as a metric for data consolidation.

Note: The commitment rate reflects the amount of logical data consolidation. This metric is suitable for deciding when data should be left for organic growth.
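As a minimal sketch of the metric just mentioned: the commitment rate of an aggregate can be computed as the sum of the provisioned (logical) volume sizes divided by the aggregate's physical capacity. The numbers below are assumed for illustration.

```python
def commitment_rate(provisioned_gb: list[float], aggregate_capacity_gb: float) -> float:
    """Commitment rate of an aggregate: the sum of all provisioned (logical)
    volume sizes divided by the aggregate's physical capacity. Values above
    1.0 indicate overcommitment, i.e. data consolidation."""
    return sum(provisioned_gb) / aggregate_capacity_gb

# three thin volumes sized to their expected growth, on a 2 TB aggregate
rate = commitment_rate([1000, 800, 600], 2000)
print(round(rate, 2))  # 1.2 -> the aggregate is overcommitted by 20%
```

This illustrates why sizing volumes to their expected contents matters: with arbitrarily oversized volumes, the commitment rate would no longer reflect the actual degree of consolidation.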

Additionally, the volume size limits when using deduplication should be taken into account, because the maximum sizes depend on the storage controllers.

APPLICATION RECOMMENDATIONS

Thin provisioning is most effective when applications use data that is committed to them step by step. When applications preformat data, the immediate effect of thin provisioning is lost, and only deduplication may reclaim sharable blocks. Because thin provisioning has no performance penalty, the general recommendation is to provision with the zero fat configuration.

For SAN-attached storage, NetApp recommends that you use file systems supporting space reclamation technologies such as the SCSI UNMAP and SCSI WRITE SAME commands. These pass the information through the storage stack that a particular block is no longer used and allow unused space to be reclaimed. On Windows® platforms, this can be configured in NetApp SnapDrive®.

For Oracle® database best practices, refer to WP-7084, Storage Efficiency in an Oracle Environment.

32 PROVISIONING FROM TEMPLATES: VOLUME AND DEDUPE-CENTRIC LAYOUTS

This section deals with provisioning storage for similar applications from a golden template. A valid use case is a hosting provider who offers and serves predefined application services in mass quantities. Instead of provisioning each application from scratch, the data of the application instance is provisioned by creating a copy of a preconfigured template, or golden copy, that is customized using a postprocessing procedure.

When applications are provisioned this way, NetApp cloning technologies generate virtual copies of the template data instantly and with efficient use of space. This achieves a high degree of data consolidation and cost savings.

The potential of NetApp cloning technologies also plays a central role in development and test environments, as well as in software maintenance scenarios. Testing and updates can be performed very easily because these cloning capabilities work instantly and with almost no overhead in terms of performance, CPU, and memory.

There are two ways to align application data to a NetApp shared storage infrastructure:

• Volume-centric storage layout
• Dedupe-centric storage layout

Depending on the data lifetime, suitability for deduplication, consistency, and tool constraints, one way of aligning application data is more appropriate than the other.

In both variants, the storage of the application template can be provisioned as either full, low, or zero fat. The cloning procedure inherits the attributes of the parent volume. To create space-efficient clones, the space guarantee must be set to none.


CONSEQUENCES FOR MONITORING

When using one of the following layouts, very high data consolidation can be achieved. Because this effect depends on the usage characteristics of the corresponding applications, monitoring the aggregate is key. In case a low fat volume acts as a template that is cloned preserving the original space guarantees, monitoring is necessary for the cloned volumes as well.

VOLUME-CENTRIC STORAGE LAYOUT

In a volume-centric storage layout, an application instance is organized into one or a few volumes to benefit from the Data ONTAP volume-centric management and maintenance operations, such as instant cloning and volume-consistent Snapshot copies.

In addition to the convenient ways to manage volumes, volume-centric storage layouts have storage efficiency advantages in two dimensions:

• High instant storage efficiency savings: High instant savings are achieved when cloning data of an application instance with FlexClone; these savings might deteriorate over time.

• Long-term storage efficiency savings: Medium long-term savings are achieved when deduplicating application data.

A volume-centric layout makes it easy to provision storage for another instance of an application: clone a consistent volume representing the template of the intended application and attach it to an instance where it is processed. This approach works for both NAS and SAN.

Figure 12 shows the data alignment of an application instance and its volume. An application instance organizes its data in one or more dedicated volumes. Note that the entire construct is created within one aggregate.

Because deduplication is performed on the volume level, long-term savings depend on the block-sharing rate within one instance of an application. Volume-centric layouts are preferred in the following cases:

• Simplicity of data management using volumes
• Individual control over the SLA of each application instance
• Application instances with a short duration
• No consideration of deduplication
• Management tools that require volume-centric layouts


Figure 12) Volume-centric storage provisioning. Application instances are aligned horizontally with their volumes.

[Figure 12 depicts a template volume and instance volumes 1 through n, each a FlexVol volume containing several LUNs/qtrees, with deduplication block sharing within each FlexVol volume and FlexClone block sharing between the template and its clones.]

Impact on commitment and storage utilization: The impact of using FlexClone to clone a volume-centric storage layout to implement template-based provisioning is visualized schematically. At clone creation, Data ONTAP creates metadata for the new instance of the data and allocates space for storing changes to the cloned copy or new data on request. Thus, the overcommitment of the aggregate containing the cloned data increases when creating the clone. However, this does not affect the space used in the aggregate. When data in the clone is changed and new data is added by the application, the aggregate use grows.
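The accounting just described can be sketched as a simple model: creating a clone adds the clone's logical size to the committed space of the aggregate while consuming almost no physical blocks until data is changed. This is a deliberately simplified illustration; real Data ONTAP accounting also includes metadata and reserves, and all figures are invented.

```python
# Simplified model of aggregate accounting when a FlexClone volume is created.
# Metadata overhead and reserves are ignored for clarity; numbers are invented.

class Aggregate:
    def __init__(self, capacity_gb):
        self.capacity = capacity_gb
        self.committed = 0.0   # logical size promised to volumes
        self.used = 0.0        # physical blocks actually allocated

    def provision_volume(self, size_gb, initial_data_gb):
        self.committed += size_gb
        self.used += initial_data_gb

    def clone_volume(self, size_gb):
        # Clone creation: commitment rises, physical use does not.
        self.committed += size_gb

    def write_new_data(self, data_gb):
        # Changed or new blocks in the clone consume physical space.
        self.used += data_gb

aggr = Aggregate(1000)
aggr.provision_volume(400, initial_data_gb=300)   # template volume with data
aggr.clone_volume(400)                            # instant clone of the template
print(aggr.committed, aggr.used)                  # 800.0 300.0
aggr.write_new_data(50)                           # application changes data
print(aggr.committed, aggr.used)                  # 800.0 350.0
```

The model shows why overcommitment jumps at clone creation while aggregate use grows only as the clone diverges from its parent.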

Best Practice

A volume-centric layout implicitly implements a consistency group. It is preferable to align within it all application data that should be recovered at a certain point in time. Cloning can achieve significant savings when a FlexClone volume is created to provision data for a new service instance.

Client-side data realignment, such as disk defragmentation or database table space reorganization, has a counterproductive effect on the FlexClone savings. It also has a temporarily counterproductive effect on the deduplication savings until the deduplication process is executed again. If possible, the following actions on client data should be avoided:

• Reorganizing data, for example, database reorganization of table spaces or defragmentation of virtual disks provisioned through cloning
• Preformatting data


DEDUPE-CENTRIC STORAGE LAYOUT

In a dedupe-centric storage layout, the goal is to achieve high storage efficiency returns from the deduplication feature. In contrast to the volume-centric storage layout, data of different application instances is grouped to achieve storage efficiency returns across a set of application instances. Figure 13 shows a sample dedupe-centric storage layout. Data of application instances is organized horizontally; individual data of each application is grouped vertically in a volume to implement deduplication.

This layout makes sense in virtualization scenarios where the images of the guest machines can be grouped easily. Grouping partitions containing boot images and commonly used programs is very effective because they share much of the same data. To implement template-based provisioning with such a layout, cloning of template data must be performed with the file/LUN FlexClone operation. File/LUN FlexClone allows storage objects to be cloned within a volume, providing finer granularity.

This storage layout provides the following storage efficiency advantages from a short- and long-term perspective:

• Very high long-term storage efficiency savings: Long-term storage efficiency savings are achieved due to the deduplication-centric storage layout and deduplication returns.

• Short-term storage efficiency savings: Instant storage efficiency savings are provided when cloning an application instance, for example template application data, through a file/LUN FlexClone operation.

In contrast to the volume-centric storage layout, application instances are bundled together in a matrix style because of their participation in a volume. This implies that the applications share major operational tasks and are managed as a bundle. From an SLA perspective, a diversification of service levels within the application instances cannot be implemented as easily as with a volume-centric layout. Achieving application-consistent Snapshot copies requires the iterative application of file/LUN FlexClone functionality to all storage objects of the instance. This is slightly more difficult than cloning with a volume FlexClone operation.

TR-3505, NetApp Deduplication for FAS and V-Series Deployment and Implementation Guide, provides a deeper understanding of NetApp deduplication and its deployment.

Figure 13 illustrates dedupe-centric storage provisioning. Volumes are shared among several application instances to achieve cross-instance dedupe returns. Note that this construct is created within an aggregate; volumes can be assigned to different aggregates.


Figure 13) Dedupe-centric storage provisioning. Application instances are aligned horizontally; volumes are aligned vertically.

[Figure 13 depicts the template and instances 1 through n aligned horizontally; each FlexVol volume holds one LUN/qtree per instance, with deduplication block sharing within each FlexVol volume.]

Impact on commitment and aggregate usage: When creating the FlexVol volumes for this layout, their individual sizes contribute to the commitment rate. The aggregate use grows with the provisioning and object use within the FlexVol volumes. Provisioning a new instance in this layout through a file/LUN FlexClone operation has no effect on the overcommitment rate, but it does affect the deduplication value of the volumes themselves. Thus, NetApp recommends using the zero fat configuration for the volume and enabling autogrow.

Best Practice

This layout is very attractive for applications using multiple but similar storage objects among service instances (for example, virtual disks in virtual machine hypervisors). They usually use similar operating systems and applications in dedicated virtual disks. Thus, grouping these storage objects leads to a very high degree of consolidation due to deduplication.

Quickly changing data, such as pages and swap files, should not be considered for inclusion in deduplicated volumes on primary storage. Deduplication savings are limited due to their high change rate and do not justify running the deduplication process. NetApp recommends that this type of data not be placed in the same volume as data that dedupes well.

We further recommend not performing client data realignments, such as Windows disk defragmentation or database table space reorganizations. Because of the way NetApp storage controllers work, fragmented client data is served with no performance penalty, so such realignments are unnecessary.


33 SETTLED/NOMAD PROVISIONING FOR NETAPP DATA MOTION

Settled/nomad provisioning is a provisioning pattern that helps increase the utilization of NetApp shared storage. When the online migratability features of storage are exploited, response times to mitigate data growth scenarios are independent of application-specific planned downtime windows. Storage is classified into potential migration candidates, which can be migrated away from a tight aggregate on one storage controller to another while remaining accessible. Thus, it is an elegant technique to relax the use of an aggregate outside the planned downtime windows of the served applications. NetApp MultiStore technology implements this feature using the vFiler abstraction, which NetApp recommends you consider in the provisioning process. Secure multi-tenancy environments implemented using MultiStore technology harmonize well with this provisioning approach.

The settled/nomad provisioning pattern is a perfect metaphor for reacting to data growth in an aggregate. Figure 14 illustrates the concept of settled/nomad provisioning in the aggregates of the storage controllers and the migration of a nomad out of its aggregate. The settled part describes data that does not move during its lifetime. It might use vFiler units to simplify operation and hardware maintenance of the storage controller, but there is no direct need. The nomad parts are considered moving parts and thus must use vFiler units. The ratio between the sizes of the settled and nomad parts depends on the growth rate and lifetime of the data in the settled part. Assuming that the aggregate size is constant over this period, the aggregate is filled with settled and nomad data. Over the data lifetime, more and more nomads are migrated away. At the end of the lifetime, only the settled data is left.

It is irrelevant whether the data growth happens in the settled or nomad part: when a nomad is migrated away, the resource situation on the aggregate is relaxed. It is preferable to provision several nomads of different sizes. This allows you to:

• React to different growth scenarios of the data
• Quickly migrate smaller nomads when time or the inter-storage-controller network is considered a limited resource
• Operate the aggregate in its operational sweet spot corridor over a long time frame; by slicing the migratable entities in the right way, you can be sure that the aggregate operates in a predefined use interval
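As an illustration of slicing migratable entities, the toy calculation below picks which nomads would have to be migrated to bring an aggregate back below the upper bound of its use corridor. The greedy smallest-first policy, sizes, and thresholds are invented for the example, not prescribed by NetApp.

```python
# Toy planner: pick nomads to migrate so that aggregate use falls back below
# the corridor's upper bound. All figures and the policy are invented.

def nomads_to_migrate(used_gb, capacity_gb, upper_bound, nomad_sizes_gb):
    """Greedy choice: migrate the smallest nomads first (cheapest to move)."""
    chosen = []
    for size in sorted(nomad_sizes_gb):
        if used_gb / capacity_gb <= upper_bound:
            break
        used_gb -= size
        chosen.append(size)
    return chosen

# 10 TB aggregate at 92% use; corridor upper bound is 85%;
# three nomads of different sizes were provisioned up front.
plan = nomads_to_migrate(used_gb=9200, capacity_gb=10000,
                         upper_bound=0.85, nomad_sizes_gb=[500, 1000, 2000])
print(plan)  # [500, 1000] -> migrating both relaxes the aggregate to 77%
```

Provisioning nomads of several sizes gives such a planner more options than a single large nomad would.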

Figure 14) Settled/nomad provisioning into an aggregate. In case of aggregate tightness, a nomad is migrated to a separate aggregate.

[Figure 14 depicts an aggregate containing a settled part and two nomads.]

To summarize, the settled/nomad provisioning pattern is an elegant method to adjust the block use of an aggregate. The use of an aggregate can be controlled and kept in a desired corridor.


SLA-BASED ASSESSMENT FOR SETTLED/NOMAD

The goals of an SLA-based assessment are to optimize SLA fulfillment and to avoid or minimize penalty costs. The accessibility of the applications, described by their individual service levels, is used for an assessment into settled and nomad instances.

We use the introduced SLA metric of service disruption and map it to the stickiness of the settled/nomad instances. The vFiler entities allow online migration of NFS- and iSCSI-attached nomad instances without any changes on the client side. Fibre Channel-attached storage cannot be migrated online at the time of writing. Refer to TR-3881 for an understanding of DataMotion in an Oracle database and Microsoft Exchange environment.

Alignment by technical impact: For data belonging to applications with SLAs that fit perfectly into what is provided, a direct assignment can be made. For example, application instances with the lowest acceptable service disruption should be the last candidates to be migrated (settled); applications with the highest acceptable service disruption should be considered as nomads. However, there might be data of application instances that likely will be migrated during the application lifetime. You must take into account the business impact of migrating these instances.

Figure 15) Alignment by technical impact (sorted by negative impact in descending order)

[Figure 15 depicts instances Inst1 through InstN sorted from high negative impact (outside SLA, settled; for example, all FC-attached storage) to medium and low impact (inside SLA, nomad).]

Alignment by business impact: An assessment of penalty costs is made for the data of the remaining applications. For vFiler migration, a very short negative impact on the performance of the service level must be taken into account during the migration. Thus, application data with the highest negative impact is considered to be the stickiest.

Figure 16) Alignment by business impact (sorted by negative impact in descending order)

[Figure 16 depicts instances sorted by penalty cost: high-penalty ($$) data is settled or semi-settled; low-penalty ($) data is nomad.]

PERFORMANCE AND THROUGHPUT IMPACT OF MIGRATION

Migration of a nomad might be triggered due to heavy storage consumption in an aggregate. It might also be triggered due to performance limitations of the corresponding storage controller. Because the migration process consumes additional resources on the network and the participating storage controllers, this consumption must be taken into account to avoid further intensifying the situation. Refer to TR-3881 for a quantitative evaluation of DataMotion.

NetApp recommends that the use of storage controllers in a high-availability configuration be adjusted in such a way that the remaining controller can master the load in the case of a failover. Doing so should leave enough resources to perform migrations.

Migrating vFiler entities relies mainly on SnapMirror® and MultiStore technology and thus requires these licenses on all participating storage controllers. TR-3814, NetApp Data Motion, provides a thorough presentation of migration using the NetApp Data Motion solution. It focuses on implementing and triggering the migration of vFiler entities using NetApp Provisioning Manager. Furthermore, vFiler units can be managed manually to allow for handy offline migration with a very short interruption of storage accessibility.

ENABLE SETTLED/NOMAD FOR ALREADY-PROVISIONED STORAGE

While NetApp recommends that you consider the settled/nomad setting initially, taking the sizing and lifetime of storage into account, it is possible to implement it later in a planned downtime window. If NFS-attached storage should be migrated, existing volumes can be adopted by a vFiler entity. Because the vFiler entity has its own IP address, the storage must be remounted on the attaching clients.

SETTLED/NOMAD-LIKE SETTING WITH SHORT/LONG-TERM DATA PAIRING

In the previous section, the settled/nomad pattern was described as a way to mitigate organic data growth. The same effect can be achieved when storage is identified as belonging to instances that are going to be deprovisioned at their end of life. Taking into account the expected lifetime of provisioned storage allows you to plan deprovisioning situations in advance. This relaxes the dependence on aggregates outside planned downtime windows, without the technical requirements of a settled/nomad setting.

ONLINE MIGRATION IN VIRTUALIZED ENVIRONMENTS

Online migration features in a virtualization hypervisor provide a further alternative for implementing a responsive scheme to react to data growth scenarios. For example, VMware® Storage VMotion™ is capable of transferring a virtual machine, including its storage, when it is attached using a datastore. Storage of virtual machines served by a NetApp datastore can be migrated to another NetApp-served datastore by migrating each virtual machine. In such cases, a nomad can simply be implemented by a NAS- or SAN-attached datastore.

In contrast to a data transfer based on SnapMirror directly between NetApp storage controllers, the migration traffic flows through the hypervisor. This might have consequences for the execution of the virtual machines. Also, the NetApp storage efficiency savings cannot be exploited during the transfer; deduplication savings are regained by executing the deduplication process on the destination storage controller.


4 OPERATION

This section focuses on the operation and management of overcommitted storage. The goal of management is to fulfill the SLAs of the stored data while achieving a high level of efficiency. It addresses questions of how to detect situations that need manual assistance, how to raise the awareness of the operational staff, and how to resolve situations that arise.

We first consider situations that put SLA fulfillment at risk. Then we focus on actions that can be taken to avoid further aggravation by making the situation evident and presenting mitigation alternatives.

Consider the operational process as a loop that monitors and evaluates the current situation and triggers the transition of a storage resource among phases:

• Provision storage.
• Leave room for organic growth. It might be desirable to still allow for extending storage of previously provisioned applications.
• Reduce storage use with mitigation alternatives such as deletion, data motion, and so on.

These transitions must occur within a specified time frame to optimize operational flexibility and to prevent endangering the SLAs. The point is to detect situations that will violate the SLAs in the future.

SITUATIONS PUTTING SLA FULFILLMENT AT RISK

Over time, more and more data is stored and processed by the provided applications. NetApp storage efficiency technologies compensate for this growth. To prevent running out of physical resources, usage must be managed within safe boundaries. This makes sure the operations team has enough time to react with the appropriate mitigation strategy.

The following list summarizes situations that are critical for service delivery:

• Running out of time: Some mitigation alternatives must be triggered in advance, and a passage of time might be needed for their effect to become evident. This time determines the number of mitigation alternatives that can be considered at a certain point.

• Running out of mitigation alternatives: Several mitigation alternatives exist to control the usage. However, some alternatives are one-time activities, and some must be performed within a certain time frame. Depending on the situation, not all alternatives might be available for use.

• Running too tight on storage: Over time, applications use more and more of the blocks from storage that were committed to them. This forces Data ONTAP to allocate from a pool of free blocks. Assuming data growth, the size of the free block pool directly translates into available time to react.

• Running out of storage completely: This must be prevented because it has a high impact on the availability of the service. Furthermore, data integrity can be at risk. Consider the following scenarios:
  − An application wants to write to committed storage but fails (NAS and SAN). For applications, this looks like a storage failure and implies service disruption. Data integrity can be at risk.
  − An application wants to allocate new storage but fails (NAS). The application is confronted with a "No space left on device" exception. Verify the application behavior on this exception; most applications can deal with this situation, and data integrity is not at risk.

Two cases need to be differentiated when mitigation is necessary to solve a situation of tight storage. Storage for an object such as a LUN or a share can be tight because of:
  − Insufficient free space within the volume in which the storage object is contained
  − Insufficient free space within the aggregate in which the storage object and its volume are contained


The following sections focus on how to detect that a change is necessary and that a storage resource should be transitioned to another phase. We describe the different phases, how monitoring can support their detection, and how this information can be made known to operational groups.

41 PHASES AND TRANSITIONS

This section outlines the phases of a storage resource. Starting with an empty aggregate, storage is provisioned until certain thresholds are reached. After that, storage is left for organic growth. After certain thresholds are exceeded, further inspection or activities must be performed to mitigate storage tightness.

• Provisioning storage: When certain thresholds are within a defined range, storage is provisioned to the aggregates. Monitoring should support making a decision to transition to the next phase.

• Leave storage for organic growth: When certain thresholds are exceeded, provisioned storage is left for organic growth. Depending on the environment, storage of existing applications might still be extended, and a second threshold might signal that extensions are no longer possible. Monitoring should support making a decision to transition to the next or prior phase.

• Mitigate storage use: When certain thresholds are exceeded, this phase must make sure that committed storage can be delivered to store application data. The effect of a mitigation activity should be to put the storage resource back into the preferred operational corridor. Monitoring should support making a decision to transition back to the organic growth phase.

42 MONITORING

NetApp Operations Manager delivers comprehensive monitoring and management for NetApp shared storage. It provides alerts, reports, performance monitoring, and configuration tools to keep the NetApp storage infrastructure in line with business requirements for maximum availability and efficiency. NetApp Operations Manager provides a single human interface and an application programming interface (API) for integration with third-party management and orchestration software.

Operations Manager monitors the NetApp shared storage infrastructure and is able to raise awareness of certain situations. Events can be set to trigger an action when operational parameters are within a certain range and indicate a relevant situation. When the event triggers, an alarm notification can be sent by e-mail, pager, Simple Network Management Protocol (SNMP), or customized scripts. To raise awareness about a certain situation, the event must be characterized using the metrics provided by Operations Manager. To communicate the event, an alarm must be set.

THRESHOLDS

Operations Manager monitors relevant parameters that indicate the presence of specified situations. Thresholds can be set to trigger actions, for example, to notify the operational team that an alarm situation exists. The thresholds can be set to notify in advance. Operations Manager also performs trending on operational parameters to express the urgency of a certain situation. This supports decision making on how to react to a certain situation.

Within your Operations Manager instance, the thresholds can be verified and set by navigating to the Default Thresholds page by following Setup → Options → Default Thresholds or the link http://opsmgr-server:port/dfm/edit/options. Figure 17 shows a sample configuration page.


Figure 17) Operations Manager screen to configure thresholds on operational metrics

For aggregates, Operations Manager provides the set of thresholds described in the following list. They represent absolute limits. Operations Manager alarms can be used to notify operational staff and raise awareness of a specific situation.

Monitoring the aggregates is very important. They are the physical containers of preallocated and growable storage objects that host application data. If an aggregate of a storage controller runs at uncontrolled usage, there could be direct consequences for the applications for which it is providing data.

The concrete settings for these thresholds depend on the time necessary to relax aggregate block usage. When no mitigation alternatives can or should be taken during the lifetime of the data, mitigation actions must be performed in scheduled downtime windows. In that case, threshold settings and actions tend to be more conservative to avoid SLA-endangering situations.

• Aggregate full threshold: This threshold on the aggregate block use metric triggers an alarm that notifies a person in charge.

• Aggregate nearly full threshold: This threshold is the counterpart of the aggregate full threshold but provides an earlier notification.

• Aggregate overcommitted threshold: This threshold on the committed storage metric triggers an alarm that notifies a person in charge. The metric refers to the amount of storage that is assigned to applications; it represents the level of consolidation and also the width and increase of the block use corridor.

• Aggregate nearly overcommitted threshold: This threshold is the counterpart of the aggregate overcommitted threshold but provides an earlier notification.
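The four aggregate thresholds amount to simple comparisons on two metrics, block use and committed storage. The sketch below mimics that evaluation; the default percentages and event names are illustrative placeholders, not Operations Manager's actual defaults, so set them to match your configuration.

```python
# Sketch of evaluating the four aggregate thresholds. Threshold values and
# event names are illustrative; align them with your Operations Manager setup.

def aggregate_events(block_use_pct, committed_pct,
                     full=90, nearly_full=80,
                     overcommitted=100, nearly_overcommitted=95):
    events = []
    if block_use_pct >= full:
        events.append("aggregate-full")
    elif block_use_pct >= nearly_full:
        events.append("aggregate-almost-full")
    if committed_pct >= overcommitted:
        events.append("aggregate-overcommitted")
    elif committed_pct >= nearly_overcommitted:
        events.append("aggregate-almost-overcommitted")
    return events

# 83% of blocks used, 120% of capacity committed to applications.
print(aggregate_events(83, 120))  # ['aggregate-almost-full', 'aggregate-overcommitted']
```

Note that the two metrics are independent: a thin-provisioned aggregate can be heavily overcommitted while its block use is still comfortably inside the corridor.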

Operations Manager also provides thresholds that can be used to alert operational staff when volumes are in a certain state:

• Volume full threshold: This event notifies a person in charge that the preset threshold on the volume use metric has been reached.

• Volume almost full threshold: This event is the counterpart of the volume full threshold but provides an earlier notification.

• Volume autosized: This event notifies a person in charge that a volume was extended using the autogrow functionality.


TRENDING

Operations Manager 4.0 supports a variety of trending features for certain storage objects. This is an important feature for all storage objects with a fixed size. It allows you to estimate the time frame within which a certain situation must be mitigated. The trend is calculated as a linear regression over up to 90 days in the past. For aggregates, Operations Manager calculates a trend on the daily growth rate. In your Operations Manager instance, use the link http://opsmgr-server:port/dfm/report/view/aggregates-growth-rates for trending of aggregate growth rates and the estimated remaining time until the storage object is full. Each aggregate can be drilled down, and you can select trending based on an interval of one day, one week, one month, three months, or one year. To see the effect of recent data activities, set the interval of the trend calculation to enclose this activity. Investigate whether growth rates calculated over different intervals deviate significantly.
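The trend calculation can be approximated with an ordinary least-squares fit on daily usage samples. The sketch below estimates the daily growth rate and the days remaining until the usable capacity is reached; it mirrors the idea, not Operations Manager's exact algorithm, and the sample data is made up.

```python
# Least-squares trend on daily aggregate usage and a days-to-full estimate.
# This approximates the reported trend; sample data is made up.

def growth_trend(daily_used_gb):
    """Return (slope, intercept) of a least-squares line through the samples."""
    n = len(daily_used_gb)
    xs = range(n)
    mean_x = sum(xs) / n
    mean_y = sum(daily_used_gb) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, daily_used_gb))
    var = sum((x - mean_x) ** 2 for x in xs)
    slope = cov / var
    return slope, mean_y - slope * mean_x

def days_to_full(daily_used_gb, usable_capacity_gb):
    slope, intercept = growth_trend(daily_used_gb)
    if slope <= 0:
        return None  # no growth, no predicted fill date
    last_day = len(daily_used_gb) - 1
    current = slope * last_day + intercept
    return (usable_capacity_gb - current) / slope

usage = [500, 510, 520, 530, 540]   # GB used on five consecutive days
print(days_to_full(usage, usable_capacity_gb=1000))  # 46.0 days at 10 GB/day
```

As in Operations Manager, the estimate is based on the usable capacity, not on the aggregate full threshold, and it is only as good as the linearity of recent growth.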

Figure 18) Trending of data growth and days-to-full prediction in Operations Manager

Note: The calculation basis of time-to-full is the usable aggregate capacity. This value is not calculated based on the aggregate full threshold setting.

The trending on the volume level is analogous to the trending on the aggregate level. In your Operations Manager instance, access the link http://opsmgr-server:port/dfm/report/view/volumes-growth-rates for trending of volume growth rates. NetApp recommends that you order the view by growth rate (descending) or time to full (increasing) in order to focus on the relevant candidates.

On the volume level, you can set an alarm to fire when the volume growth is outside the usual boundary:

Abnormal volume growth: This event notifies when the growth rate of a volume exceeds a preset limit. It is helpful for signaling unusual behavior concerning storage consumption and pointing the operational staff to the right storage object.


INDIVIDUAL THRESHOLDS PER AGGREGATE OR VOLUME

For each aggregate or volume, the general default settings can be overwritten and made more specific. To do so, select your aggregate or volume of choice, for example by using the links already provided in this technical report. When selecting a concrete aggregate, it can be configured using the Edit Settings link and dialog. When selecting a concrete volume, its configuration can be adapted using the Edit Quota Settings link and dialog.

MONITORING STORAGE EFFICIENCY RETURNS

NetApp Operations Manager provides a dashboard to visualize storage efficiency returns in the NetApp shared storage infrastructure. This report lists important parameters, drilled down by utilization capacity, unused reserve capacity, storage efficiency, and efficiency return breakdown. It allows you to judge the effectiveness of the NetApp storage efficiency technologies. Figure 19 provides a sample screenshot of the storage efficiency dashboard in NetApp Operations Manager. Consult the NetApp Operations Manager Efficiency Dashboard Installation and User Guide for further information on the data provided by this dashboard.

Figure 19) Storage efficiency dashboard in Operations Manager


43 NOTIFICATION

Operational staff must be notified when situations occur that require a transition of phases, especially situations with negative consequences for SLA fulfillment. Operations Manager provides alarms for notification. Alarms are bound to the metrics and thresholds explained in section 4.2 and notify operational staff, storage administrators, or storage capacity planners. Alarms are the instrument used to keep the management effort of the NetApp storage infrastructure low.

After being notified the responsible person can evaluate the situation and decide which actions to take Further the trends on operational parameters provided by Operations Manager simplify the decision-making process

Depending on the organizational structure the responsibilities to operate plan and administer the storage infrastructure can be separated into different groups persons or roles Thus we characterize the mitigation activities by required skill set and time to act This allows an easy alignment to a given organizational structure

Operations Manager supports different methods to send a notification The notification methods can be used in combination for example a notification can be sent by both e-mail and SNMP

NOTIFY BY E-MAIL

An alarm can be sent to multiple destinations by e-mail, and repeated notifications can be sent while the situation is not resolved. To set an alarm, access the alarm configuration page by following Setup → Alarms from the default Operations Manager dashboard. Clicking Advanced Version accesses an advanced version of this page; the direct link for the advanced version is http://<opsmgrserver>:<port>/dfm/edit/alarms-advanced. Figure 20 shows how to configure an alarm. Adjust the threshold as described in section 4.2.

NetApp recommends using distribution lists or aliases with meaningful names rather than the addresses of individual persons. If you follow this recommendation, changing responsibilities and roles does not require corresponding changes in Operations Manager.

NOTIFY BY SNMP

Operations Manager supports the notification of alarms using SNMP, a widely used standard that is supported by most orchestration frameworks and ticketing systems. Using SNMP, Operations Manager can be integrated into existing ticketing systems. Figure 20 shows how to set up an alarm that fires based on the aggregate almost full threshold. The SNMP trap host is configured using a hostname or IP address and the port on which the SNMP agent is listening. The alarm can be saved and tested.


Figure 20) Configuring an alarm based on the threshold aggregate almost full.

Note: The SNMP event must be routed to the responsible groups or persons in the ticketing system. The mapping between the detected situation and the responsible operational group must therefore be implemented there.

NOTIFY BY SCRIPT

Operations Manager also supports notifications in highly customized integration scenarios. A user-defined adapter can be executed that delivers the information to the infrastructure or system of choice. A script can implement such an adapter and act as the glue between Operations Manager and the customer infrastructure. To set an alarm on the event aggregate almost full that starts a script, instrument Operations Manager on the command line:

dfm alarm create -s script_to_execute -h aggregate-almost-full
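A minimal adapter could look like the following Python sketch. This is a hedged example: the way Operations Manager passes event details to the script (argument order, environment variables) depends on the DFM version and must be checked against its documentation, and the ticket queue name and spool file here merely stand in for a real ticketing-system integration.

```python
"""Sketch of a user-defined alarm adapter script for Operations Manager.

Assumption: the event name and affected object arrive as command-line
arguments; verify the real interface against the DFM documentation.
"""
import json
import time


def build_ticket(event_name, storage_object):
    """Map an Operations Manager event to a ticket payload."""
    return {
        "summary": "%s on %s" % (event_name, storage_object),
        "queue": "storage-operations",  # hypothetical ticket queue
        "created": time.strftime("%Y-%m-%dT%H:%M:%S"),
    }


def main(argv):
    """Entry point: argv = [script, event_name, storage_object]."""
    event_name = argv[1] if len(argv) > 1 else "unknown-event"
    storage_object = argv[2] if len(argv) > 2 else "unknown-object"
    ticket = build_ticket(event_name, storage_object)
    # Stand-in for the real ticketing system: append to a spool file.
    with open("/tmp/om_tickets.log", "a") as spool:
        spool.write(json.dumps(ticket) + "\n")
    return 0
```

In a real deployment, the body of main would post the ticket to the ticketing system's API instead of writing a spool file.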


4.4 MITIGATE STORAGE USE

Uncontrolled storage use can limit operational flexibility and might put SLA fulfillment at risk. This section focuses on mitigation activities that preserve flexibility by controlling use within its defined corridor. The effect of a mitigation activity should return the usage to its defined corridor.

Storage tightness might occur in aggregates or volumes, depending on their configuration. When all volumes in an aggregate are thin provisioned with the zero fat configuration, they use the shared pool of free blocks in the aggregate to deal with data growth. To resolve this situation, a mitigation activity at the aggregate level is necessary.

When storage objects in a fixed-size volume cannot grow to the committed space, a mitigation activity at the volume level is necessary to resolve upcoming volume tightness.

MITIGATION ACTIVITIES FOR AGGREGATES

Aggregates are the coarsest storage objects within a NetApp storage controller. Finer-grained storage objects, such as FlexVol volumes and their content, are usually thin provisioned using the zero fat configuration. They can grow on demand; however, because they live within an aggregate of physically limited size, the growth of the storage objects is also limited. As described in the following list, providing usable space in the aggregate automatically allows the contained storage objects to grow.

1. Increase the aggregate. You can add drives to aggregates during operation, and you can repeat this mitigation activity. The maximum aggregate size depends on the Data ONTAP version, the type of aggregate, and the type of storage controller; 64-bit aggregates, supported with Data ONTAP 8, have very high limits. Additional drives can be used immediately; however, their procurement time needs to be taken into account. Rebalancing data between existing and new drives results in uniformly distributed use of the drives.

2. Decrease the aggregate Snapshot copy reserve. This reserve is needed in MetroCluster and SyncMirror® configurations. In other configurations, you can decrease this reserve or set it to zero.

3. Shrink preallocated volumes. Volumes with preallocated space reserve available aggregate free space. When possible, these volumes can be shrunk, returning the freed space to the aggregate so that others can make use of the formerly preallocated space.

4. Enable deduplication and shrink the volume.

5. If available, migrate a nomad online to a different storage controller. Doing this at the NetApp storage controller level requires storage provisioning based on vFiler units and a MultiStore and SnapMirror license. Adequate free space on the aggregates of the target storage controller is required. This mitigation activity is not limited in its applicability.

6. Migrate a volume from one aggregate to another within the same or a different storage controller. SnapMirror replicates the data while it is still being served. To switch over to the replicated data, the client must detach from the source and reattach to the replica; after completion, the replica is considered the new source. This operation has an impact on client downtime. Typically, inter-data center bandwidth allows synchronizing the source and the replica within a few minutes.

7. If none of the listed activities can be used, the application must be stopped to achieve a consistent state before its data is migrated offline.

The mitigation activities for aggregate tightness are summarized in Table 8. Note that Provisioning Manager performs mitigation alternatives 3 to 6 for secondary storage online.


Table 8) Mitigation alternatives to control use within aggregates.

| No. | Mitigation Activity | Repeatability | SLA Impact | Preparation Time | Time to Show Effect |
| 1 | Increase aggregate capacity by adding disks | Data ONTAP 7.x: low limits; Data ONTAP 8: high limits | None | HW procurement | Immediate (+ rebalancing) |
| 2 | Decrease the aggregate's Snapshot copy reserve area, if possible | One time | None | None | Immediate |
| 3 | Shrink other volumes in the aggregate if they have enough free space | One time | Low | None | Immediate |
| 4 | Run deduplication and shrink volumes | Repeatable | Low | Time to execute dedupe | Immediate |
| 5 | Migrate nomads (online) | Repeatable | Low | None | Minutes (vFiler migration time) |
| 6 | Migrate volumes to a different aggregate (offline) | Repeatable | Med-high | Next planned downtime window | Minutes (volume switch-over time) |
| 7 | Prevent application data loss and stop the application, then migrate (offline) | Repeatable | Low-high | Coordinate with app owner | Minutes (migration time) |
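The trade-offs in Table 8 can be encoded in a small helper that filters the alternatives by the constraints at hand. The following is an illustrative sketch, not NetApp tooling: the activity list is transcribed from Table 8, and the numeric SLA-impact scale (0 = none to 3 = high) is an assumption made for the example.

```python
# Sketch: pick feasible mitigation alternatives from Table 8 given the
# operational constraints. Illustrative only; the 0-3 SLA-impact scale
# is an assumption made for this example.

AGGREGATE_MITIGATIONS = [
    # (number, activity, can run online, worst-case SLA impact 0-3)
    (1, "Increase aggregate capacity by adding disks", True, 0),
    (2, "Decrease aggregate Snapshot copy reserve", True, 0),
    (3, "Shrink other volumes in the aggregate", True, 1),
    (4, "Run deduplication and shrink volumes", True, 1),
    (5, "Migrate nomads (online)", True, 1),
    (6, "Migrate volumes to another aggregate (offline)", False, 3),
    (7, "Stop the application, then migrate (offline)", False, 3),
]


def candidate_mitigations(allow_downtime, max_sla_impact=1):
    """Return the Table 8 activity numbers that fit the constraints."""
    chosen = []
    for number, _activity, online, impact in AGGREGATE_MITIGATIONS:
        if not online and not allow_downtime:
            continue  # offline activities need a planned downtime window
        if impact > max_sla_impact:
            continue  # SLA impact exceeds what the service level allows
        chosen.append(number)
    return chosen
```

For example, outside a planned downtime window (`allow_downtime=False`) only the online alternatives 1 to 5 remain candidates.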

MITIGATION ACTIVITIES FOR VOLUME TIGHTNESS

Mitigation activities for volume tightness are relevant for volumes that are at risk because their storage objects cannot grow to the committed size. When it is not possible to enable growth for the storage objects contained in a volume, you need to perform an aggregate-level mitigation activity.

Note: Some of these mitigation alternatives depend on, and affect, the used capacity in the aggregate.

Table 9) Mitigation activities for resource tightness within volumes.

| No. | Mitigation Activity | Repeatability | SLA Impact | Preparation Time | Time to Show Effect |
| 1 | Reduce the volume's Snapshot copy reserve (if configured and not used) | One time | Low | None | Immediate |
| 2 | Increase the volume if there is free space in the aggregate (see Table 8) | One time | Low | None | Immediate |
| 3 | Delete Snapshot copies that are not needed or were skipped by the autodelete function | Limited | Low | None | Immediate |
| 4 | Activate FAS deduplication for the volume (requires proper space guarantees) | One time | Low; possible performance impact | Wait for schedule | Hours |
| 5 | If the volume contains more than a single LUN, migrate those objects to another volume or aggregate | Repeatable | High | Next planned downtime window | Minutes (volume migration time) |
| 6 | Stop the application and migrate the data | Repeatable | High | Coordinate with app owner | Minutes (migration time) |


5 REAL-LIFE SETTINGS

This section summarizes two different operational settings. The first does not make use of online data migration and the settled/nomad provisioning pattern; the second implements a settled/nomad provisioning pattern to maintain the flexibility for online data migrations.

The concrete threshold settings and approaches might be very customer and application specific. To exploit NetApp storage efficiency features in your own data center, NetApp recommends that you start conservatively. After you are familiar with the process, work toward the customer-specific optimum.

5.1 SAMPLE SETTING 1: REAL-LIFE SETTING

This section describes a real-life setting that a customer started with. It makes use of a limited set of mitigation alternatives, which is especially beneficial when the installed storage capacity should remain constant over a long time frame or when the physical systems are already fully equipped. A settled/nomad setting is not considered; thus, the thresholds that signal a transition of phases are set lower and more conservatively for this customer. Because online data migration and aggregate extension are not available as mitigation alternatives, sufficient available space is required to safely reach the next planned downtime window, as shown in Figure 21. In practice, refer to the aggregate days-to-full trend value to get an idea of the available days to full based on past data growth.

• All storage is provisioned using the zero fat option with growable FlexVol volumes. Only aggregate monitoring is used.
• Aggregate extension is not a mitigation alternative.
• Online migration is not a mitigation alternative.

Figure 21) Storage to enable organic data growth between planned downtime windows.
[Diagram: data growth over months between two planned downtime windows.]

Note: Several months might pass between planned downtime windows in which major mitigation alternatives can be performed.

The primary concern is preventing the critical situation in which aggregates reach a utilization level too high to enable organic growth until the next agreed planned downtime window. To prevent this situation, sufficient space must be reserved to enable data growth. Second, the level of data consolidation is monitored to manage accumulated growth rates safely.

Provisioning new data is stopped when one or both thresholds on the first and second metrics are reached. The operational teams are notified using an alarm on the Operations Manager event aggregate nearly full (configured to fire when the metric exceeds 50%) and the event aggregate nearly overcommitted (configured to fire when the metric exceeds 110%). These alarms stop the responsible entities from provisioning new storage; the aggregate is left for organic growth.

An assessment of the storage situation might then be performed. Depending on experience and on the application growth rates seen in the past, the thresholds may be adapted. After the upper threshold of the operational sweet spot corridor is crossed, an alarm based on the aggregate full threshold (set initially to 65%) is sent to the storage administrators, who decide about migrating data in the next planned downtime window. In the meantime, organic growth can take place in the yellow-marked area shown in Figure 22. The metrics used are:

• First metric: aggregate capacity used
• Second metric: aggregate space committed

Because all storage is provisioned using the zero fat option, no artificially limited storage containers exist; thus there is no need to consider a volume-based metric. Figure 22 shows the phase transitions depending on the metrics aggregate capacity used and aggregate space committed.
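The transition logic of this setting can be sketched as a small classifier. The thresholds (50%/65% for capacity used, 110%/120% for space committed) are taken from the text and from Figure 22; the phase names are illustrative labels, not Operations Manager terminology.

```python
# Sketch of the phase-transition logic of sample setting 1.
# Thresholds are from the text and Figure 22; names are illustrative.

def phase(capacity_used_pct, space_committed_pct):
    """Classify an aggregate into an operational phase.

    capacity_used_pct   -- aggregate capacity used, percent of usable space
    space_committed_pct -- aggregate space committed, percent (may exceed 100)
    """
    if capacity_used_pct > 65 or space_committed_pct > 120:
        return "mitigate"        # leave the corridor: plan a data migration
    if capacity_used_pct > 50 or space_committed_pct > 110:
        return "organic-growth"  # stop provisioning, assess capacity
    return "provisioning"        # keep provisioning new storage
```

For example, an aggregate at 55% capacity used and 90% space committed has crossed the first threshold and is left for organic growth.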

Figure 22) Transition of phases depending on the metrics aggregate capacity used and aggregate space committed.
[Diagram: provisioning new storage is allowed at 0-50% capacity used and 0-110% space committed; above those values, capacity is assessed and thresholds are adapted; above 65% used or 120% committed, mitigation starts.]


5.2 SAMPLE SETTING 2: SETTLED/NOMAD

This section describes a setting that takes the settled/nomad provisioning pattern into account and allows migrating nomad data flexibly and in a timely manner thanks to vFiler technology. This setting requires storage space at alternative locations to which nomads might be migrated. It is seen more often in larger environments with an emphasis on NFS-attached storage. It allows operating the NetApp storage infrastructure at very high use and in narrower operational sweet spot corridors. Figure 23 visualizes the effect of a mitigation alternative that can be performed online.

Figure 23) Narrower corridors due to the ability to perform mitigation alternatives in hours instead of months.
[Diagram: settled data with nomads; the need to act is detected, and the effect of the mitigation (e.g., migration) shows within hours.]

In this sample setting, as in sample setting 1, the critical situation to prevent is aggregates becoming too full. However, given the flexibility gained with online data migration, no further metric, such as storage overcommitment, needs to be taken into account.

• All storage is provisioned using the zero fat option with growable FlexVol volumes. Only aggregate monitoring is used.
• Storage is provisioned using the settled/nomad pattern, with the ability to perform online migration.
• Days-to-full aggregate trending was more than 200 days on average. Note that this value depends on the individual situation and is calculated against 100% capacity used.

The sole metric in this setting is aggregate capacity used. Table 10 contains the thresholds describing the transitions of phases.

Table 10) Phase transitions with the settled/nomad provisioning pattern and the online migration mitigation alternative.

| Detection Threshold | Notify | Mitigation |
| > 70% | Storage operations | Stop provisioning new storage |
| > 85% | Storage operations | Stop extending provisioned storage |
| > 90% | Storage operations | Relax the resource situation and migrate a nomad |


Figure 24) Visualization of phase transitions depending on the metric aggregate capacity used.
[Diagram: provisioning new storage up to 70% capacity used; extending already provisioned storage up to 85%; above 90%, relaxing utilization by migrating a nomad with NetApp Data Motion.]

You can achieve very high data consolidation in this setting by using NetApp storage controllers; the amount of logical data served exceeds the physically usable capacity several times over.

6 STORAGE EFFICIENCY COOKBOOK

To increase consolidation, we propose the following steps to exploit the advantages of NetApp storage efficiency technologies.

Figure 25) Typical picture of aggregate capacity metrics while turning to zero fat configurations and dedupe.
[Diagram: committed capacity and capacity used over elapsed time, with the overall trend and the last 3-month trend marked across steps 1-3.]

As a general rule, we do not introduce artificially limited container types; they increase the monitoring effort and might prevent pooling of unused space. For an existing landscape, proceed as follows:

1. Install and configure Operations Manager, the earlier the better. From day one, Operations Manager collects data; the more information it collects, the better the predictions and trending become. The diagrams provided by Operations Manager give a good idea of growth rates and their steadiness. Make sure all NetApp storage controllers are monitored, and wait for one month. Define which mitigation alternatives your operational team is comfortable with. Check the boxes accompanying the provided list and identify the time your team needs to perform the actions. If you can perform online migrations for nomads, define the time to negotiate and approve a migration. For all other data, define the time to the next planned downtime window.

2. Change all volumes to the zero fat configuration with the autogrow feature set to on. Because there is no artificial space limitation for an autogrow volume, monitoring is restricted to aggregate monitoring. When using deduplication, set the volume to autogrow. During this period, the capacity used diminishes, as shown in Figure 25; usually each change in the volume configuration can be detected. So far, only metadata has changed, and the unused space in the volumes is now available in a common shared pool. The aggregated free space is available to the same applications storing the data. We recommend monitoring for three months to understand the growth rate of your environment.

3. Derive the growth trend of the aggregates. Note that the overall trend might still be negative. Use Operations Manager to help determine the trend. Make sure that it excludes the time frame when the volume configurations were changed to zero fat and that it includes relevant operations of your applications, such as month-end and year-end closing of business applications or regular software maintenance updates (for example, in virtualized environments).


Work backward to determine the thresholds of the phases:
a. Define the aggregate use at a level where your operational team is comfortable; at first, do not exceed 80%. Add an attention area (yellow), depending on the mitigation alternatives and their time to show effect.
b. Determine the maximum distance between the planned downtimes, or the time to perform the intended mitigation alternatives.
c. Determine the growth rate. Operations Manager provides help in determining the trend of data growth.
d. Determine the minimum space required to comfortably allow organic growth in the period between the agreed planned downtimes of the provided services. Operations Manager helps you to understand the growth rate of the past.
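Worked backward, these steps reduce to simple arithmetic. The following sketch, with made-up sample numbers, estimates the capacity-used threshold at which provisioning must stop so that linear organic growth stays below the comfort level until the next planned downtime:

```python
# Sketch: work backward from growth rate and downtime interval to a
# provisioning-stop threshold. Sample numbers are made up for illustration.

def growth_rate_tb_per_day(samples):
    """Average daily growth from (day, used_tb) samples, first to last."""
    (d0, u0), (d1, u1) = samples[0], samples[-1]
    return (u1 - u0) / float(d1 - d0)


def organic_growth_threshold_pct(aggr_size_tb, rate_tb_per_day,
                                 days_between_downtimes, comfort_pct=80):
    """Capacity-used percentage at which provisioning must stop so that
    organic growth stays below comfort_pct until the next downtime."""
    headroom_tb = rate_tb_per_day * days_between_downtimes
    threshold = comfort_pct - 100.0 * headroom_tb / aggr_size_tb
    return max(threshold, 0.0)


rate = growth_rate_tb_per_day([(0, 40.0), (30, 43.0)])  # 0.1 TB/day
# For a 100 TB aggregate, 90 days between downtime windows, and an 80%
# comfort level, 9 TB of headroom is needed: stop provisioning at 71% used.
```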

To provision storage, follow these steps:

1. Create big aggregates to enable shared storage in your data center. We recommend sizing them in such a way that each aggregate can be extended once for eventual aggregate mitigation. Very few situations exist in which a silo-centric approach with dedicated aggregates per application makes sense; free space and performance in an aggregate can be shared. A few big aggregates reduce the monitoring effort. Also, build aggregates in a limited number of standardized configurations and sizes.

2. Create volumes in the zero fat configuration with the autogrow feature set to on. Because there is no artificial space limitation for an autogrow volume, monitoring is restricted to aggregate monitoring. When using deduplication, set the volume to autogrow. Whenever possible, use Provisioning Manager for convenience and for repeating configurations.
a. Classify your data and provision for flexibility. Give NFS a preference and make use of vFiler entities.
b. Turn on deduplication. Even in situations where deduplication rates are expected to be low, there is sometimes a big surprise. If you prefer to try deduplication on the storage controller first, create a clone of the intended volume and deduplicate it to estimate the effect. Use Performance Advisor to identify a repeating time frame of low activity to schedule the deduplication job, or use deduplication scheduled by change rate. Mind the maximum volume sizes, which depend on the storage controller.
c. Initially size volumes to the expected size of the data you are going to store. Thus the aggregate overcommitment metric in Operations Manager represents the data consolidation more precisely.
d. Trim existing volumes provisioned in full or low fat to the zero fat configuration. Use the following commands on the console of the storage controller to configure zero fat without Snapshot autodelete for NAS environments:

vol options <volume> guarantee none
vol options <volume> try_first volume_grow
vol autosize <volume> -m <maximum size> -i <increment size> on
snap autodelete <volume> off

Use the following command sequence to configure zero fat with Snapshot autodelete for NAS environments:

vol options <volume> guarantee none
vol options <volume> try_first volume_grow
vol autosize <volume> -m <maximum size> -i <increment size> on
snap autodelete <volume> trigger volume
snap autodelete <volume> delete_order oldest_first
snap autodelete <volume> on


Use the following commands to configure zero fat without Snapshot autodelete for SAN environments:

vol options <volume> guarantee none
vol options <volume> try_first volume_grow
vol autosize <volume> -m <maximum size> -i <increment size> on
snap reserve -V <volume> 0
snap autodelete <volume> off
lun set reservation <lun> disable

Use the following command sequence to configure zero fat for SAN environments with autodelete set to on:

vol options <volume> guarantee none
vol options <volume> try_first volume_grow
vol autosize <volume> -m <maximum size> -i <increment size> on
snap reserve -V <volume> 0
snap autodelete <volume> trigger volume
snap autodelete <volume> delete_order oldest_first
snap autodelete <volume> on
lun set reservation <lun> disable
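When many volumes need to be trimmed, the four command sequences above can be generated programmatically and then fed to the controller console (for example, over SSH). The following sketch only builds the command strings from the sequences shown in the text; transporting them to the controller is deliberately left out.

```python
# Sketch: generate the zero fat command sequences shown above for one
# volume. The commands mirror the console sequences in the text; how they
# are executed on the controller (e.g., via SSH) is out of scope.

def zero_fat_commands(volume, max_size, increment,
                      san=False, autodelete=False, lun=None):
    """Return the command list that turns a volume to zero fat."""
    cmds = [
        "vol options %s guarantee none" % volume,
        "vol options %s try_first volume_grow" % volume,
        "vol autosize %s -m %s -i %s on" % (volume, max_size, increment),
    ]
    if san:
        cmds.append("snap reserve -V %s 0" % volume)
    if autodelete:
        cmds += [
            "snap autodelete %s trigger volume" % volume,
            "snap autodelete %s delete_order oldest_first" % volume,
            "snap autodelete %s on" % volume,
        ]
    else:
        cmds.append("snap autodelete %s off" % volume)
    if san and lun:
        cmds.append("lun set reservation %s disable" % lun)
    return cmds
```

For example, `zero_fat_commands("vol1", "100g", "1g")` reproduces the NAS sequence without autodelete, and the `san`/`autodelete` flags select the other three variants.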

e. Identify storage holding inactive data. Such storage is most often perfectly suited to act as a nomad candidate that could be migrated.
f. Identify storage that is close to deprovisioning. Deprovisioning storage relaxes use and can act as a mitigation alternative.
g. Turn already provisioned volumes into the zero fat configuration.

3. Let Operations Manager monitor the landscape. Use the aggregate daily growth rates and days-to-full trending reported by Operations Manager to adapt the thresholds. Remember that days-to-full trending reports against 100% capacity used of the aggregate.
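Because the days-to-full trend is calculated against 100% capacity used, the value needs rescaling when your operational corridor ends earlier. A sketch of that adjustment, assuming linear growth:

```python
# Sketch: rescale a days-to-full figure (calculated against 100% capacity
# used) to a corridor that ends at a lower threshold, assuming linear growth.

def days_to_threshold(used_pct, days_to_full_100, threshold_pct):
    """Days until used_pct grows linearly to threshold_pct, given the
    reported number of days until it reaches 100%."""
    if used_pct >= threshold_pct:
        return 0.0
    daily_growth_pct = (100.0 - used_pct) / days_to_full_100
    return (threshold_pct - used_pct) / daily_growth_pct

# An aggregate at 60% used with 200 reported days to full reaches an 80%
# corridor boundary in half that time: 100 days.
```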


7 REFERENCES

• TR-3505, "NetApp Deduplication for FAS and V-Series Deployment and Implementation Guide": www.netapp.com/us/library/technical-reports/tr-3505.html
• TR-3563, "NetApp Thin Provisioning: Improving Storage Utilization and Reducing TCO": www.netapp.com/us/library/technical-reports/tr-3563.html
• TR-3710, "Operations Manager, Provisioning Manager, and Protection Manager Best Practices Guide": www.netapp.com/us/library/technical-reports/tr-3710.html
• TR-3786, "A Thorough Introduction to 64-Bit Aggregates": www.netapp.com/us/library/technical-reports/tr-3786.html
• TR-3814, "NetApp Data Motion": www.netapp.com/us/library/technical-reports/tr-3814.html
• TR-3827, "If You're Doing This, Then Your Storage Could Be Underutilized": www.netapp.com/us/library/technical-reports/tr-3827.html
• TR-3881, "DataMotion for Volumes for Enterprise Applications": www.netapp.com/us/library/technical-reports/tr-3881.html
• NetApp Operations Manager Efficiency Dashboard Installation and User Guide: http://now.netapp.com/NOW/download/tools/omsed_plugin/InstallUserGuide.pdf


8 ACKNOWLEDGMENTS

This report was developed in concert with the Field Centers for Innovation and covers field best practices and product group expertise. It would not have been possible without the input of many experts. Significant contributions were made by Matthew Agoni, Carlos Alvarez, Jeff Berks, Manfred Buchmann, Hans Deuerlein, Erik Dybwad, Niels Reker, Oliver Dziuba, Larry Freeman, Gary Garcia, Pretoom Goswami, Naveen Harsani, George John, Nigel Maddock, Andreas Martinovsky, Holger Niermann, Cesar Orosco, Christian Ott, Shiva Raja, Michael Reusch, Maurice Skubski, John Tyrrell, Oliver Walsdorf, and Allen Wang.

NetApp provides no representations or warranties regarding the accuracy, reliability, or serviceability of any information or recommendations provided in this publication, or with respect to any results that may be obtained by the use of the information or observance of any recommendations provided herein. The information in this document is distributed AS IS, and the use of this information or the implementation of any recommendations or techniques herein is a customer's responsibility and depends on the customer's ability to evaluate and integrate them into the customer's operational environment. This document and the information contained herein may be used solely in connection with the NetApp products discussed in this document.

© Copyright 2010 NetApp, Inc. All rights reserved. No portions of this document may be reproduced without prior written consent of NetApp, Inc. Specifications are subject to change without notice. NetApp, the NetApp logo, Go further, faster, Data ONTAP, FlexClone, FlexVol, MultiStore, RAID-DP, SnapDrive, SnapMirror, Snapshot, SyncMirror, and vFiler are trademarks or registered trademarks of NetApp, Inc. in the United States and/or other countries. Windows is a registered trademark of Microsoft Corporation. Oracle is a registered trademark of Oracle Corporation. VMware is a registered trademark and VMotion is a trademark of VMware, Inc. All other brands or products are trademarks or registered trademarks of their respective holders and should be treated as such. RA-0007-1010


To control the level of physical resources, we outline methods for increasing storage utilization by provisioning storage with NetApp storage efficiency technologies. We also outline how to react to organic data growth and how to level the storage use within a corridor of high efficiency by using NetApp technologies for data center flexibility.

To summarize, this document introduces three phases to manage the storage on NetApp storage controllers: provisioning, organic growth, and mitigation of storage tightness. These phases play a vital role for aggregates, which are the coarsest storage abstraction of a NetApp storage controller.

• Provisioning phase: In this phase, storage is provisioned from the NetApp shared storage infrastructure, which increases the utilization of aggregates. The goal is to operate the aggregates with a high level of data consolidation in an efficient utilization corridor. Figure 2 visualizes this corridor.

• Organic growth phase: In this phase, no further storage is provisioned, in order to slow down the growth of aggregate utilization. The goal is to operate in a corridor of high utilization but safely reach the next planned downtime or administration window of the served applications. Thus, the organic growth phase is sized depending on the growth rate of the capacity used and the length of the time frame to the planned downtime windows. Figure 3 visualizes this slowed growth.

• Mitigation of storage tightness phase: This phase prevents an uncontrolled level of utilization and provides mitigation activities to lower that level. Several mitigation alternatives are presented to mitigate storage tightness and to shift the aggregate utilization back into the desired operational corridor. Figure 4 visualizes this mitigation.

Figure 2) Storage consolidation and growing utilization using thin provisioning.

Figure 3) Controlled and slowed data growth within the operational sweet spot corridor in the aggregate.


Figure 4) Mitigation to prevent uncontrolled utilization.

This document addresses best practices and tools to manage the NetApp storage infrastructure and to support the decision making during the transitions between phases.

2.3 AUDIENCE

This document addresses two audiences:

• Decision makers: It provides decision makers with an understanding of how to align storage efficiency best practices and processes within their existing operations organization.

• Operational teams: It allows operational teams to understand the monitoring and management of the storage infrastructure while mastering data growth. It allows the operational teams to implement a basic setting and to position their usage goals. We refer to existing NetApp best practices to increase the level of data consolidation and to achieve overcommitment for major applications.


2.4 SCENARIO

As a scenario, we consider a service provider who delivers a set of IT services. This service provider might serve internal or external customers at different service levels. The provided service levels are characterized by unplanned downtime, as exemplified by Figure 5. This characterization is useful for aligning service data with physical resources. In our example, the highest level of service availability is delivered for Platinum services. It is further assumed that the provided services have different lifetimes and dates of creation.

The service provider's major goal is to deliver the services within the specified service levels and to achieve maximum data consolidation through NetApp storage efficiency technologies. This directly translates into cost savings related to capital investments, floor space, cooling, maintenance, and operational expenses. However, storing data in a consolidated way using storage efficiency technologies needs to take into account the aggregated data growth rates of the applications.

Predicting data growth rates depends on several parameters that are usually outside the control and knowledge of the service provider. These parameters include usage characteristics, the number of users, and the functionality used. To compensate for the deficiencies in precisely predicting data growth over a specific time frame, we propose a scheme that the service provider can use to achieve the operational flexibility and adaptability needed to handle unpredictable growth rates.

Figure 5) Sample service levels, ordered by service disruption and recovery time.
[Diagram: Platinum (production, premium customers), Gold (production), Silver (production, low budget), Bronze (production), and Best Effort services (dev/test, cold/fill-up data, dynamic/short-term data), with disruption and recovery time ranging from lowest for Platinum to best effort.]

In this document, the focus is on the operational aspects of storage efficiency technologies to achieve data center consolidation and agility. Thus, we take a seat next to the operational staff of our sample service provider to understand their questions regarding the technologies that make up NetApp storage efficiency.

We address the questions posed by the operational staff, such as: How do we set it up? How do we run this? How do we integrate the necessary procedures into our daily life?


A set of questions pertaining to the lifetime of a service instance and its storage arises. It starts with provisioning storage in a NetApp shared storage infrastructure, continues with the detection and monitoring of situations endangering the level of a service and the necessary response procedures, and ends with promoting a continuous and smooth delivery of services.

The questions are structured around a cycle that starts at provisioning storage and finishes at deprovisioning storage. Figure 6 shows important questions regarding storage efficiency from an operational point of view.

• Provisioning deals with the provisioning of storage. In this document, provisioning models are shown that achieve a high level of storage consolidation while preserving operational flexibility. For individual applications, NetApp provides a rich library of technical reports on how best to provision.

• Monitoring deals with defining the goal of the monitoring process and which metrics to use to decide when to stop (for example, the provisioning of storage). Relevant metrics provided by NetApp Operations Manager are described.

• Notifying deals with how to notify the people in charge when certain actions must be performed. The notification mechanisms within NetApp Operations Manager are described to deliver information in case of certain events.

• Mitigation deals with mechanisms to prevent uncontrolled utilization from affecting operational flexibility and service fulfillment.

Figure 6) Questions regarding storage efficiency from an operational point of view

[Figure: a cycle of four phases with their leading questions. Provision: how to provision best for storage efficiency (provisioning models, NetApp Data Motion awareness, from scratch or template/clone); where to provision to; which SLA; what the defaults are. Monitor: which tools to use; what to monitor. Notification: who is in charge to react; how to notify. Mitigate: what is critical (when to stop provisioning, when to stop extending, when to relax tightness, how to detect); the available options; implications on SLAs; when to act.]

Before discussing the details of this cycle, it is important to understand the NetApp technologies that achieve storage efficiency and flexibility, and their relevance in the provisioning and operational phases.


2.5 NETAPP TECHNOLOGIES FOR STORAGE EFFICIENCY AND FLEXIBILITY

The NetApp technologies listed in Table 1 are characterized by where they bring the most significant advantage and value. For example, FlexClone® technology provides significant time and space advantages during provisioning, but the space advantage might diminish over time. In contrast, deduplication technology can achieve space savings over the entire storage lifetime.

Table 1) NetApp technologies for storage efficiency and flexibility

| NetApp Technology | Benefit | During Provisioning | During Operation |
| FlexClone | Instantly creates thin-provisioned and space-efficient writable clones | X | |
| FlexVol® | Implements thin provisioning and consumes only the needed space rather than the requested space | X | X |
| Deduplication | Increases data consolidation by detecting and optimizing repeating patterns in primary and secondary storage | X | X |
| NetApp Data Motion | Provides flexibility for management and optimal load/capacity rebalancing in growing cloud environments without downtime | | X |
| Aggregate extensibility in Data ONTAP® | Data ONTAP is the foundation for all features listed in this table and provides flexibility in handling physical resources; it allows extending physical aggregates during operation | X | X |

Furthermore, NetApp RAID-DP®, SATA, and NetApp Flash Cache (formerly PAM II) are technologies that help to reduce the total cost of storage tremendously. It is assumed that these technologies are deployed according to the requirements of the use case.

NETAPP SHARED STORAGE INFRASTRUCTURE

To implement the practices outlined in this document, some prerequisites must be met. We assume a NetApp shared storage infrastructure implemented using large aggregates. This acts as a utility for delivering storage in a flexible manner to applications with different needs. It scales with demand and serves a variety of different service levels at the same time. NetApp Operations Manager monitors the NetApp shared storage infrastructure. This software acts as a central management station and consolidates information about the current status of all NetApp storage controllers. Based on this information, Operations Manager indicates the necessity to change the phases and behavior in the data center.

The NetApp shared storage infrastructure provides different ways for clients to consume its resources. It can provide a traditional view in which storage resources are located at a specific controller. Using NetApp Provisioning Manager, the infrastructure can also provide a service-oriented resource view that abstracts resource consumption and management from the physical controllers. The abstractions of a storage service catalog, resource pools, and datasets provide easy manageability in the face of massive scale. If multi-tenancy is not required, then this is the abstraction of choice.

Supported by the NetApp technologies MultiStore® (vFiler™) and NetApp Data Motion, storage can be provided in a utility-like fashion, independent of physical hardware. This makes high operational flexibility possible in the data center and allows building virtualized environments for multiple tenants with competing interests.


3 PROVISIONING

Some features, such as data deduplication, can be turned on or off at any time. However, to achieve maximum consolidation and storage efficiency, further strategies must be considered during the data-provisioning phase.

In addition, provisioning should take the flexibility of storage (for example, the ability to migrate data) into account. This allows administrators to easily move data off aggregates approaching capacity, without downtime. This is also an important aspect when planning to deliver services 24x7.

Thus, the design goals for storage layouts are storage efficiency and operational flexibility. In the following sections, we discuss three orthogonal dimensions of provisioning storage. Two of them focus on achieving data consolidation; the third focuses on achieving operational flexibility. All dimensions can be combined independently. Note that the achievable level of consolidation depends on the applications and their data.

TR-3827, If You Are Doing This, Then Your Storage Could Be Underutilized, provides further insight into storage efficiency and operational flexibility.

3.1 PROVISIONING FROM SCRATCH: FULL FAT TO ZERO FAT PROVISIONING

This section deals with the way data is provisioned and the consequences for storage efficiency. We recommend applying the so-called zero fat configurations.

We consider the storage setup for a single application instance. The presented configurations can be applied while provisioning storage from scratch and to already provisioned storage. When the technical dimensions of storage provisioning are categorized into space for primary data and space for its Snapshot™ copies, there are four theoretical combinations for both network-attached storage (NAS) and storage area network (SAN) environments. In practice, only two variants are relevant to NAS and three variants are relevant to SAN storage:

• Full fat
• Low fat
• Zero fat

According to NetApp best practices, we do not mix block and file data within a single volume, which allows us to consider NAS and SAN environments separately.

NAS

For NAS, two options are recommended: full fat and zero fat.

• Full fat: The primary data and Snapshot copy space are preallocated.
• Zero fat: Primary data and its Snapshot copy space are allocated on demand. This variant achieves the best ratio of storage efficiency when provisioning applications from scratch.


Figure 7) Provisioning model for NAS storage from scratch. Technically, only two out of four combinations are possible.

| Snapshot Copy Space Allocation | Primary Data (Files & Directories) Fat | Primary Data (Files & Directories) Thin |
| Fat | Full fat option | No option |
| Thin | No option | Zero fat option |

Note: Full fat is characterized slightly differently in NAS and SAN due to their technical properties.

FULL FAT PROVISIONING

Full fat provisioning for NAS is the traditional (default) way to implement NFS/CIFS shares. Volumes in a full fat configuration are characterized as follows:

• Volumes are created with a space guarantee.
• The size of the volume follows the formula X + Δ, where X is the size of the primary data (the sum of all user data, files and directories, within the volume) and Δ is the amount of space needed to hold Snapshot data.
• Because the space used for Snapshot copies might grow unexpectedly, the autosize function can be used to make space available when a certain volume threshold is reached. This would also happen when the space reserved for user data gets low.
• Space reservation for Snapshot copies is used to hide the capacity used for Snapshot copies from the consumers (NAS clients).
• For volumes with deduplication enabled, volume autogrow is a mandatory option.
• Normally, using autodelete is not recommended in NAS environments. Keeping a certain number of Snapshot copies for file versioning/restores might be part of the SLAs defined for file services.

Note: Deleting Snapshot copies may be a reasonable approach when no other option for freeing up space is available, but this will be a specific and individual decision.

Table 2) Full fat provisioning

| Option | Recommended Value | Notes |
| Volume Options | | |
| guarantee | volume | |
| fractional_reserve | 100 | Leave at default; mostly relevant for SAN environments. The default value up to Data ONTAP 7.3.3 is 100; for later releases, 0 is the default. |
| autosize | on | Turn autosize on. There is no artificially limited volume that needs to be monitored. Autosize makes sense to allow growth of user data beyond the guaranteed space limit. |
| autosize options | -m X -i Y | The business model drives the maximum value for the autosize configuration because it offers additional disk space to the consumer under its specific conditions. A reasonable resizing increment depends on various factors, such as the data growth rate in the particular volume, the volume size itself, and so on. |
| Volume Snapshot Options | | |
| reserve | yes | The value depends on the number of Snapshot copies and the change rate within the volume. |
| schedule | switched on | Automatic Snapshot technology schedules. |
| autodelete | off | Deleting Snapshot copies is not recommended in most NAS environments. |
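Applied on a Data ONTAP 7-mode console, the Table 2 settings might look like the following sketch. The volume name vol_fs, aggregate aggr1, all sizes, the Snapshot reserve percentage, and the Snapshot schedule are illustrative assumptions, not recommendations:

```shell
# Create the volume with a full space guarantee (guarantee = volume)
vol create vol_fs -s volume aggr1 500g

# Leave fractional_reserve at its default of 100
vol options vol_fs fractional_reserve 100

# Turn autosize on: -m sets the maximum size, -i the growth increment
vol autosize vol_fs -m 750g -i 25g on

# Reserve space for Snapshot copies and keep an automatic schedule
snap reserve vol_fs 20
snap sched vol_fs 0 2 6@8,12,16,20

# Do not delete Snapshot copies automatically
snap autodelete vol_fs off
```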

ZERO FAT PROVISIONING

The zero fat method is the most efficient way to provision NAS volumes:

• Volumes are created without a space guarantee.
• The size of the volume still follows the formula X + Δ, where X is the size of the primary data (the sum of all user data, files and directories, within the volume) and Δ is the amount of space needed to hold Snapshot data. Sizing the volume defines a container with a virtual size for the consumers; NAS users are familiar with fixed-size file shares.
• Space used for Snapshot copies can grow unexpectedly. You can use the autosize function to make space available when a certain volume threshold is reached, and also when the space reserved for user data gets low.
• Space reserved for Snapshot copies is used to hide from the consumers (NAS clients) the capacity taken up by Snapshot copies.
• For volumes with deduplication enabled, volume autogrow is a mandatory option.
• Using autodelete is normally not recommended in NAS environments. Keeping a certain amount of space for Snapshot copies for file versioning/restores is part of the SLAs defined for file services.

Note: Deleting Snapshot copies may be a reasonable approach when no other option for freeing up space is available, but this will be a specific and individual decision.

Table 3) Zero fat provisioning

| Option | Recommended Value | Notes |
| Volume Options | | |
| guarantee | none | |
| fractional_reserve | 100 | Leave at default; mostly relevant for SAN environments. The default value up to Data ONTAP 7.3.3 is 100; for later releases, 0 is the default. |
| autosize | on | Turn autosize on. There is no artificially limited volume that needs to be monitored. Autosize makes sense to allow growth of user data beyond the guaranteed space limit. |
| autosize options | -m X -i Y | The business model drives the maximum value for the autosize configuration because it offers additional disk space to the consumer under its specific conditions. A reasonable resizing increment depends on various factors, such as the data growth rate in the particular volume, the volume size itself, and so on. |
| try_first | - | Autodelete is not recommended in most environments. |
| Volume Snapshot Options | | |
| reserve | yes/no | The value depends on the number of Snapshot copies and the change rate within the volume. Displaying only the committed usable space using the SLA is the preferred way to provision NAS storage. However, there might be situations in which the Snapshot reserve area is omitted (no). |
| schedule | switched on | Automatic Snapshot technology schedules. |
| autodelete | off | Deleting Snapshot copies is not recommended in most NAS environments. |
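On a Data ONTAP 7-mode console, the zero fat NAS settings from Table 3 might be sketched as follows. The names vol_fs_thin and aggr1, the sizes, and the Snapshot schedule are hypothetical examples:

```shell
# Create the volume without a space guarantee (thin provisioned)
vol create vol_fs_thin -s none aggr1 500g

# Let the volume grow on demand up to a configured maximum
vol autosize vol_fs_thin -m 750g -i 25g on

# Snapshot reserve yes/no depending on the SLA; 20% shown as an example
snap reserve vol_fs_thin 20
snap sched vol_fs_thin 0 2 6@8,12,16,20

# Do not delete Snapshot copies automatically
snap autodelete vol_fs_thin off
```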

SAN

For SAN, we consider three options:

• Full fat: Both primary data and its Snapshot copy space are preallocated.
• Low fat: The primary data is preallocated; the Snapshot copy space is allocated on demand.
• Zero fat: Primary data and its Snapshot copy space are allocated on demand. This variant achieves the best ratio of storage efficiency when provisioning applications from scratch.

Figure 8) Provisioning model for SAN storage from scratch

| Snapshot Copy Space Allocation | Primary Data (LUN) Fat | Primary Data (LUN) Thin |
| Fat | Full fat option | No option |
| Thin | Low fat option | Zero fat option |

FULL FAT PROVISIONING

This method can be treated as the historical way of provisioning block storage with Data ONTAP:

• Volumes are created with a space guarantee.
• A fractional (overwrite) reserve is used to guarantee that the primary data can be overwritten completely with Snapshot copies in place. If this space is not available, Snapshot copy creation will fail.
• The size of the volume follows the formula 2X + Δ, where X is the size of the primary data (the sum of all LUN capacities within the volume) and Δ is the amount of space needed to hold Snapshot copy data.


The enhancements to the volume autosize capabilities (such as volume-size-dependent thresholds) and the robustness of the Snapshot autodelete implementation have made full fat provisioning more or less obsolete. As of today, the default settings for creating volumes/LUNs in Data ONTAP still apply these settings. See the later subsections of this section for a discussion of using tools such as Provisioning Manager.

Table 4) Full fat provisioning

| Option | Recommended Value | Notes |
| Volume Options | | |
| guarantee | volume | |
| fractional_reserve | 100 | Even though technically possible, a fractional reserve below 100 carries the risk of running out of Snapshot copy overwrite space. This situation should be avoided. |
| autosize | off | Autosize could be used as an option to create the free space needed for Snapshot copy creation. |
| Volume Snapshot Options | | |
| reserve | 0 | |
| schedule | switched off | |
| autodelete | off | |
| LUN Options | | |
| reservation | enable | |

LOW FAT PROVISIONING

With low fat provisioning, we use a more space-efficient way to provision volumes:

• Volumes are created with a space guarantee.
• LUNs are created with a space guarantee as well. This setup does not benefit from unused blocks within a LUN. (During the lifetime of a LUN, the amount of free, unused blocks typically decreases. Without space reclamation techniques, allocated blocks on the storage system stay allocated.)
• The size of the volume follows the formula X + Δ, where X is the size of the primary data (the sum of all LUN capacities within the volume) and Δ is the amount of space needed to hold Snapshot copy data.
• Because the space used for Snapshot copies might grow unexpectedly, the autosize and autodelete policies are used to make space available when a preset volume threshold is reached.

Table 5) Low fat provisioning

| Option | Recommended Value | Notes |
| Volume Options | | |
| guarantee | volume | |
| fractional_reserve | 0 | Snapshot copy space is controlled by the autodelete and autosize options. |
| autosize | on | Turn autosize on. |
| autosize options | -m X -i Y | The business model drives the maximum value for the autosize configuration because it offers additional disk space to the consumer under its specific conditions. A reasonable resizing increment depends on various factors, such as the data growth rate in the particular volume, the volume size itself, and so on. |
| try_first | volume_grow | Increasing the size of the volume does not destroy any data or information, so there is no reason not to increase it. It can be reverted afterward if the volume's free space increases again. There might be configurations where automatic volume growth is not desired. |
| Volume Snapshot Options | | |
| reserve | 0 | For NAS volumes, setting a Snapshot copy reserve area and configuring Snapshot copy schedules is a common setup. For SAN volumes, this needs to be switched off according to NetApp best practices (see the Fibre Channel and iSCSI Configuration Guide). |
| schedule | switched off | |
| autodelete | on | There might be Snapshot copies that are needed to fulfill certain SLAs, such as backup SLAs. Setting this policy needs to be negotiated with the business requirements. In the worst case, deleting Snapshot copies is not an option. |
| autodelete options | volume, oldest_first | A precedence order determines which Snapshot copies are candidates for deletion; oldest_first is the current default. |
| LUN Options | | |
| reservation | enable | Reserves space for the LUN during creation. |
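A Data ONTAP 7-mode sketch of the low fat settings from Table 5. The volume name vol_db, aggregate aggr1, LUN path, OS type, and all sizes are illustrative assumptions:

```shell
# Guaranteed volume, but no fractional reserve
vol create vol_db -s volume aggr1 500g
vol options vol_db fractional_reserve 0

# Grow the volume first; delete Snapshot copies only afterward
vol options vol_db try_first volume_grow
vol autosize vol_db -m 750g -i 25g on

# SAN volume: no Snapshot reserve, no Snapshot schedule
snap reserve vol_db 0
snap sched vol_db 0 0 0

# Autodelete frees space when the volume runs full, oldest copies first
snap autodelete vol_db trigger volume
snap autodelete vol_db delete_order oldest_first
snap autodelete vol_db on

# Create the LUN with space reservation enabled (the default)
lun create -s 400g -t linux /vol/vol_db/lun0
```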

ZERO FAT PROVISIONING

Full and low fat provisioning use fully allocated volumes and LUNs; by default, space allocation happens within the boundaries of the LUN and the volume. Zero fat follows a 100% "allocate on demand" concept:

• Volumes are created without a space guarantee.
• LUNs are created without a space guarantee.
• The size of the volume follows the formula X − N + Δ, where X is the size of the primary data (the sum of all LUN capacities within the volume), Δ is the amount of space needed to hold Snapshot copy data, and N is the amount of unused blocks within a given LUN.

Table 6) Zero fat provisioning

| Option | Recommended Value | Notes |
| Volume Options | | |
| guarantee | none | No space reservation for the volume at all. |
| fractional_reserve | 0 | With Data ONTAP 7.3.3, fractional_reserve can be modified even for volumes without a space guarantee of type volume. Prior to Data ONTAP 7.3.3, the value was fixed at 100. |
| autosize | on | Turn autosize on. |
| autosize options | -m X -i Y | The business model drives the maximum value for the autosize configuration because it offers additional disk space to the consumer under specific conditions. A reasonable resizing increment depends on various factors, such as the data growth rate in the particular volume, the volume size itself, and so on. |
| try_first | volume_grow | |
| Volume Snapshot Options | | |
| reserve | 0 | For NAS volumes, setting a Snapshot copy reserve area and configuring Snapshot copy schedules is a common setup. For SAN volumes, this needs to be switched off according to NetApp best practices (see the Fibre Channel and iSCSI Configuration Guide). |
| schedule | switched off | |
| autodelete | off | Deleting Snapshot copies might be an option when the volume can no longer be resized because the maximum configured size has been reached, or when the aggregate's free space becomes low. |
| LUN Options | | |
| reservation | disable | No preallocation of blocks for the LUN. |
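The zero fat SAN settings from Table 6 might be sketched on a Data ONTAP 7-mode console as follows; names, sizes, and the OS type are hypothetical:

```shell
# Thin volume: no space guarantee, no fractional reserve
vol create vol_db_thin -s none aggr1 500g
vol options vol_db_thin fractional_reserve 0
vol options vol_db_thin try_first volume_grow
vol autosize vol_db_thin -m 750g -i 25g on

# SAN volume: no Snapshot reserve or schedule; autodelete stays off
snap reserve vol_db_thin 0
snap sched vol_db_thin 0 0 0
snap autodelete vol_db_thin off

# Thin LUN: -o noreserve disables space reservation
lun create -s 400g -t linux -o noreserve /vol/vol_db_thin/lun0
```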

SUMMARY OF PROVISIONING METHODS

There are good reasons for using any of the provisioning methods described; however, full fat for SAN environments should be avoided wherever possible because of its storage efficiency ratio. Even with a 100% block usage ratio on primary data, zero fat provisioning has many advantages and is the preferred method:

• The aggregate's free space is a global pool that can serve space to volumes. This gives more flexibility than volumes with their own dedicated free space.
• For SAN volumes, block consumption can be easily monitored.
• Deduplication savings go directly into the global pool of free space, which is the aggregate or the resource pool to which it belongs.
• Monitoring is needed only at the aggregate level. Volumes will grow on demand.

Table 7) Comparison of provisioning methods

| Characteristics | Full Fat | Low Fat | Zero Fat |
| Space consumption | 2X + Δ | X + Δ | X − N + Δ² |
| Space efficient | No | Partially, for Snapshot copies | Yes |
| Monitoring | Optional | Required on volume and aggregate level | Required on aggregate level |
| Notification/mitigation process required | No | Optional in most cases | Yes |

² N is the traditional thin provisioning impact: the amount of blocks logically allocated but not used.


| Characteristics | Full Fat | Low Fat | Zero Fat |
| Pool benefiting from dedupe savings | Volume fractional reserve area | Volume free space area | Aggregate free space area |
| Risk of an out-of-space condition on primary data | No | No, as long as autodelete is able to delete any Snapshot copies | Yes, when monitoring and notification processes are missing |
| Typical use cases | Small installations; no or few storage management skills (no monitoring infrastructure) | Large database environments | Shared storage infrastructures; test/dev environments; storage pools for virtualized servers |
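The space consumption formulas from Table 7 can be compared with a small calculation. The following is a minimal sketch (not NetApp tooling), with capacities invented purely for illustration:

```python
def full_fat_gb(primary_gb: float, snap_delta_gb: float) -> float:
    """Full fat: 2X + delta -- primary data plus a 100% overwrite reserve."""
    return 2 * primary_gb + snap_delta_gb

def low_fat_gb(primary_gb: float, snap_delta_gb: float) -> float:
    """Low fat: X + delta -- primary data preallocated, Snapshot space on demand."""
    return primary_gb + snap_delta_gb

def zero_fat_gb(primary_gb: float, unused_gb: float, snap_delta_gb: float) -> float:
    """Zero fat: X - N + delta -- only blocks actually in use consume space."""
    return primary_gb - unused_gb + snap_delta_gb

# Illustrative values: X = 1000 GB of LUN capacity, N = 400 GB unused,
# delta = 150 GB of Snapshot copy data
print(full_fat_gb(1000, 150))      # 2150
print(low_fat_gb(1000, 150))       # 1150
print(zero_fat_gb(1000, 400, 150)) # 750
```

With these sample numbers, zero fat consumes roughly a third of the full fat footprint, which is the efficiency ratio the table alludes to.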

FULL/LOW/ZERO FAT PROVISIONING WITH PROVISIONING MANAGER

NetApp Provisioning Manager focuses on fast and efficient provisioning of storage resources in the NetApp storage infrastructure. It speeds up provisioning workflows and boosts capacity usage by using policy-based automation for the entire NetApp NAS and SAN infrastructure. These processes are:

• Faster than manually provisioning storage
• Easier to maintain than scripts
• Instrumental in minimizing the risk of data loss resulting from misconfigured storage

Provisioning Manager applies user-defined policies to consistently select the appropriate resources for each provisioning activity. This frees administrators from the headache of searching for available space to provision and allows more time for strategic issues. The use of a centralized management console allows administrators to monitor the status of their provisioned storage resources. Provisioning Manager can help improve business agility and capacity use, shrink provisioning time, and improve administrator productivity. Provisioning Manager's thin provisioning and deduplication capabilities provide a high level of storage efficiency from your NetApp storage investment.

A GUI allows you to implement the full/low/zero fat provisioning models within Provisioning Manager. See Figure 9 and Figure 10 for configuring storage efficiency in a provisioning policy for NAS and SAN. Whenever storage is provisioned using this provisioning policy, the settings apply automatically. For more information, refer to TR-3710, Operations Manager, Provisioning Manager, and Protection Manager Best Practices Guide.

Provisioning Manager encapsulates technical details when provisioning storage and supports easy integration with existing management tools and orchestration frameworks. Policies and their use in so-called datasets and storage services allow you to exploit NetApp storage efficiency technologies without exposing a high level of technical detail to a higher level of management software.

Note: Provisioning Manager up to version 4.0 does not allow you to specify autosize and autodelete individually; the policy template determines whether these features are used and which options are selected. To implement the provisioning methods outlined here, a customized provisioning script needs to be provided that sets the autosize and autodelete parameters according to the recommendations for the full/low/zero fat methods. Postprovisioning scripts are standard with Provisioning Manager. Use caution: when Provisioning Manager runs conformance checks, it reverts individual settings.


Figure 9) Configuring a full/zero fat provisioning policy using Provisioning Manager for NAS. Select checkboxes as outlined. Provisioning Manager deviates from zero/full fat by first growing volumes with autosize and then allowing Snapshot autodelete.


Figure 10) Configuring a full/low/zero fat provisioning policy using Provisioning Manager for SAN storage. Select checkboxes as outlined. Provisioning Manager deviates by not turning on autosize for zero fat.

FULL/LOW/ZERO FAT PROVISIONING FOR STORAGE SERVICES

Storage services are an easy abstraction for provisioning storage in a utility-like fashion. A storage service describes all the characteristic attributes of the storage needed in a certain scenario. A storage service catalog lists the available templates and allows you to provision storage with these attributes on demand. Technically, storage services (or datasets) consist of one or more resource pools, a protection policy, and a provisioning policy. Full/low/zero fat provisioning for storage services is configured in the configuration wizard of the provisioning policy. Figure 11 shows the provisioning policies closest to the full/low/zero fat configurations.

Figure 11) Fulllowzero fat provisioning policies for datasets and storage services

Because this wizard is able to configure the deduplication feature, two policies are configured for the zero fat configuration: one with deduplication and one without.


HOW SHOULD A VOLUME BE SIZED?

Because physical allocation of data within a zero-fat-provisioned volume is done on demand, theoretically the volume size can be set to a very high value that can easily hold all application data and Snapshot copies. Because the unallocated space in the volume is not exclusively reserved for the volume itself, all other applications can benefit from the shared pool of unallocated storage. However, NetApp recommends that you size the volume to the expected size of its containing objects and use the autogrow option to let it grow on demand. The important advantage is that the commitment rate acts as a metric for data consolidation.

Note: The commitment rate reflects the amount of logical data consolidation. This metric is suitable for deciding when data should be left for organic growth.

Additionally, the volume size limits when using deduplication should be taken into account, because the maximum sizes depend on the storage controllers.
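As a back-of-the-envelope illustration (not the exact definition used by Operations Manager), the commitment rate can be read as provisioned volume capacity relative to the aggregate's physical capacity; values above 100% indicate overcommitment:

```python
def commitment_rate_pct(volume_sizes_gb, aggregate_capacity_gb):
    """Percentage of the aggregate's capacity committed to its volumes.

    A value above 100% means the aggregate is overcommitted, i.e.,
    thin provisioning is consolidating more logical data than the
    physically attached disks could hold if fully written.
    """
    return 100.0 * sum(volume_sizes_gb) / aggregate_capacity_gb

# Three thin 500 GB volumes on a 1000 GB aggregate -> 150% committed
print(commitment_rate_pct([500, 500, 500], 1000))  # 150.0
```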

APPLICATION RECOMMENDATIONS

Thin provisioning is most effective when applications use data that is committed to them step by step. When applications preformat data, the immediate effect of thin provisioning is lost, and only deduplication may reclaim sharable blocks. Because thin provisioning has no performance penalty, the general recommendation is to provision with the zero fat configuration.

For SAN-attached storage, NetApp recommends that you use file systems supporting space reclamation technologies such as the SCSI UNMAP and SCSI WRITE SAME commands. These pass the information through the storage stack that a particular block is no longer used, allowing unused space to be reclaimed. On Windows® platforms, this can be configured in NetApp SnapDrive®.

For Oracle® database best practices, refer to WP-7084, Storage Efficiency in an Oracle Environment.

3.2 PROVISIONING FROM TEMPLATES: VOLUME AND DEDUPE-CENTRIC LAYOUTS

This section deals with provisioning storage for similar applications from a golden template. A valid use case is a hosting provider who offers and serves predefined application services in mass quantities. Instead of provisioning each application from scratch, the data of the application instance is provisioned by creating a copy of a preconfigured template, or golden copy, that is customized using a postprocessing procedure.

When applications are provisioned this way, NetApp cloning technologies generate virtual copies of the template data instantly and with efficient use of space. This achieves a high degree of data consolidation and cost savings.

The potential of NetApp cloning technologies also plays a central role in development and test environments, as well as in software maintenance scenarios. Testing and updates can be performed very easily because these cloning capabilities work instantly and with almost no overhead in terms of performance, CPU, and memory.

There are two ways to align application data to a NetApp shared storage infrastructure:

• Volume-centric storage layout
• Dedupe-centric storage layout

Depending on the data lifetime, suitability for deduplication, consistency, and tool constraints, one way of aligning application data is more appropriate than the other.

In both variants, the storage of the application template can be provisioned as either full, low, or zero fat. The cloning procedure inherits the attributes of the parent volume. To create space-efficient clones, the space guarantee must be set to none.
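Creating such a space-efficient clone of a template volume with FlexClone might look like the following Data ONTAP 7-mode sketch; the volume names vol_template and vol_instance01 and the Snapshot name base_snap are hypothetical:

```shell
# A Snapshot copy of the template volume acts as the clone base
snap create vol_template base_snap

# Create a writable, space-efficient clone; -s none sets no space
# guarantee, so the clone consumes aggregate space only for changes
vol clone create vol_instance01 -s none -b vol_template base_snap
```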


CONSEQUENCES FOR MONITORING

When using one of the following layouts, very high data consolidation can be achieved. Because this effect depends on the usage characteristics of the corresponding applications, monitoring the aggregate is key. In case a low fat volume acts as a template that is cloned while preserving the original space guarantees, monitoring is necessary for the cloned volumes as well.

VOLUME-CENTRIC STORAGE LAYOUT

In a volume-centric storage layout, an application instance is organized into one or a few volumes to benefit from Data ONTAP volume-centric management and maintenance operations, such as instant cloning and volume-consistent Snapshot copies.

In addition to the convenient ways to manage volumes, volume-centric storage layouts have storage efficiency advantages in two dimensions:

• High instant storage efficiency savings: high instant savings when cloning the data of an application instance with FlexClone; savings might deteriorate over time.
• Long-term storage efficiency savings: medium long-term savings when deduplicating application data.

A volume-centric layout makes it easy to provision storage for another instance of an application by cloning a consistent volume representing the template of the intended application and attaching it to the instance where it is processed. This approach works for both NAS and SAN.

Figure 12 shows the data alignment of an application instance and its volume. An application instance organizes its data in one or more dedicated volumes. Note that the entire construct is created within one aggregate.

Because deduplication is performed at the volume level, long-term savings depend on the block-sharing rate within one instance of an application. Volume-centric layouts are preferred in the following cases:

• Simplicity of data management using volumes
• Individual control over the SLA of each application instance
• Application instances with a short duration
• No consideration of deduplication
• Management tools that require volume-centric layouts


Figure 12) Volume-centric storage provisioning. Application instances are aligned horizontally with their volumes.

[Figure: a template FlexVol containing LUN/qtree objects is cloned via FlexClone block sharing into FlexVol volumes for instances 1 through n; deduplication block sharing operates within each FlexVol]

Impact on commitment and storage utilization: When FlexClone is used to clone a volume-centric storage layout to implement storage template-based provisioning, the effect is as follows. At clone creation, Data ONTAP creates metadata for the new instance of the data and allocates blocks for storing changes to the cloned copy or new data on request. Thus, the overcommitment of the aggregate containing the cloned data increases when the clone is created. However, clone creation does not affect the space used in the aggregate. When data in the clone is changed and new data is added by the application, the aggregate use grows.
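The accounting described above can be sketched with a toy model (hypothetical sizes; this is not an ONTAP interface):

```python
# Sketch of FlexClone space accounting in an aggregate (hypothetical model).

class Aggregate:
    def __init__(self, size_gb):
        self.size_gb = size_gb
        self.used_gb = 0.0        # physical blocks consumed
        self.committed_gb = 0.0   # space promised to volumes

    def provision_template(self, size_gb, used_gb):
        self.committed_gb += size_gb
        self.used_gb += used_gb

    def clone(self, size_gb):
        # Cloning adds commitment (the clone may diverge fully), but only
        # metadata is written, so physical use is effectively unchanged.
        self.committed_gb += size_gb

    def write_to_clone(self, changed_gb):
        # Divergence of the clone consumes blocks from the free pool.
        self.used_gb += changed_gb

aggr = Aggregate(size_gb=1000)
aggr.provision_template(size_gb=100, used_gb=60)
aggr.clone(size_gb=100)            # commitment: 200 GB, use still 60 GB
aggr.write_to_clone(changed_gb=5)  # use grows only as the clone diverges
print(aggr.committed_gb, aggr.used_gb)  # 200.0 65.0
```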

Best Practice

A volume-centric layout implicitly implements a consistency group. It is preferable to place all application data that must be recovered at a certain point in time into it. Cloning can achieve significant savings when a FlexClone volume is created to provision data for a new service instance.

Client-side data realignment, such as disk defragmentation or database table space reorganization, has a counterproductive effect on the FlexClone savings. It also has a temporarily counterproductive effect on the deduplication savings until the deduplication process is executed again. If possible, the following actions on client data should be avoided:

• Reorganizing data, for example, database reorganization of table spaces or defragmentation of virtual disks provisioned through cloning

• Preformatting data


DEDUPE-CENTRIC STORAGE LAYOUT

In a dedupe-centric storage layout, the goal is to achieve high storage efficiency returns from the deduplication feature. In contrast to the volume-centric storage layout, data of different application instances is grouped to achieve storage efficiency returns across a set of application instances. Figure 13 shows a sample dedupe-centric storage layout. Data of application instances is organized horizontally; the individual data of each application is grouped vertically in a volume to implement deduplication.

This layout makes sense in virtualization scenarios where the images of the guest machines can be grouped easily. Grouping partitions containing boot images and commonly used programs is very effective because they share much of the same data. To implement template-based provisioning with such a layout, cloning of template data must be performed with the file/LUN FlexClone operation. File/LUN FlexClone allows storage objects to be cloned within a volume, providing finer granularity.

This storage layout provides the following storage efficiency advantages from a short- and long-term perspective:

• Very high long-term storage efficiency savings. Long-term storage efficiency savings are achieved due to the deduplication-centric storage layout and deduplication returns.

• Short-term storage efficiency savings. Instant storage efficiency savings are provided when cloning an application instance, for example template application data, through a file/LUN FlexClone operation.

In contrast to the volume-centric storage layout, application instances are bundled together in a matrix style because of their participation in a volume. This implies that the applications share major operational tasks and are managed as a bundle. From an SLA perspective, a diversification of service levels within the application instances cannot be implemented as easily as with a volume-centric layout. Achieving application-consistent Snapshot copies requires the iterative application of file/LUN FlexClone functionality to all storage objects of the instance. This is slightly more difficult than cloning with a volume FlexClone operation.

TR-3505, NetApp Deduplication for FAS and V-Series Deployment and Implementation Guide, provides a deeper understanding of NetApp deduplication and its deployment.

Figure 13 illustrates dedupe-centric storage provisioning. Volumes are shared among several application instances to achieve cross-instance dedupe returns. Note that this construct is created within an aggregate; volumes can be assigned to different aggregates.


Figure 13) Dedupe-centric storage provisioning. Application instances are aligned horizontally; volumes are aligned vertically.

[Figure: the template, instance 1, and instance 2 each contribute LUN/qtree storage objects to several shared FlexVol volumes; deduplication block sharing operates within each FlexVol]

Impact on commitment and aggregate usage: When the FlexVol volumes for this layout are created, their individual sizes contribute to the commitment rate. The aggregate use grows with the provisioning and use of objects within the FlexVol volumes. Provisioning a new instance in this layout through a file/LUN FlexClone operation has no effect on the overcommitment rate; it has an effect on the deduplication value of the volumes themselves. Thus, NetApp recommends using the zero fat configuration for the volume and having autogrow enabled.
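Why grouping similar instances into one volume pays off can be illustrated with a small sketch. It models blocks as labels and deduplication as per-volume set building, which is a deliberate simplification:

```python
# Sketch: deduplication works per volume, so grouping similar data in one
# volume (dedupe-centric) shares more blocks than one volume per instance
# (volume-centric). Blocks are modeled as simple labels.

def stored_blocks(volumes):
    """Physical blocks remaining after per-volume deduplication."""
    return sum(len(set(vol)) for vol in volumes)

# Three VM instances, each with 3 identical OS blocks + 1 unique data block.
instances = [["os1", "os2", "os3", f"data{i}"] for i in range(3)]

volume_centric = stored_blocks(instances)             # one volume per instance
dedupe_centric = stored_blocks([sum(instances, [])])  # all in one volume

print(volume_centric, dedupe_centric)  # 12 6
```

The shared OS blocks are stored once per volume, so pooling the instances into one volume cuts the stored blocks in half in this example.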

Best Practice

This layout is very attractive for applications using multiple but similar storage objects among service instances (for example, virtual disks in virtual machine hypervisors). These instances usually use similar operating systems and applications in dedicated virtual disks. Thus, grouping these storage objects leads to a very high degree of consolidation due to deduplication.

Quickly changing data, such as page and swap files, should not be considered for inclusion in deduplicated volumes on primary storage. Deduplication savings on such data are limited due to its high change rate and do not justify running the deduplication process. NetApp recommends not placing this type of data in the same volume as data that dedupes well.

We further recommend not performing client data realignments such as Windows disk defragmentation or database table space reorganizations. Because of the way that NetApp storage controllers work, fragmented client data is served without performance penalties.


3.3 SETTLED/NOMAD PROVISIONING FOR NETAPP DATA MOTION

Settled/nomad provisioning is a provisioning pattern that helps increase the utilization of NetApp shared storage. When the online migratability features of the storage are exploited, response times to mitigate data growth scenarios become independent of application-specific planned downtime windows. Storage is classified into potential migration candidates and can be migrated away from a tight aggregate at one storage controller to another one while its accessibility is assured. Thus, it is an elegant technology to relax the use of an aggregate outside the planned downtime windows of the served applications. NetApp MultiStore technology implements this feature using the vFiler abstraction, which NetApp recommends you consider in the provisioning process. Secure multi-tenancy environments implemented using MultiStore technology harmonize well with this provisioning approach.

The settled/nomad provisioning pattern is a perfect metaphor for reacting to data growth in an aggregate. Figure 14 illustrates the concept of settled/nomad provisioning in the aggregates of the storage controllers and the migration of a nomad out of its aggregate. The settled part describes data that does not move during its lifetime. It might use vFiler units to simplify operation and hardware maintenance of the storage controller, but there is no direct need. The nomad parts are considered moving parts and thus must make use of vFiler units. The ratio between the size of the settled and nomad parts depends on the growth rate and lifetime of the data in the settled part. Assuming that the aggregate size is constant over this period, the aggregate is filled with settled and nomad data. Over the data lifetime, more and more nomads are migrated away; at the end of the lifetime, only the settled data is left.
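A back-of-the-envelope sizing sketch, with hypothetical growth figures, shows how the nomad share can be derived from the expected growth and lifetime:

```python
# Sketch: size the nomad share of an aggregate so that migrating nomads
# away compensates for settled data growth over the data lifetime.
# All figures are hypothetical.

def nomad_share_needed(monthly_growth_gb, lifetime_months, aggregate_gb):
    """Fraction of the aggregate to provision as nomads."""
    total_growth = monthly_growth_gb * lifetime_months
    return total_growth / aggregate_gb

share = nomad_share_needed(monthly_growth_gb=50, lifetime_months=36,
                           aggregate_gb=10_000)
print(f"provision about {share:.0%} of the aggregate as nomads")
# Splitting this share into several nomads of different sizes keeps the
# migration steps small and the aggregate in its sweet spot corridor.
```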

It is irrelevant whether the data growth happens in the settled or the nomad part: when a nomad is migrated away, the resource situation on the aggregate is relaxed. It is preferable to provision several nomads of different sizes. This allows you to:

• React to different growth scenarios of the data
• Quickly migrate smaller nomads when time or the inter-storage-controller network is considered to be a limited resource
• Operate the aggregate in its operational sweet spot corridor over a long time frame; by slicing the migratable entities in the right way, you can be sure that the aggregate operates in a predefined use interval

Figure 14) Settled/nomad provisioning into an aggregate. In case of aggregate tightness, a nomad is migrated to a separate aggregate.

[Figure: an aggregate containing a settled part and two nomads]

To summarize, the settled/nomad provisioning pattern is an elegant method to adjust the block use of an aggregate. The use of an aggregate can be controlled and kept in a desired corridor.


SLA-BASED ASSESSMENT FOR SETTLED/NOMAD

The goals of an SLA-based assessment are to optimize SLA fulfillment and to avoid or minimize penalty costs. The accessibility of the applications, described by their individual service levels, is used to assess instances as settled or nomad.

We use the introduced SLA metric of service disruption and map it to the stickiness of the settled/nomad instances. The vFiler entities allow online migration of NFS- and iSCSI-attached nomad instances without any changes on the client side. Fibre Channel-attached storage cannot be migrated online at the time of writing. Refer to TR-3881 for an understanding of DataMotion in an Oracle database and Microsoft Exchange environment.

Alignment by technical impact: For data belonging to applications with SLAs that fit perfectly into what is provided, a direct assignment can be made. For example, application instances with the lowest acceptable service disruption should be the last candidates to be migrated (settled); applications with the highest acceptable service disruptions should be considered as nomads. However, there might be data of application instances that likely will be migrated during the application lifetime. You must take into account the business impact of migrating these instances.

Figure 15) Alignment by technical impact (sorted by negative impact in descending order).

[Figure: instances Inst1 through InstN ordered by negative impact; high-impact or outside-SLA data (e.g., all FC) is settled, medium- and low-impact inside-SLA data is nomad]

Alignment by business impact: An assessment of penalty costs is made for the data of the remaining applications. For vFiler migration, a very short negative impact on the performance of the service must be taken into account during the migration. Thus, application data with the highest negative impact is considered to be the stickiest.

Figure 16) Alignment by business impact (sorted by negative impact in descending order).

[Figure: remaining instances ordered by penalty cost; high-penalty ($$) data is semi-settled, low-penalty ($) data is nomad]
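The two assessment steps can be sketched as a simple classifier; the instance data and both limits are hypothetical:

```python
# Sketch of the two-stage settled/nomad assessment described above:
# first by technical impact (acceptable service disruption), then by
# business impact (penalty cost of a migration). All values are made up.

def classify(instances, disruption_limit_s, penalty_limit):
    settled, nomads = [], []
    for name, acceptable_disruption_s, penalty in instances:
        if acceptable_disruption_s < disruption_limit_s:
            settled.append(name)   # cannot tolerate the migration blip
        elif penalty > penalty_limit:
            settled.append(name)   # migration too costly in penalties
        else:
            nomads.append(name)
    return settled, nomads

instances = [
    ("erp-db",   0, 9000),   # e.g., FC-attached: no online migration
    ("mail",    30, 5000),
    ("archive", 300,  100),
]
settled, nomads = classify(instances, disruption_limit_s=10,
                           penalty_limit=1000)
print(settled, nomads)  # ['erp-db', 'mail'] ['archive']
```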

PERFORMANCE AND THROUGHPUT IMPACT OF MIGRATION

Migration of a nomad might be triggered by heavy storage consumption in an aggregate. It might also be triggered by performance limitations of the corresponding storage controller. Because the migration consumes additional resources on the network and the participating storage controllers,


this consumption must be taken into account to avoid further intensifying the situation. Refer to TR-3881 for a quantitative evaluation of DataMotion.

NetApp recommends adjusting the use of storage controllers in a high-availability configuration in such a way that the remaining controller can master the load in the case of a failover. Doing so should leave enough resources to perform migrations.

Migrating vFiler entities relies mainly on SnapMirror® and MultiStore technology and thus requires these licenses on all participating storage controllers. TR-3814, NetApp Data Motion, provides a thorough presentation of migration using the NetApp Data Motion solution. It focuses on implementing and triggering the migration of vFiler entities using NetApp Provisioning Manager. Furthermore, vFiler units can be managed manually to allow for handy offline migration with a very short interruption of storage accessibility.

ENABLE SETTLED/NOMAD FOR ALREADY-PROVISIONED STORAGE

While NetApp recommends that you consider the settled/nomad setting initially, taking the sizing and lifetime of storage into account, it is possible to introduce it later in a planned downtime window. If NFS-attached storage should be migrated, existing volumes can be adopted by a vFiler entity. Because the vFiler entity has its own IP address, the clients attaching the storage need to remount it.

SETTLED/NOMAD-LIKE SETTING WITH SHORT-/LONG-TERM DATA PAIRING

In the previous section, the settled/nomad pattern was described as a way to mitigate organic data growth. The same effect can be achieved when storage is identified as belonging to instances that are going to be deprovisioned at their end of life. Taking into account the expected lifetime of provisioned storage allows you to plan deprovisioning situations in advance. This relaxes the dependence on aggregates outside planned downtime windows, without the technical requirements of a settled/nomad setting.

ONLINE MIGRATION IN VIRTUALIZED ENVIRONMENTS

Online migration features in a virtualization hypervisor provide a further alternative for implementing a responsive scheme to react to data growth scenarios. For example, VMware® Storage VMotion™ is capable of transferring a virtual machine, including its storage, when it is attached using a datastore. Storage of virtual machines served by a NetApp datastore can be migrated to another NetApp-served datastore by migrating each virtual machine. In such cases, a nomad can simply be implemented by a NAS- or SAN-attached datastore.

In contrast to a data transfer based on SnapMirror directly between NetApp storage controllers, the migration traffic flows through the hypervisor. This might have consequences for the execution of the virtual machines. Also, the NetApp storage efficiency savings cannot be exploited during the transfer; deduplication savings are gained back by executing the deduplication process on the destination storage controller.


4 OPERATION

This section focuses on the operation and management of overcommitted storage. The goal of management is to fulfill the SLAs of the stored data while achieving a high level of efficiency. The section addresses how to detect situations that need manual assistance, how to raise the awareness of the operational staff, and how to resolve situations that arise.

We first consider situations that put SLA fulfillment at risk. Then we focus on actions that can be taken to avoid further aggravation by making the situation evident and presenting mitigation alternatives.

Consider the operational process as a loop that monitors and evaluates the current situation and triggers the transition of a storage resource among the following phases:

• Provision storage.
• Leave room for organic growth. It might be desirable to still allow for extending the storage of previously provisioned applications.
• Reduce storage use with mitigation alternatives such as deletion, data motion, and so on.

These transitions must occur within a specified time frame to optimize operational flexibility and to prevent endangering the SLAs. The point is to detect situations that will violate the SLAs in the future.

SITUATIONS PUTTING SLA FULFILLMENT AT RISK

Over time, more and more data is stored and processed by the provided applications. NetApp storage efficiency technologies compensate for this growth. To prevent running out of physical resources, usage must be managed within safe boundaries. This makes sure the operations team has enough time to react with the appropriate mitigation strategy.

The following list summarizes situations that are critical for service delivery:

• Running out of time. Some mitigation alternatives must be triggered in advance, and a passage of time might be needed for their effect to become evident. This time determines the number of mitigation alternatives that can be considered at a certain point.

• Running out of mitigation alternatives. Several mitigation alternatives exist to control the usage. However, some alternatives are one-time activities and some must be performed within a certain time frame. Depending on the situation, not all alternatives might be available for use.

• Running too tight on storage. Over time, applications use more and more of the blocks of storage that were committed to them. This forces Data ONTAP to allocate from a pool of free blocks. Assuming data growth, the size of the free block pool directly translates into available time to react.

• Running out of storage completely. This must be prevented because it has a high impact on the availability of the service. Furthermore, data integrity can be at risk. Consider the following scenarios:
  - An application wants to write to committed storage but fails (NAS and SAN). For applications, this looks like a storage failure and implies service disruption. Data integrity can be at risk.
  - An application wants to allocate new storage but fails (NAS). The application is confronted with a "No space left on device" exception. Verify the application behavior on this exception; most applications can deal with this situation, and data integrity is not at risk.

Two cases need to be differentiated when mitigation is necessary to resolve a situation of tight storage. Storage for an object such as a LUN or a share can be tight because of:
  - Insufficient space within the volume in which the storage object is contained
  - Insufficient free space within the aggregate in which the storage object and its volume are contained
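The "running too tight on storage" situation can be quantified with a rough estimate (hypothetical figures):

```python
# Sketch: the free block pool of an aggregate translates into reaction time.
# Figures are hypothetical; real values come from monitoring.

def days_to_react(aggregate_gb, used_gb, daily_growth_gb):
    """Days until the free block pool of the aggregate is exhausted."""
    free_gb = aggregate_gb - used_gb
    return free_gb / daily_growth_gb

print(days_to_react(aggregate_gb=10_000, used_gb=8_500, daily_growth_gb=30))
# 50.0 -> mitigation alternatives needing more than ~50 days of lead time
# are already off the table.
```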


The following sections focus on how to detect that a change is necessary and that a storage resource should be transitioned to another phase. We describe the different phases, how monitoring can support their detection, and how this information can be made known to operational groups.

4.1 PHASES AND TRANSITIONS

This section outlines the phases of a storage resource. Starting with an empty aggregate, storage is provisioned up to certain thresholds. After that, storage is left for organic growth. After certain thresholds are exceeded, further inspection or activities must be performed to mitigate storage tightness.

• Provisioning storage. When certain thresholds are within a defined range, storage is provisioned to the aggregates. Monitoring should support making a decision to transition to the next phase.

• Leave storage for organic growth. When certain thresholds are exceeded, provisioned storage is left for organic growth. Depending on the environment, storage of existing applications might still be extended, and a second threshold might signal that extensions are no longer possible. Monitoring should support making a decision to transition to the next or prior phase.

• Mitigate storage use. When certain thresholds are exceeded, this phase must make sure that committed storage can be delivered to store application data. The effect of a mitigation activity should be to put the storage resource back into the preferred operational corridor. Monitoring should support making a decision to transition back to the organic growth phase.
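A compact sketch of this phase loop, with hypothetical threshold values standing in for the configured ones:

```python
# Sketch of the phase loop: aggregate block use drives the transition
# between provisioning, organic growth, and mitigation. Thresholds are
# hypothetical and would normally mirror the monitoring settings.

PROVISION_BELOW = 0.50   # still room to provision new applications
MITIGATE_ABOVE = 0.85    # "nearly full": start mitigation activities

def phase(aggregate_use):
    if aggregate_use < PROVISION_BELOW:
        return "provision"
    if aggregate_use < MITIGATE_ABOVE:
        return "organic growth"
    return "mitigate"

for use in (0.30, 0.70, 0.90):
    print(f"{use:.0%}: {phase(use)}")
```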

4.2 MONITORING

NetApp Operations Manager delivers comprehensive monitoring and management for NetApp shared storage. It provides alerts, reports, performance monitoring, and configuration tools to keep the NetApp storage infrastructure in line with business requirements for maximum availability and efficiency. NetApp Operations Manager provides a single human interface and an application programming interface (API) for integration with third-party management and orchestration software vendors.

Operations Manager monitors the NetApp shared storage infrastructure and is able to raise awareness of certain situations. Events can be set to trigger an action when operational parameters are within a certain range and indicate a relevant situation. When the event triggers, an alarm notification can be sent by e-mail, pager, Simple Network Management Protocol (SNMP), or customized scripts. To raise awareness of a certain situation, the event must be characterized using the metrics provided by Operations Manager. To communicate the event, an alarm must be set.

THRESHOLDS

Operations Manager monitors relevant parameters that indicate the presence of specified situations. Thresholds can be set to trigger actions, for example, to notify the operational team that an alarm situation exists. The thresholds can be set to notify in advance. Operations Manager also performs trending on operational parameters to express the urgency of a certain situation. This supports decision making on how to react to the situation.

Within your Operations Manager instance, the thresholds can be verified and set by navigating to the Default Thresholds page following Setup → Options → Default Thresholds, or through the link http://opsmgrserver:port/dfm/edit/options. Figure 17 shows a sample configuration page.


Figure 17) Operations Manager screen to configure thresholds on operational metrics.

For aggregates, Operations Manager provides a set of thresholds, described in the following list. They represent absolute limits. Operations Manager alarms can be used to notify operational staff and raise awareness of a specific situation.

Monitoring the aggregates is very important. They are the physical containers of preallocated and growable storage objects that host application data. If an aggregate of a storage controller runs at uncontrolled usage, it could have direct consequences for the applications for which it is providing data.

The concrete settings for these thresholds depend on the time necessary to relax aggregate block usage. When no mitigation alternatives should or can be taken over the lifetime of the data, mitigation actions must be performed in scheduled downtime windows. In this case, threshold settings and actions tend to be more conservative to avoid SLA-endangering situations.

• Aggregate full threshold. This threshold on the aggregate block use metric allows triggering an alarm that notifies a person in charge.

• Aggregate nearly full threshold. This threshold is the counterpart of the aggregate full threshold but provides an earlier notification.

• Aggregate overcommitted threshold. This threshold on the committed storage metric allows triggering an alarm that notifies a person in charge. This metric refers to the amount of storage that is assigned to applications. It represents the level of consolidation and also the width and increase of the block use corridor.

• Aggregate nearly overcommitted threshold. This threshold is the counterpart of the aggregate overcommitted threshold but provides an earlier notification.

Operations Manager provides thresholds that can be used to alert operational staff when volumes are in a certain state:

• Volume full threshold. This event notifies a person in charge that the preset threshold on the volume use metric has been reached.

• Volume almost full threshold. This event is the counterpart of the volume full threshold but provides an earlier notification.

• Volume autosized. This event notifies a person in charge that a volume was extended using the autogrow functionality.


TRENDING

Operations Manager 4.0 supports a variety of trending features for certain storage objects. This is an important feature for all storage objects with a fixed size. It allows you to estimate the time frame within which a certain situation must be mitigated. The trend is calculated as a linear regression over up to 90 days in the past. For aggregates, Operations Manager calculates a trend on the daily growth rate. In your Operations Manager instance, use the link http://opsmgrserver:port/dfm/report/view/aggregates-growth-rates for trending of aggregate growth rates and the estimated remaining time until the storage object is full. Each aggregate can be drilled down, and you can select trending based on an interval of one day, one week, one month, three months, or one year. To see the effect of recent data activities, set the interval of a trend calculation to enclose this activity. Investigate whether growth rates calculated over different intervals deviate significantly.

Figure 18) Trending of data growth and days-to-full prediction in Operations Manager.

Note: The calculation basis of time to full is the usable aggregate capacity. This value is not calculated based on the aggregate full threshold setting.
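The trend calculation can be sketched as a least-squares fit on daily usage samples (the data below is made up; Operations Manager performs this internally):

```python
# Sketch of the trending idea: fit a linear growth trend to daily usage
# samples and estimate the days until the aggregate is full.

def days_to_full(daily_used_gb, capacity_gb):
    n = len(daily_used_gb)
    xs = range(n)
    mean_x = sum(xs) / n
    mean_y = sum(daily_used_gb) / n
    # Least-squares slope = estimated daily growth rate in GB/day.
    slope = (sum((x - mean_x) * (y - mean_y)
                 for x, y in zip(xs, daily_used_gb))
             / sum((x - mean_x) ** 2 for x in xs))
    return (capacity_gb - daily_used_gb[-1]) / slope

usage = [8000, 8030, 8058, 8092, 8121, 8149]  # GB, one sample per day
print(round(days_to_full(usage, capacity_gb=10_000)))  # 62
```

At roughly 30 GB of growth per day, the free space lasts about two more months, which is the kind of lead time the phase transitions depend on.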

The trending on the volume level is analogous to the trending on the aggregate level. In your Operations Manager instance, access the link http://opsmgrserver:port/dfm/report/view/volumes-growth-rates for trending of volume growth rates. NetApp recommends ordering the view by growth rate descending or by time to full ascending in order to focus on the relevant candidates.

On the volume level, you can set an alarm to fire when the volume growth is outside the usual boundary:

• Abnormal volume growth. This event notifies when the growth rate of a volume exceeds a preset limit. It is helpful for signaling unusual behavior concerning storage consumption and pointing the operational staff to the right storage object.


INDIVIDUAL THRESHOLDS PER AGGREGATE OR VOLUME

For each aggregate or volume, the general default settings can be overridden and made more specific. To do so, select the aggregate or volume of choice, for example, using the links already provided in this technical report. When a concrete aggregate is selected, it can be configured using the Edit Settings link and dialog. When a concrete volume is selected, its configuration can be adapted using the Edit Quota Settings link and dialog.

MONITORING STORAGE EFFICIENCY RETURNS

NetApp Operations Manager provides a dashboard to visualize storage efficiency returns in the NetApp shared storage infrastructure. This report lists important parameters, drilled down by utilization capacity, unused reserve capacity, storage efficiency, and efficiency return breakdown. It allows you to judge the effectiveness of the NetApp storage efficiency technologies. Figure 19 provides a sample screenshot of the storage efficiency dashboard in NetApp Operations Manager. Consult the NetApp Operations Manager Efficiency Dashboard Installation and User Guide for further information on this dashboard.

Figure 19) Storage efficiency dashboard in Operations Manager.


4.3 NOTIFICATION

Operational staff must be notified when situations occur that require a transition of phases, especially situations with negative consequences for SLA fulfillment. Operations Manager provides alarms for notification. Alarms are bound to the metrics and thresholds explained in section 4.2 and notify operational staff, storage administrators, or storage capacity planners. Alarms are the instrument used to keep the management effort of the NetApp storage infrastructure low.

After being notified, the responsible person can evaluate the situation and decide which actions to take. Furthermore, the trends on operational parameters provided by Operations Manager simplify the decision-making process.

Depending on the organizational structure, the responsibilities to operate, plan, and administer the storage infrastructure can be separated into different groups, persons, or roles. Thus, we characterize the mitigation activities by required skill set and time to act. This allows an easy alignment to a given organizational structure.

Operations Manager supports different methods of sending a notification. The notification methods can be used in combination; for example, a notification can be sent by both e-mail and SNMP.

NOTIFY BY E-MAIL

An alarm can be sent to multiple destinations by e-mail. Repeated notifications can be sent as long as the situation is not resolved. To set an alarm, access the alarm configuration page by following Setup → Alarms from the default Operations Manager dashboard. Clicking Advanced Version accesses an advanced version of this page; the direct link for the advanced version is http://opsmgrserver:port/dfm/edit/alarms-advanced. Figure 20 shows how to configure an alarm. Adjust the threshold as described in section 4.2.

NetApp recommends using distribution lists or aliases with meaningful names rather than the addresses of individual persons. If you follow this recommendation, changing responsibilities and roles does not require you to make corresponding changes to Operations Manager.

NOTIFY BY SNMP

Operations Manager supports the notification of alarms using SNMP, a widely used standard that is supported by most orchestration frameworks and ticketing systems. Using SNMP, Operations Manager can be integrated into existing ticketing systems. Figure 20 shows the setup of an alarm that fires based on the aggregate almost full threshold. The SNMP trap host is configured using a hostname or IP address and the port on which the SNMP agent is listening. The alarm can then be saved and tested.


Figure 20) Configuring an alarm based on the threshold aggregate almost full.

Note: The SNMP event must be routed to the responsible groups or persons in the ticketing system. Thus, the mapping between the detected situation and the responsible operational group must be implemented there.

NOTIFY BY SCRIPT

Operations Manager supports notifications in highly customized integration scenarios. A user-defined adapter can be executed, which delivers the information to the infrastructure or system of choice. A script can be used to implement such an adapter and act as the glue between Operations Manager and the customer infrastructure. To set an alarm on the event aggregate almost full that starts a script, instrument Operations Manager on the command line:

dfm alarm create -s script_to_execute -h aggregate-almost-full
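A notification adapter of this kind could look like the following sketch. How Operations Manager hands event details to the script (assumed here: command-line arguments) and the ticket format are assumptions; consult the Operations Manager documentation for the actual interface.

```python
#!/usr/bin/env python
# Minimal sketch of a notification adapter script. The argument-passing
# convention and the ticket format are assumptions, not the documented
# Operations Manager interface.

import json
import sys
import time

def build_ticket(args):
    """Turn the raw event arguments into a ticket record for a
    hypothetical downstream ticketing system."""
    return {
        "created": time.strftime("%Y-%m-%dT%H:%M:%S"),
        "source": "operations-manager",
        "event": args[0] if args else "unknown",
        "details": args[1:],
    }

if __name__ == "__main__":
    ticket = build_ticket(sys.argv[1:])
    # Here the ticket would be posted to the customer infrastructure;
    # for this sketch it is just printed.
    print(json.dumps(ticket))
```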


4.4 MITIGATE STORAGE USE

Uncontrolled storage use can limit operational flexibility and might put SLA fulfillment at risk. This section focuses on mitigation activities that preserve flexibility by controlling use within its defined corridor. The effect of a mitigation activity should return the usage to its defined corridor.

Storage tightness might occur in aggregates or volumes, depending on their configuration. When all volumes in an aggregate are thin provisioned with the zero fat configuration, they use the shared pool of free blocks of the aggregate to deal with data growth. To solve this situation, a mitigation activity on the aggregate level is necessary.

When storage objects in a fixed-size volume cannot grow to the committed space, a mitigation activity on the volume level is necessary to solve upcoming volume tightness.

MITIGATION ACTIVITIES FOR AGGREGATES

Aggregates are the coarsest storage objects within a NetApp storage controller. Finer-grained storage objects, such as FlexVol volumes and their content, are usually thin provisioned using the zero fat configuration. They might grow on demand; however, because they live within an aggregate of physically limited size, the growth of the storage object itself is also limited. As described in the following list, providing usable space in the aggregate automatically allows contained storage objects to grow.

1. Increase the aggregate. You can add drives to aggregates during operation, and you can repeat this mitigation activity. The maximum aggregate size depends on the Data ONTAP version, the type of aggregate, and the type of storage controller. 64-bit aggregates, supported with Data ONTAP 8, have very high limits. Additional drives can be used immediately; however, their procurement needs to be taken into account. Rebalancing data between existing and new drives results in a uniformly distributed use of the drives.

2. Decrease the aggregate Snapshot copy reserve. This reserve is needed in MetroCluster and SyncMirror® configurations. In other configurations, you can decrease this reserve or set it to zero.

3. Shrink preallocated volumes. Volumes with preallocated space reserve available aggregate free space. When possible, these volumes can be shrunk, returning the freed space to the aggregate to allow others to make use of the preallocated space.

4. Enable deduplication and shrink the volume.

5. If available, migrate a nomad online to a different storage controller. Doing this on the NetApp storage controller level requires storage provisioning based on vFiler units and a MultiStore and SnapMirror license. Adequate free space on the aggregates of the target storage controller is required. This mitigation activity is not limited in its applicability.

6. Migrate a volume from one aggregate to another aggregate within the same or another storage controller. SnapMirror replicates the data while it is still served. To switch over to the replicated data, the client needs to detach from the source and reattach to the replica. After completion, the replica is considered the new source. This operation has an impact on client downtime; typically, inter-data center bandwidth allows you to synchronize the source and the replica within the range of a few minutes.

7. If none of the listed activities can be used, the application must be stopped to achieve a consistent state.

The mitigation activities for aggregate tightness are summarized in Table 8. Note that Provisioning Manager performs mitigation alternatives 3 to 6 for secondary storage online.

Table 8) Mitigation alternatives to control use within aggregates.

No. | Mitigation Activity | Repeatability | SLA Impact | Preparation Time | Time to Show Effect
1 | Increase aggregate capacity by adding disks | Repeatable (Data ONTAP 7.x: low limits; Data ONTAP 8: high limits) | None | HW procurement | Immediate (+ rebalancing)
2 | Decrease the aggregate's Snapshot copy reserve area, if possible | One time | None | None | Immediate
3 | Shrink other volumes in the aggregate, if they have enough free space | One time | Low | None | Immediate
4 | Run deduplication and shrink volumes | Repeatable | Low | Time to execute dedupe | Immediate
5 | Migrate nomads (online) | Repeatable | Low | None | Minutes (vFiler migration time)
6 | Migrate volumes to a different aggregate (offline) | Repeatable | Med-high | Next planned downtime window | Minutes (volume switch-over time)
7 | Prevent application data loss and stop the application, then migrate (offline) | Repeatable | Low-high | Coordinate with app owner | Minutes (migration time)
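An operator typically walks Table 8 from top to bottom and picks the first alternative that fits the current constraints. The following Python sketch (an illustration of that selection logic, not a NetApp tool; the flags `online` and `repeatable` paraphrase the table's columns) encodes the alternatives:

```python
# Hypothetical encoding of Table 8. "online" marks activities without
# client downtime; "repeatable" marks whether they can be applied again.
MITIGATIONS = [
    {"no": 1, "name": "increase aggregate capacity", "online": True, "repeatable": True},
    {"no": 2, "name": "decrease Snapshot copy reserve", "online": True, "repeatable": False},
    {"no": 3, "name": "shrink other volumes", "online": True, "repeatable": False},
    {"no": 4, "name": "run deduplication and shrink volumes", "online": True, "repeatable": True},
    {"no": 5, "name": "migrate nomads (online)", "online": True, "repeatable": True},
    {"no": 6, "name": "migrate volumes (offline)", "online": False, "repeatable": True},
    {"no": 7, "name": "stop application and migrate", "online": False, "repeatable": True},
]

def pick_mitigation(require_online, exclude=()):
    """Return the first applicable alternative, walking the table top-down."""
    for m in MITIGATIONS:
        if require_online and not m["online"]:
            continue
        if m["no"] in exclude:
            continue
        return m
    return None
```

For example, if alternatives 1 through 4 are already exhausted and downtime is not acceptable, the walk selects the online nomad migration (alternative 5).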

MITIGATION ACTIVITIES FOR VOLUME TIGHTNESS

Mitigation activities for volume tightness are relevant for volumes that are at risk because their storage objects cannot grow to the committed size. When it is not possible to enable growth for storage objects contained in volumes, you need to perform an aggregate mitigation activity.

Note: Some of these mitigation alternatives depend on, and affect, the used capacity in the aggregate.

Table 9) Mitigation activities for resource tightness within volumes.

No. | Mitigation Activity | Repeatability | SLA Impact | Preparation Time | Time to Show Effect
1 | Reduce the volume's Snapshot copy reserve (if configured and not used) | One time | Low | None | Immediate
2 | Increase the volume, if there is free space in the aggregate (see Table 8) | One time | Low | None | Immediate
3 | Delete Snapshot copies not needed or those skipped by the AutoDelete function | Limited | Low | None | Immediate
4 | Activate FAS deduplication for the volume (requires proper space guarantees) | One time | Low; possible performance impact | Wait for schedule | Hours
5 | If the volume contains more than a single LUN, migrate those objects to another volume or aggregate | Repeatable | High | Next planned downtime window | Minutes (volume migration time)
6 | Stop the application and migrate the data | Repeatable | High | Coordinate with app owner | Minutes (migration time)

5 REAL-LIFE SETTINGS

This section summarizes two different operational settings. The first does not make use of online data migration and the settled/nomad provisioning pattern; the second implements a settled/nomad provisioning pattern to maintain the flexibility for online data migrations.

The concrete threshold settings and approaches might be very customer and application specific. To exploit NetApp storage efficiency features in your own data center, NetApp recommends that you start conservatively. After you are familiar with the process, work toward the customer-specific optimum.

5.1 SAMPLE SETTING 1: REAL-LIFE SETTING

This section describes a real-life setting that a customer started with. It makes use of a limited set of mitigation alternatives. This is especially beneficial when the installed storage capacity should stay constant over a long time frame or when the physical systems are already fully equipped. A settled/nomad setting is not considered. Thus, the thresholds that signal a transition of the phases are set lower and more conservatively for this customer. Because online data migration and aggregate extension are not available as mitigation alternatives, sufficient available space is required to safely reach the next planned downtime window, as shown in Figure 21. In practice, refer to the aggregate days to full trend value to get an idea of the available days to full based on past data growth.

• All storage is provisioned using the zero fat option with growable FlexVol volumes. Only aggregate monitoring is used.
• Aggregate extension is not a mitigation alternative.
• Online migration is not a mitigation alternative.

Figure 21) Storage to enable organic data growth between planned downtime windows. (The figure plots data growth over several months between two planned downtime windows.)

Note: Several months might pass between the planned downtime windows in which major mitigation alternatives can be performed.

The primary concern is preventing the critical situation in which aggregates reach a utilization level that is too high to enable organic growth during the period between agreed planned downtime windows. To prevent this situation, sufficient space must be reserved to enable data growth. Second, the level of data consolidation is monitored to manage accumulated growth rates safely.

Provisioning new data is stopped when one or both thresholds on the first and second metrics are reached. The operational teams are notified using alarms on the Operations Manager events aggregate nearly full threshold (configured to fire when the metric exceeds 50%) and aggregate nearly overcommitted threshold (configured to fire when the metric exceeds 110%). These alarms stop the responsible entities from provisioning new storage; the aggregate is left for organic growth.

An assessment of the storage situation might then be performed. Depending on experience and knowledge of the application growth rates seen in the past, the thresholds may be adapted. After the upper threshold of the operational sweet spot corridor is exceeded, an alarm based on the aggregate full threshold (set initially to 65%) is sent to the storage administrators to make the decision for migrating data in the next planned downtime window. In the meantime, organic growth can take place in the yellow-marked area shown in Figure 22. The metrics used are:

• First metric: aggregate capacity used
• Second metric: aggregate space committed

Because all storage is provisioned using the zero fat option, no artificially limited storage container exists. Thus, there is no need to consider a volume-based metric. Figure 22 shows the behavior depending on the metrics aggregate capacity used and aggregate space committed.
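The decision logic of this setting can be sketched in a few lines (a simplified illustration of the thresholds named above; the function names and the percent-valued interface are our assumptions, not an Operations Manager API):

```python
def aggregate_metrics(total_bytes, used_bytes, committed_bytes):
    """Compute the two metrics of sample setting 1, in percent."""
    used_pct = 100.0 * used_bytes / total_bytes
    committed_pct = 100.0 * committed_bytes / total_bytes
    return used_pct, committed_pct

def classify(used_pct, committed_pct):
    """Map the metrics to an action, following the sample setting:
    above 50% used or 110% committed, provisioning stops; above 65%
    used or 120% committed, migration is scheduled for the next
    planned downtime window."""
    if used_pct > 65 or committed_pct > 120:
        return "mitigate"
    if used_pct > 50 or committed_pct > 110:
        return "stop provisioning; assess capacity"
    return "provision new storage"
```

For example, an aggregate at 40% used and 90% committed still accepts new storage, whereas exceeding either upper threshold triggers mitigation.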

Figure 22) Transition of phases depending on the metrics aggregate capacity used and aggregate space committed. (Decision chart: with aggregate capacity used at 0-50% and aggregate space committed at 0-110%, new storage is provisioned; beyond these thresholds, capacity is assessed and thresholds are adapted; with capacity used above 65% or space committed above 120%, mitigation takes place.)

5.2 SAMPLE SETTING 2: SETTLED/NOMAD

This section describes a setting that takes the settled/nomad provisioning pattern into account and allows migrating nomad data flexibly and in a timely manner, thanks to vFiler technology. This setting requires storage space at alternative locations to which nomads can be migrated. It is seen more often in larger environments with an emphasis on NFS-attached storage. It allows operating the NetApp storage infrastructure at very high use and in narrower operational sweet spot corridors. Figure 23 visualizes the effect of a mitigation alternative that can be performed online.

Figure 23) Narrower corridors due to the ability to perform mitigation alternatives in hours instead of months. (The figure plots settled and nomad data over time: the need to act is detected, and the effect of mitigation, for example migration, shows within hours.)

In this sample setting, as in sample setting 1, the critical situation to prevent is aggregates becoming too full. However, thanks to the flexibility gained with online data migration, no further metric, such as storage overcommitment, needs to be taken into account.

• All storage is provisioned using the zero fat option with growable FlexVol volumes. Only aggregate monitoring is used.
• Storage is provisioned using the settled/nomad pattern, with the ability to perform online migration.
• Days to full aggregate trending was more than 200 days on average. Note that this value depends on the individual situation and is calculated against 100%.

The sole metric in this setting is aggregate capacity used. Table 10 contains the thresholds describing the transition of phases.

Table 10) Phase transitions with the settled/nomad provisioning pattern and the online migration mitigation alternative.

Detection Threshold | Notify | Mitigation
> 70% | Storage operations | Stop provisioning of storage
> 85% | Storage operations | Stop extending provisioned storage
> 90% | Storage operations | Relax resource situation and migrate a nomad
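The phase transitions of Table 10 reduce to a single-metric decision function; the sketch below is an illustration of that logic (the function name and the returned action strings are ours, not a NetApp interface):

```python
def classify_settled_nomad(used_pct):
    """Phase transitions of sample setting 2, driven by the single
    metric aggregate capacity used (in percent), per Table 10."""
    if used_pct > 90:
        return "relax resource situation and migrate a nomad"
    if used_pct > 85:
        return "stop extending provisioned storage"
    if used_pct > 70:
        return "stop provisioning of storage"
    return "normal operation"
```

Because online migration shows its effect within hours, the corridor between 70% and 90% can be kept much narrower than in sample setting 1.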

Figure 24) Visualization of phase transitions depending on the metric aggregate capacity used. (Decision chart: at 0-70% capacity used, new storage is provisioned; at 70-85%, already provisioned storage may still be extended; above 90%, utilization is relaxed by moving a nomad with NetApp Data Motion.)

You can achieve very high data consolidation in this setting using NetApp storage controllers. The served amount of logical data can exceed the physically usable capacity by factors.

6 STORAGE EFFICIENCY COOKBOOK

To increase consolidation, we propose the following steps to exploit the advantages of NetApp storage efficiency technologies.

Figure 25) Typical picture of aggregate capacity metrics while turning to zero fat configurations and dedupe. (The figure plots committed capacity and capacity used over elapsed time, showing the overall trend and the last-3-month trend, with marks at 1 month and 3 months corresponding to steps 1-3 below.)

As a general rule, we don't introduce artificially limited container types. They increase monitoring effort and might prevent pooling unused space. For an existing landscape, proceed as follows:

1. Install and configure Operations Manager; the earlier, the better. From day one, Operations Manager collects data, and the more information it collects, the better the predictions and trending are. The diagrams provided by Operations Manager give a good idea of growth rates and their steadiness. Make sure all NetApp storage controllers are monitored. Wait for one month. Define which mitigation alternatives your operational team is comfortable with. Check the boxes accompanying the provided list and identify the time your team needs to perform the actions. If you can perform online migrations for nomads, define the time to negotiate and approve the migration. For all other data, define the time to the next planned downtime window.

2. Change all volumes to the zero fat configuration with the autogrow feature set to on. Because there is no artificial space limitation for an autogrow volume, monitoring is restricted to aggregate monitoring. When using deduplication, set the volume to autogrow. During this period, the capacity used diminishes, as shown in Figure 25. Usually, each change in the volume configuration can be detected. So far, only metadata has changed, and the unused space in the volumes is now available from a common shared pool. The aggregated free space is available for the same applications storing the data. We recommend monitoring for three months to understand the growth rate of your environment.

3. Derive the growth trend of the aggregates. Note that the overall trend might still be negative. Use Operations Manager to help determine the trend. Make sure that it excludes the time frame when the volume configuration was changed to zero fat and that it includes relevant operations of your applications, such as month- and year-end closing of business applications or regular software maintenance updates (for example, in virtualized environments).

4. Work backward to determine the thresholds of the phases:
a. Define the aggregate use at a level where your operational team is comfortable. At first, do not exceed 80%. Add an attention area (yellow) depending on the mitigation alternatives and their time to show effect.
b. Determine the maximum distance between the planned downtimes or the time to perform the intended mitigation alternatives.
c. Determine the growth rate. Operations Manager provides help in determining the trend of data growth.
d. Determine the minimum space required to comfortably allow organic growth in the period between the agreed planned downtimes of the services provided. Operations Manager helps you to understand the growth rate of the past.
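Assuming roughly linear growth, steps a through d condense into a back-of-the-envelope calculation (an illustrative sketch; the function and parameter names are our assumptions):

```python
def upper_threshold_pct(daily_growth_pct, days_between_downtimes,
                        comfort_limit_pct=80.0):
    """Highest safe 'aggregate capacity used' threshold that still
    leaves room for organic growth until the next planned downtime
    window, assuming linear growth.

    daily_growth_pct:  observed growth, in percent of aggregate capacity per day
    comfort_limit_pct: level the operational team does not want to exceed
    """
    # Headroom that organic growth will consume before the next window.
    headroom = daily_growth_pct * days_between_downtimes
    return max(0.0, comfort_limit_pct - headroom)
```

For example, with 0.25% growth per day and 40 days between windows, 10% of headroom must stay free, so the alarm threshold would be set at 70% rather than at the 80% comfort limit.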

To provision storage, follow these steps:

1. Create big aggregates to enable shared storage in your data center. We recommend sizing in such a way that the aggregate can be extended once for eventual aggregate mitigation. Very few situations exist where a silo-centric approach with dedicated aggregates for applications makes sense. Free space and performance in an aggregate can be shared, and few big aggregates reduce the monitoring effort. Also, build aggregates in a limited number of standardized configurations and sizes.

2. Create volumes in the zero fat configuration with the autogrow feature set to on. Because there is no artificial space limitation for an autogrow volume, monitoring is restricted to aggregate monitoring. When using deduplication, set the volume to autogrow. Whenever possible, use Provisioning Manager for convenience and for repeating configurations.
a. Classify your data and provision for flexibility. Give NFS a preference and make use of vFiler entities.
b. Turn on deduplication. Even in situations where deduplication rates are expected to be low, there is sometimes a big surprise. If you prefer to try deduplication on the storage controller first, create a clone of the intended volume and deduplicate it to estimate the effect. Use Performance Advisor to identify a repeating time frame of low activity to schedule the deduplication job. Alternatively, use deduplication scheduled by change rate. Mind the maximum volume sizes, which depend on the storage controller.

c. Initially size volumes to the expected size of the data you are going to store. Thus, the aggregate overcommitment metric in Operations Manager represents the data consolidation more precisely.

d. Trim existing volumes provisioned in full or low fat to the zero fat configuration. Use the following commands on the console of the storage controller to configure zero fat without Snapshot autodelete for NAS environments:

vol options <volume> guarantee none
vol options <volume> try_first volume_grow
vol autosize <volume> -m <maximum size> -i <increment size> on
snap autodelete <volume> off

Use the following command sequence to configure zero fat with Snapshot autodelete for NAS environments:

vol options <volume> guarantee none
vol options <volume> try_first volume_grow
vol autosize <volume> -m <maximum size> -i <increment size> on
snap autodelete <volume> trigger volume
snap autodelete <volume> delete_order oldest_first
snap autodelete <volume> on

Use the following commands to configure zero fat without Snapshot autodelete for SAN environments:

vol options <volume> guarantee none
vol options <volume> try_first volume_grow
vol autosize <volume> -m <maximum size> -i <increment size> on
snap reserve -V <volume> 0
snap autodelete <volume> off
lun set reservation <lun> disable

Use the following command sequence to configure zero fat for SAN environments with autodelete set to on:

vol options <volume> guarantee none
vol options <volume> try_first volume_grow
vol autosize <volume> -m <maximum size> -i <increment size> on
snap reserve -V <volume> 0
snap autodelete <volume> trigger volume
snap autodelete <volume> delete_order oldest_first
snap autodelete <volume> on
lun set reservation <lun> disable

e. Identify storage holding inactive data. Storage keeping inactive data is most often perfectly suited to act as a nomad candidate that could be migrated.

f. Identify storage that is close to deprovisioning. Deprovisioning of storage relaxes use and can act as a mitigation alternative.

g. Turn already provisioned volumes into the zero fat configuration.

3. Let Operations Manager monitor the landscape. Use the aggregate daily growth rates and the days to full trending reported by Operations Manager to adapt the thresholds. Remember that days to full trending reports against 100% capacity used of the aggregate.
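Because the reported days to full value is calculated against 100% capacity used, it can be rescaled to any custom threshold; assuming linear growth, the conversion is a simple proportion (an illustrative sketch, not an Operations Manager feature):

```python
def days_to_threshold(days_to_full, used_pct, threshold_pct):
    """Rescale Operations Manager's days-to-full trend (measured
    against 100% capacity used) to the number of days until a custom
    threshold is reached, assuming linear data growth.
    """
    if used_pct >= threshold_pct:
        return 0.0  # threshold already reached
    # Linear growth covers (100 - used_pct) percent in days_to_full days,
    # so it covers (threshold_pct - used_pct) percent proportionally sooner.
    return days_to_full * (threshold_pct - used_pct) / (100.0 - used_pct)
```

For example, an aggregate at 60% used with 200 days to full crosses an 80% alarm threshold in about 100 days, so the thresholds can be reviewed well before any alarm fires.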

7 REFERENCES

• TR-3505, "NetApp Deduplication for FAS and V-Series Deployment and Implementation Guide": www.netapp.com/us/library/technical-reports/tr-3505.html
• TR-3563, "NetApp Thin Provisioning: Improving Storage Utilization and Reducing TCO": www.netapp.com/us/library/technical-reports/tr-3563.html
• TR-3710, "Operations Manager, Provisioning Manager, and Protection Manager Best Practices Guide": www.netapp.com/us/library/technical-reports/tr-3710.html
• TR-3786, "A Thorough Introduction to 64-Bit Aggregates": www.netapp.com/us/library/technical-reports/tr-3786.html
• TR-3814, "NetApp Data Motion": www.netapp.com/us/library/technical-reports/tr-3814.html
• TR-3827, "If You're Doing This, Then Your Storage Could Be Underutilized": www.netapp.com/us/library/technical-reports/tr-3827.html
• TR-3881, "DataMotion for Volumes for Enterprise Applications": www.netapp.com/us/library/technical-reports/tr-3881.html
• NetApp Operations Manager Efficiency Dashboard Installation and User Guide: now.netapp.com/NOW/download/tools/omsed_plugin/InstallUserGuide.pdf

8 ACKNOWLEDGMENTS

This report was developed in concert with the Field Centers for Innovation and covers field best practices and product group expertise. It would not have been possible without the input of many experts. Significant contributions were made by Matthew Agoni, Carlos Alvarez, Jeff Berks, Manfred Buchmann, Hans Deuerlein, Erik Dybwad, Niels Reker, Oliver Dziuba, Larry Freeman, Gary Garcia, Pretoom Goswami, Naveen Harsani, George John, Nigel Maddock, Andreas Martinovsky, Holger Niermann, Cesar Orosco, Christian Ott, Shiva Raja, Michael Reusch, Maurice Skubski, John Tyrrell, Oliver Walsdorf, and Allen Wang.

NetApp provides no representations or warranties regarding the accuracy, reliability, or serviceability of any information or recommendations provided in this publication, or with respect to any results that may be obtained by the use of the information or observance of any recommendations provided herein. The information in this document is distributed AS IS, and the use of this information or the implementation of any recommendations or techniques herein is a customer's responsibility and depends on the customer's ability to evaluate and integrate them into the customer's operational environment. This document and the information contained herein may be used solely in connection with the NetApp products discussed in this document.

© Copyright 2010 NetApp, Inc. All rights reserved. No portions of this document may be reproduced without prior written consent of NetApp, Inc. Specifications are subject to change without notice. NetApp, the NetApp logo, Go further, faster, Data ONTAP, FlexClone, FlexVol, MultiStore, RAID-DP, SnapDrive, SnapMirror, Snapshot, SyncMirror, and vFiler are trademarks or registered trademarks of NetApp, Inc. in the United States and/or other countries. Windows is a registered trademark of Microsoft Corporation. Oracle is a registered trademark of Oracle Corporation. VMware is a registered trademark and VMotion is a trademark of VMware, Inc. All other brands or products are trademarks or registered trademarks of their respective holders and should be treated as such. RA-0007-1010

Figure 4) Mitigate to prevent uncontrolled utilization. (The figure plots data growth against aggregate capacity; mitigation keeps utilization from growing uncontrolled.)

This document addresses best practices and tools to manage the NetApp storage infrastructure and to support decision making during the transition between phases.

2.3 AUDIENCE

This document addresses two audiences:

• Decision makers: It provides decision makers with an understanding of how to align storage efficiency best practices and processes within their existing operations organization.

• Operational teams: It allows operational teams to understand the monitoring and management of the storage infrastructure while mastering data growth. It allows the operational teams to implement a basic setting and to position their usage goals. We refer to existing NetApp best practices to increase the level of data consolidation and to achieve overcommitment for major applications.

2.4 SCENARIO

As a scenario, we consider a service provider who delivers a set of IT services. This service provider might serve internal or external customers at different service levels. The service levels provided are characterized by unplanned downtime, as exemplified by Figure 5. This characterization is useful for aligning service data with physical resources. In our example, the highest level of service availability is delivered for Platinum services. It is further assumed that the provided services have different lifetimes and dates of creation.

The service provider's major goal is to deliver the services within the specified service level and to achieve maximum data consolidation through NetApp storage efficiency technologies. This directly translates into cost savings related to capital investments, floor space, cooling, maintenance, and operational expenses. However, storing data in a consolidated way using storage efficiency technologies needs to take into account the aggregated data growth rates of the applications.

Predicting the data growth rates depends on several parameters, which are usually outside the control and knowledge of the service provider. These parameters include usage characteristics, the number of users, and the functionality used. To compensate for the deficiencies in precisely predicting the data growth over a specific time frame, we propose a scheme that the service provider can use to achieve operational flexibility and adaptability to handle unpredictable growth rates.

Figure 5) Sample service levels ordered by service disruption and recovery time. (The levels range from Best Effort services (dev/test, cold/fill-up data, dynamic short-term data) through Bronze and Silver (production, low budget) and Gold (production) up to Platinum (production, premium customers), ordered by decreasing service disruption and recovery time.)

In this document, the focus is on the operational aspects of storage efficiency technologies to achieve data center consolidation and agility. Thus, we take a seat next to the operational staff of our sample service provider to understand their questions regarding the technologies that make up NetApp storage efficiency.

We address the questions posed by the operational staff, such as: How do we set it up? How do we run this? How do we integrate the necessary procedures in our daily life?

A set of questions pertaining to the lifetime of a service instance and its storage arises. It starts with provisioning storage in a NetApp shared storage infrastructure and continues with the detection and monitoring of situations endangering the level of a service, the necessary response procedures, and promoting a continuous and smooth delivery of services.

The questions are structured around a cycle that starts at provisioning storage and finishes at deprovisioning storage. Figure 6 shows important questions regarding storage efficiency from an operational point of view.

• Provisioning deals with the provisioning of storage. In this document, provisioning models are shown that achieve a high level of storage consolidation while preserving operational flexibility. For individual applications, NetApp provides a rich library of technical reports on how to provision best.

• Monitoring deals with defining the goal of the monitoring process and which metrics to use to decide when to stop (for example, the provisioning of storage). Relevant metrics provided by NetApp Operations Manager are described.

• Notifying deals with how to notify the people in charge of performing certain actions. The notification mechanisms within NetApp Operations Manager are described to deliver information in case of certain events.

• Mitigation deals with mechanisms to prevent uncontrolled utilization from affecting operational flexibility and service fulfillment.

Figure 6) Questions regarding storage efficiency from an operational point of view. (The cycle covers four areas. Provision: how to provision best for storage efficiency (provisioning models, NetApp Data Motion awareness, from scratch or template/clone), where to provision to, which SLA, and what the defaults are. Monitor: which tools to use and what to monitor. Notification: who is in charge to react and how to notify. Mitigate: what is critical, when to stop provisioning, when to stop extending, when to relax tightness, how to detect, the available options, the implications on SLAs, and when to act.)

Before discussing the details of this cycle, it is important to understand the NetApp technologies that achieve storage efficiency and flexibility and their relevance in the provisioning and operational phases.

2.5 NETAPP TECHNOLOGIES FOR STORAGE EFFICIENCY AND FLEXIBILITY

The NetApp technologies listed in Table 1 are characterized by where they bring the most significant advantage and value. For example, FlexClone® technology provides significant time and space advantages while provisioning, but the space advantage might be reduced over time. In contrast, deduplication technology can achieve space savings over the entire storage lifetime.

Table 1) NetApp technologies for storage efficiency and flexibility.

NetApp Technology | Benefit | During Provisioning | During Operation
FlexClone | Instantly creates thin-provisioned and space-efficient writable clones | X |
FlexVol® | Implements thin provisioning and consumes only the needed space rather than the requested space | X | X
Deduplication | Increases data consolidation by detecting and optimizing repeating patterns in primary and secondary storage | X | X
NetApp Data Motion | Provides flexibility for management and optimal load/capacity rebalancing in growing cloud environments without downtime | | X
Aggregate extensibility in Data ONTAP® | Data ONTAP is the foundation for all features listed in this table and provides flexibility in handling physical resources; it allows extending physical aggregates during operation | X | X

Furthermore, NetApp RAID-DP®, SATA, and NetApp Flash Cache (formerly PAM II) are technologies that help to reduce the total cost of storage tremendously. It is assumed that these technologies are deployed according to the requirements of the use case.

NETAPP SHARED STORAGE INFRASTRUCTURE

To implement the practices outlined in this document, some prerequisites must be met. We assume a NetApp shared storage infrastructure implemented using large aggregates. This acts as a utility for delivering storage in a flexible manner to applications with different needs. It scales with the demands and serves a variety of different service levels at the same time. NetApp Operations Manager monitors the NetApp shared storage infrastructure. This software acts as a central management station and consolidates information about the current status of all NetApp storage controllers. Based on this information, Operations Manager indicates the necessity to change the phases and behavior in the data center.

The NetApp shared storage infrastructure provides different ways for clients to consume its resources. It can provide a traditional view, where storage resources are located at a specific controller. Using NetApp Provisioning Manager, the infrastructure can also provide a service-oriented resource view that abstracts resource consumption and management from the physical controllers. The abstractions of a storage service catalog, resource pools, and datasets provide easy manageability in the face of massive scale. If multi-tenancy is not required, then this is the abstraction of choice.

Supported by the NetApp technologies MultiStore® (vFiler™) and NetApp Data Motion, storage can be provided in a utility-like fashion, independent of the physical hardware. This makes possible high operational flexibility in the data center and allows building virtualized environments for multiple tenants with competing interests.

12 Storage Efficiency Every Day How to Achieve and Manage Best-in-Class Storage Use

3 PROVISIONING

Some features, such as data deduplication, can be turned on or off at any time. However, to achieve maximum consolidation and storage efficiency, more strategies must be considered during the data-provisioning phase.

In addition, provisioning should take the flexibility of storage (for example, the ability to migrate data) into account. This allows administrators to easily move data off aggregates that are approaching capacity, without downtime. This is also an important aspect when planning to deliver services 24x7.

Thus, the design goals for storage layouts are storage efficiency and operational flexibility. In the following sections we discuss three orthogonal dimensions of provisioning storage. Two of them focus on achieving data consolidation; the third focuses on achieving operational flexibility. All dimensions can be combined independently. Note that the achievable level of consolidation depends on the applications and their data.

TR-3827, If You Are Doing This, Then Your Storage Could Be Underutilized, provides further understanding of storage efficiency and operational flexibility.

3.1 PROVISIONING FROM SCRATCH: FULL FAT TO ZERO FAT PROVISIONING

This section deals with the way data is provisioned and the consequences for storage efficiency. We recommend applying the so-called zero fat configurations.

We consider the storage setup for a single application instance. The presented configurations can be applied both while provisioning storage from scratch and to already provisioned storage. When the technical dimensions of storage provisioning are categorized into primary data space and its Snapshot™ copy space, there are four theoretical combinations for both network-attached storage (NAS) and storage area network (SAN) environments. In practice, only two variants are relevant to NAS and three variants are relevant to SAN storage:

• Full fat
• Low fat
• Zero fat

According to NetApp best practices, we do not mix block and file data within a single volume, which allows us to consider NAS and SAN environments separately.

NAS

For NAS, two options are recommended: full fat and zero fat.

• Full fat: The primary data and the Snapshot copy space are preallocated.
• Zero fat: Primary data and its Snapshot copy space are allocated on demand. This variant achieves the best ratio of storage efficiency when provisioning applications from scratch.


Figure 7) Provisioning model for NAS storage from scratch. Technically, only two out of four combinations are possible.

                               Primary Data (Files & Directories) Space Allocation
                               Fat               Thin
Snapshot Copy       Fat        Full Fat Option   No Option
Space Allocation    Thin       No Option         Zero Fat Option

Note: Full fat is characterized slightly differently in NAS and SAN due to their technical properties.

FULL FAT PROVISIONING

Full fat provisioning for NAS is the traditional (default) way to implement NFS/CIFS shares. Volumes in a full fat configuration are characterized as follows:

• Volumes are created with a space guarantee.
• The size of the volume follows the formula X + Δ, where X is the size of the primary data (the sum of all user data, files and directories, within the volume) and Δ is the amount of space needed to hold Snapshot data.
• Because the space used for Snapshot copies might grow unexpectedly, the autosize function can be used to make space available when a certain volume threshold is reached. This also happens when the space reserved for user data gets low.
• Space reservation for Snapshot copies is used to hide the capacity used for Snapshot copies from the consumers (NAS clients).
• For volumes with deduplication enabled, volume autogrow is a mandatory option.
• Normally, using autodelete is not recommended in NAS environments. Keeping a certain number of Snapshot copies for file versioning/restores might be part of the SLAs defined for file services.

Note: Deleting Snapshot copies may be a reasonable approach when no other option for freeing up space is available, but this is a specific and individual decision.

Table 2) Full fat provisioning.

Option              Recommended Value  Notes

Volume Options
guarantee           volume
fractional_reserve  100                Leave at the default; mostly relevant for SAN
                                       environments. The default value up to Data ONTAP
                                       7.3.3 is 100; for later releases, 0 is the default.
autosize            on                 Turn autosize on. There is then no artificially
                                       limited volume that needs to be monitored. Autosize
                                       makes sense to allow growth of user data beyond the
                                       guaranteed space limit.
autosize options    -m X -i Y          The business model drives the maximum value for the
                                       autosize configuration because it offers additional
                                       disk space to the consumer under its specific
                                       conditions. A reasonable resizing increment depends
                                       on various factors, such as the data growth rate in
                                       the particular volume, the volume size itself, and
                                       so on.

Volume Snapshot Options
reserve             yes                The value depends on the number of Snapshot copies
                                       and the change rate within the volume.
schedule            switched on        Automatic Snapshot technology schedules.
autodelete          off                Deleting Snapshot copies is not recommended in most
                                       NAS environments.

ZERO FAT PROVISIONING

The zero fat method is the most efficient way to provision NAS volumes:

• Volumes are created without a space guarantee.
• The size of the volume still follows the formula X + Δ, where X is the size of the primary data (the sum of all user data, files and directories, within the volume) and Δ is the amount of space needed to hold Snapshot data. Sizing the volume defines a container with a virtual size for the consumers; NAS users are familiar with fixed-size file shares.
• The space used for Snapshot copies can grow unexpectedly. You can use the autosize function to make space available when a certain volume threshold is reached. You can also use the autosize function when the space reserved for user data gets low.
• Space reserved for Snapshot copies is used to hide from the consumers (NAS clients) the capacity taken up by Snapshot copies.
• For volumes with deduplication enabled, volume autogrow is a mandatory option.
• Using autodelete is normally not recommended in NAS environments. Keeping a certain amount of space for Snapshot copies for file versioning/restores is part of the SLAs defined for file services.

Note: Deleting Snapshot copies may be a reasonable approach when no other option for freeing up space is available, but this is a specific and individual decision.

Table 3) Zero fat provisioning.

Option              Recommended Value  Notes

Volume Options
guarantee           none
fractional_reserve  100                Leave at the default; mostly relevant for SAN
                                       environments. The default value up to Data ONTAP
                                       7.3.3 is 100; for later releases, 0 is the default.
autosize            on                 Turn autosize on. There is then no artificially
                                       limited volume that needs to be monitored. Autosize
                                       makes sense to allow growth of user data beyond the
                                       guaranteed space limit.
autosize options    -m X -i Y          The business model drives the maximum value for the
                                       autosize configuration because it offers additional
                                       disk space to the consumer under its specific
                                       conditions. A reasonable resizing increment depends
                                       on various factors, such as the data growth rate in
                                       the particular volume, the volume size itself, and
                                       so on.
try_first           -                  Autodelete is not recommended in most environments.

Volume Snapshot Options
reserve             yes/no             The value depends on the number of Snapshot copies
                                       and the change rate within the volume. Displaying
                                       only the committed usable space using an SLA is the
                                       preferred way to provision NAS storage. However,
                                       there might be situations in which the Snapshot
                                       reserve area is omitted (no).
schedule            switched on        Automatic Snapshot technology schedules.
autodelete          off                Deleting Snapshot copies is not recommended in most
                                       NAS environments.

SAN

For SAN, we consider three options:

• Full fat: Both primary data and its Snapshot copy space are preallocated.
• Low fat: The primary data is preallocated; the Snapshot copy space is allocated on demand.
• Zero fat: Primary data and its Snapshot copy space are allocated on demand. This variant achieves the best ratio of storage efficiency when provisioning applications from scratch.

Figure 8) Provisioning model for SAN storage from scratch.

                               Primary Data (LUN) Space Allocation
                               Fat               Thin
Snapshot Copy       Fat        Full Fat Option   No Option
Space Allocation    Thin       Low Fat Option    Zero Fat Option

FULL FAT PROVISIONING

This method can be treated as the historical way of provisioning block storage with Data ONTAP:

• Volumes are created with a space guarantee.
• A fractional (overwrite) reserve is used to guarantee that the primary data can be overwritten completely with Snapshot copies in place. If this space is not available, Snapshot copy creation fails.
• The size of the volume follows the formula 2X + Δ, where X is the size of the primary data (the sum of all LUN capacities within the volume) and Δ is the amount of space needed to hold Snapshot copy data.


The enhancements to the volume autosize capabilities (such as volume-size-dependent thresholds) and the robustness of the Snapshot autodelete implementation have made full fat provisioning more or less obsolete. As of today, the default settings for creating volumes/LUNs in Data ONTAP still correspond to these settings. See Provisioning from Scratch: Full Fat to Zero Fat Provisioning for a discussion of using tools such as Provisioning Manager.

Table 4) Full fat provisioning.

Option              Recommended Value  Notes

Volume Options
guarantee           volume
fractional_reserve  100                Even if technically possible, a fractional reserve
                                       below 100 carries the risk of running out of
                                       Snapshot copy overwrite space. This situation
                                       should be avoided.
autosize            off                Autosize could be used as an option to create the
                                       free space needed for Snapshot copy creation.

Volume Snapshot Options
reserve             0
schedule            switched off
autodelete          off

LUN Options
reservation         enable

LOW FAT PROVISIONING

With low fat provisioning, we use a more space-efficient way to provision volumes:

• Volumes are created with a space guarantee.
• LUNs are created with a space guarantee as well. This setup does not benefit from unused blocks within a LUN. (During the lifetime of a LUN, the amount of free, unused blocks typically decreases. Without space reclamation techniques, allocated blocks on the storage system stay allocated.)
• The size of the volume follows the formula X + Δ, where X is the size of the primary data (the sum of all LUN capacities within the volume) and Δ is the amount of space needed to hold Snapshot copy data.
• Because the space used for Snapshot copies might grow unexpectedly, the autosize and autodelete policies are used to make space available when a preset volume threshold is reached.
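The interplay of autosize (try_first volume_grow) and Snapshot autodelete (oldest_first) described above can be sketched as a small policy loop. This is an illustrative Python model, not Data ONTAP code; the `Volume` class, its fields, and the numbers are hypothetical assumptions:

```python
# Illustrative model of the low fat space-management policy: when free
# space in the volume runs low, first try to grow the volume
# (try_first = volume_grow), then delete the oldest Snapshot copies
# (autodelete oldest_first). All names and sizes are hypothetical.

class Volume:
    def __init__(self, size, max_size, increment, snapshots):
        self.size = size            # current volume size (GB)
        self.max_size = max_size    # autosize -m: maximum size
        self.increment = increment  # autosize -i: resizing increment
        self.snapshots = snapshots  # oldest first: [(name, used_gb), ...]
        self.used = 0               # space consumed by data + snapshots

    def ensure_space(self, needed):
        """Apply the policy until 'needed' GB of free space is available."""
        while self.size - self.used < needed:
            if self.size + self.increment <= self.max_size:
                self.size += self.increment          # try volume_grow first
            elif self.snapshots:
                _, freed = self.snapshots.pop(0)     # then autodelete oldest
                self.used -= freed
            else:
                raise RuntimeError("out of space")   # worst case

vol = Volume(size=100, max_size=120, increment=10,
             snapshots=[("nightly.2", 15), ("nightly.1", 5)])
vol.used = 95
vol.ensure_space(30)   # grows 100 -> 110 -> 120, then deletes nightly.2
print(vol.size, [name for name, _ in vol.snapshots])   # 120 ['nightly.1']
```

Note how growth is exhausted before any Snapshot copy is deleted, matching the try_first volume_grow recommendation.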

Table 5) Low fat provisioning.

Option              Recommended Value      Notes

Volume Options
guarantee           volume
fractional_reserve  0                      Snapshot copy space is controlled by the
                                           autodelete and autosize options.
autosize            on                     Turn autosize on.
autosize options    -m X -i Y              The business model drives the maximum value for
                                           the autosize configuration because it offers
                                           additional disk space to the consumer under its
                                           specific conditions. A reasonable resizing
                                           increment depends on various factors, such as
                                           the data growth rate in the particular volume,
                                           the volume size itself, and so on.
try_first           volume_grow            Increasing the size of the volume does not
                                           destroy any data or information, so there is no
                                           reason not to increase it; it can be reverted
                                           afterward if the volume's free space increases
                                           again. There might be configurations where
                                           automatic volume growth is not desired.

Volume Snapshot Options
reserve             0                      For NAS volumes, setting a Snapshot copy reserve
                                           area and configuring Snapshot copy schedules is
                                           a common setup. For SAN volumes, this needs to
                                           be switched off according to NetApp best
                                           practices (see the Fibre Channel and iSCSI
                                           Configuration Guide).
schedule            switched off
autodelete          on                     There might be Snapshot copies that are needed
                                           to fulfill certain SLAs, such as backup SLAs.
                                           Setting this policy needs to be negotiated with
                                           the business requirements. In the worst-case
                                           scenario, deleting Snapshot copies is not an
                                           option.
autodelete options  volume, oldest_first   There is a precedence order determining which
                                           Snapshot copies are candidates for deletion;
                                           oldest_first is the current default.

LUN Options
reservation         enable                 Reserves space for the LUN during creation.

ZERO FAT PROVISIONING

Full and low fat provisioning use fully allocated volumes and LUNs. By default, space allocation happens within the boundaries of the LUN and the volume. Zero fat follows a 100% "allocate on demand" concept:

• Volumes are created without a space guarantee.
• LUNs are created without a space guarantee.
• The size of the volume follows the formula X − N + Δ, where X is the size of the primary data (the sum of all LUN capacities within the volume), Δ is the amount of space needed to hold Snapshot copy data, and N is the amount of unused blocks within a given LUN.

Table 6) Zero fat provisioning.

Option              Recommended Value  Notes

Volume Options
guarantee           none               No space reservation for the volume at all.
fractional_reserve  0                  With Data ONTAP 7.3.3, fractional_reserve can be
                                       modified even for volumes without a space guarantee
                                       of type volume. Prior to Data ONTAP 7.3.3, the
                                       value was fixed at 100.
autosize            on                 Turn autosize on.
autosize options    -m X -i Y          The business model drives the maximum value for the
                                       autosize configuration because it offers additional
                                       disk space to the consumer under specific
                                       conditions. A reasonable resizing increment depends
                                       on various factors, such as the data growth rate in
                                       the particular volume, the volume size itself, and
                                       so on.
try_first           volume_grow

Volume Snapshot Options
reserve             0                  For NAS volumes, setting a Snapshot copy reserve
                                       area and configuring Snapshot copy schedules is a
                                       common setup. For SAN volumes, this needs to be
                                       switched off according to NetApp best practices
                                       (see the Fibre Channel and iSCSI Configuration
                                       Guide).
schedule            switched off
autodelete          off                Deleting Snapshot copies might be an option when
                                       the volume can no longer be resized because the
                                       maximum configured size has been reached, or when
                                       the aggregate's free space becomes low.

LUN Options
reservation         disable            No preallocation of blocks for the LUN.

SUMMARY OF PROVISIONING METHODS

There are good reasons for using any of the provisioning methods described; however, full fat for SAN environments should be avoided wherever possible because of its storage efficiency ratio. Even with a 100% block usage ratio on primary data, zero fat provisioning has many advantages and is the preferred method:

• The aggregate's free space is a global pool that can serve space to volumes. This gives more flexibility than volumes with their own dedicated free space.
• For SAN volumes, block consumption can be easily monitored.
• Deduplication savings go directly into the global pool of free space, which is the aggregate or the resource pool to which it belongs.
• Monitoring is needed only at the aggregate level. Volumes grow on demand.

Table 7) Comparison of provisioning methods.

Characteristics              Full Fat               Low Fat                     Zero Fat
Space consumption            2X + Δ                 X + Δ                       X − N + Δ (2)
Space efficient              No                     Partially, for Snapshot     Yes
                                                    copies
Monitoring                   Optional               Required on volume and      Required on aggregate
                                                    aggregate level             level
Notification/mitigation      No                     Optional in most cases      Yes
process required
Pool benefiting from         Volume fractional      Volume free space area      Aggregate free space
dedupe savings               reserve area                                       area
Risk of an out-of-space      No                     No, as long as autodelete   Yes, when monitoring and
condition on primary data                           is able to delete Snapshot  notification processes
                                                    copies                      are missing
Typical use cases            Small installations;   Large database              Shared storage
                             no or few storage      environments                infrastructure; test/dev
                             management skills                                  environments; storage
                             (no monitoring                                     pools for virtualized
                             infrastructure)                                    servers

(2) N is the traditional thin provisioning impact: the amount of blocks logically allocated but not used.
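The space-consumption formulas compared in Table 7 can be made concrete with a small worked example. The following Python sketch uses illustrative numbers (the capacities are assumptions, not measurements):

```python
# Worked example of the Table 7 space-consumption formulas.
# X     = sum of LUN capacities in the volume (GB)
# delta = space needed to hold Snapshot copy data (GB)
# N     = blocks logically allocated but not used inside the LUNs (GB)
X, delta, N = 1000, 200, 400

full_fat = 2 * X + delta    # primary data plus a full overwrite reserve
low_fat = X + delta         # primary data preallocated, snapshots on demand
zero_fat = X - N + delta    # only blocks actually written consume space

print(full_fat, low_fat, zero_fat)   # 2200 1200 800
```

With these assumed numbers, zero fat needs less than half the space of low fat and about a third of full fat, which is why the document prefers it wherever the monitoring prerequisites are met.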

FULL/LOW/ZERO FAT PROVISIONING WITH PROVISIONING MANAGER

NetApp Provisioning Manager focuses on fast and efficient provisioning of storage resources in the NetApp storage infrastructure. It speeds up provisioning workflows and boosts capacity usage by using policy-based automation for the entire NetApp NAS and SAN infrastructure. These processes are:

• Faster than manually provisioning storage
• Easier to maintain than scripts
• Instrumental in minimizing the risk of data loss resulting from misconfigured storage

Provisioning Manager applies user-defined policies to consistently select the appropriate resources for each provisioning activity. This frees administrators from the headache of searching for available space to provision and allows more time for strategic issues. The use of a centralized management console allows administrators to monitor the status of their provisioned storage resources. Provisioning Manager can help improve business agility and capacity use, shrink provisioning time, and improve administrator productivity. Provisioning Manager's thin provisioning and deduplication capabilities provide a high level of storage efficiency from your NetApp storage investment.

A GUI allows you to implement the full/low/zero fat provisioning models within Provisioning Manager. See Figure 9 and Figure 10 for configuring storage efficiency in a provisioning policy for NAS and SAN. Whenever storage is provisioned using this provisioning policy, the settings apply automatically. For more information, refer to TR-3710, Operations Manager, Provisioning Manager, and Protection Manager Best Practices Guide.

Provisioning Manager encapsulates technical details when provisioning storage and supports easy integration with existing management tools and orchestration frameworks. Policies and their use in so-called datasets and storage services allow you to exploit NetApp storage efficiency technologies without exposing a high level of technical detail to a higher level of management software.

Note: Provisioning Manager up to version 4.0 does not allow you to specify autosize and autodelete individually; the policy template determines whether these features are used and which options are selected. To implement the provisioning methods outlined here, a customized provisioning script needs to be provided that sets the autosize and autodelete parameters according to the recommendations for the full/low/zero fat methods. Post-provisioning scripts are standard with Provisioning Manager. Use caution: when Provisioning Manager runs conformance checks, these individual settings are reverted.


Figure 9) Configuring a full/zero fat provisioning policy using Provisioning Manager for NAS. Select checkboxes as outlined. Provisioning Manager deviates from zero/full fat by first growing volumes with autosize and then allowing Snapshot autodelete.


Figure 10) Configuring a full/low/zero fat provisioning policy using Provisioning Manager for SAN storage. Select checkboxes as outlined. Provisioning Manager deviates by not turning on autosize for zero fat.

FULL/LOW/ZERO FAT PROVISIONING FOR STORAGE SERVICES

Storage services are an easy abstraction for provisioning storage in a utility-like fashion. A storage service describes all characteristic attributes of the storage needed in a certain scenario. A storage service catalog lists the available templates and allows you to provision storage with these attributes on demand. Technically, storage services or datasets consist of one or more resource pools, a protection policy, and a provisioning policy. Full/low/zero fat provisioning for storage services is configured in the configuration wizard of the provisioning policy. Figure 11 shows the provisioning policies closest to the full/low/zero fat configurations.

Figure 11) Full/low/zero fat provisioning policies for datasets and storage services.

Because this wizard is able to configure the deduplication feature, two policies are configured for the zero fat configuration: one with deduplication and one without.


HOW SHOULD A VOLUME BE SIZED

Because the physical allocation of data within a zero-fat-provisioned volume is done on demand, theoretically the volume size can be set to a very high value that can easily hold all application data and Snapshot copies. Because the unallocated space in the volume is not exclusively reserved for the volume itself, all other applications can benefit from the shared pool of unallocated storage. However, NetApp recommends that you size the volume to the expected size of its contained objects and use the autogrow option to let it grow on demand. The important advantage is that the commitment rate acts as a metric for data consolidation.

Note: The commitment rate reflects the amount of logical data consolidation. This metric is suitable for deciding when data should be left for organic growth.

Additionally, the volume size limits when using deduplication should be taken into account, because the maximum sizes depend on the storage controllers.
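The commitment rate used as a consolidation metric above can be sketched as the sum of provisioned (logical) volume sizes relative to the aggregate's physical capacity. A minimal Python illustration; the capacities are illustrative assumptions:

```python
# Sketch: commitment rate of an aggregate, i.e. the sum of provisioned
# volume sizes relative to the aggregate's physical capacity. A value
# above 100% means the aggregate is overcommitted (thin provisioned).
aggregate_capacity = 10_000                 # GB of physical capacity (assumed)
volume_sizes = [4_000, 6_000, 5_000]        # thin-provisioned volume sizes (assumed)

commitment_rate = sum(volume_sizes) / aggregate_capacity
print(f"{commitment_rate:.0%}")             # 150%
```

A rate of 150% indicates that 1.5 GB of logical capacity has been promised per GB of physical capacity, which is exactly the consolidation effect that zero fat provisioning aims for, provided the aggregate is monitored.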

APPLICATION RECOMMENDATIONS

Thin provisioning is most effective when applications use data that is committed to them step by step. When applications preformat data, the immediate effect of thin provisioning is lost, and only deduplication may reclaim sharable blocks. Because thin provisioning has no performance penalty, the general recommendation is to provision with the zero fat configuration.

For SAN-attached storage, NetApp recommends that you use file systems supporting space reclamation technologies, such as the SCSI UNMAP and SCSI WRITE SAME commands. These pass the information through the storage stack that a particular block is no longer used and allow unused space to be reclaimed. On Windows® platforms, this can be configured in NetApp SnapDrive®.

For Oracle® database best practices, refer to WP-7084, Storage Efficiency in an Oracle Environment.

3.2 PROVISIONING FROM TEMPLATES: VOLUME- AND DEDUPE-CENTRIC LAYOUTS

This section deals with provisioning storage for similar applications from a golden template. A valid use case is a hosting provider who offers and serves predefined application services in mass quantities. Instead of provisioning each application from scratch, the data of the application instance is provisioned by creating a copy of a preconfigured template, or golden copy, that is customized using a postprocessing procedure.

When applications are provisioned this way, NetApp cloning technologies generate virtual copies of the template data instantly and with efficient use of space. This achieves a high degree of data consolidation and cost savings.

The potential of NetApp cloning technologies also plays a central role in development and test environments, as well as in software maintenance scenarios. Testing and updates can be performed very easily because these cloning capabilities work instantly and with almost no overhead in terms of performance, CPU, and memory.

There are two ways to align application data to a NetApp shared storage infrastructure:

• Volume-centric storage layout
• Dedupe-centric storage layout

Depending on the data lifetime, suitability for deduplication, consistency, and tool constraints, one way of aligning application data is more appropriate than the other.

In both variants, the storage of the application template can be provisioned as either full, low, or zero fat. The cloning procedure inherits the attributes of the parent volume. To create space-efficient clones, the space guarantee must be set to none.


CONSEQUENCES FOR MONITORING

When using one of the following layouts, very high data consolidation can be achieved. Because this effect depends on the usage characteristics of the corresponding applications, monitoring the aggregate is key. In case a low fat volume acts as a template that is cloned while preserving the original space guarantees, monitoring is necessary for the cloned volumes as well.

VOLUME-CENTRIC STORAGE LAYOUT

In a volume-centric storage layout, an application instance is organized into one or a few volumes to benefit from the Data ONTAP volume-centric management and maintenance operations, such as instant cloning and volume-consistent Snapshot copies.

In addition to the convenient ways to manage volumes, volume-centric storage layouts have storage efficiency advantages in two dimensions:

• High instant storage efficiency savings: High instant savings when cloning the data of an application instance with FlexClone; these savings might deteriorate over time.
• Long-term storage efficiency savings: Medium long-term savings when deduplicating application data.

A volume-centric layout makes it easy to provision storage for another instance of an application by cloning a consistent volume representing the template of the intended application and attaching it to an instance where it is processed. This approach works for both NAS and SAN.

Figure 12 shows the data alignment of an application instance and its volume. An application instance organizes its data in one or more dedicated volumes. Note that the entire construct is created within one aggregate.

Because deduplication is performed at the volume level, long-term savings depend on the block-sharing rate within one instance of an application. Volume-centric layouts are preferred in the following cases:

• Simplicity of data management using volumes
• Individual control over the SLA of each application instance
• Application instances with a short duration
• No consideration of deduplication
• Management tools that require volume-centric layouts


Figure 12) Volume-centric storage provisioning. Application instances are aligned horizontally with their volumes. [Diagram: a template FlexVol volume and FlexVol volumes for instances 1 through n, each containing its own LUNs/qtrees with deduplication block sharing inside the volume; FlexClone block sharing links the template volume to the instance volumes.]

Impact on commitment and storage utilization: The impact of using FlexClone to clone a volume-centric storage layout to implement storage-template-based provisioning can be visualized schematically. At clone creation, Data ONTAP creates metadata for the new instance of the data and allocates space for storing changes to the cloned copy, or new data, on request. Thus, the overcommitment of the aggregate containing the cloned data increases when the clone is created. However, this does not affect the space used in the aggregate. When data in the clone is changed and new data is added by the application, the aggregate use grows.
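The accounting just described can be sketched as a toy model: creating a clone adds to the aggregate's committed (logical) capacity but not to its used capacity, and only new or changed blocks consume space over time. This is an illustrative Python sketch, not Data ONTAP internals; the `Aggregate` class and all sizes are assumptions:

```python
# Toy model of FlexClone accounting in an aggregate: cloning raises
# commitment (logical capacity promised) but not used space; writes to
# the clone then consume physical space. Hypothetical names and sizes.
class Aggregate:
    def __init__(self, capacity):
        self.capacity = capacity
        self.committed = 0   # sum of provisioned volume sizes (GB)
        self.used = 0        # physically allocated space (GB)

    def provision(self, size, used):
        self.committed += size
        self.used += used

    def clone(self, size):
        self.committed += size   # metadata only; blocks shared with parent

    def write_new_data(self, amount):
        self.used += amount      # new/changed blocks are allocated

aggr = Aggregate(capacity=1000)
aggr.provision(size=300, used=200)   # template volume
aggr.clone(size=300)                 # instant clone: used space unchanged
aggr.write_new_data(50)              # the clone diverges over time
print(aggr.committed, aggr.used)     # 600 250
```

The clone doubles the commitment immediately, while used space grows only as the clone diverges, which is why aggregate-level monitoring is the key operational task.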

Best Practice

A volume-centric layout implicitly implements a consistency group. It is preferable to align in it all application data that should be recovered at a certain point. Cloning can achieve significant savings when a FlexClone volume is created to provision data for a new service instance.

Client-side data realignment, such as disk defragmentation or database table space reorganization, has a counterproductive effect on the FlexClone savings. Such realignment also has a temporarily counterproductive effect on the deduplication savings and requires the deduplication process to be executed again. If possible, the following actions on client data should be avoided:

• Reorganizing data, for example, database reorganization of table spaces or defragmentation of virtual disks provisioned through cloning
• Preformatting data


DEDUPE-CENTRIC STORAGE LAYOUT

In a dedupe-centric storage layout, the goal is to achieve high storage efficiency returns from the deduplication feature. In contrast to the volume-centric storage layout, the data of different application instances is grouped to achieve storage efficiency returns across a set of application instances. Figure 13 shows a sample dedupe-centric storage layout. Data of application instances is organized horizontally; the individual data of each application is grouped vertically in a volume to implement deduplication.

This layout makes sense in virtualization scenarios where the images of the guest machines can be grouped easily. Grouping partitions containing boot images and commonly used programs is very effective because they share much of the same data. To implement template-based provisioning with such a layout, cloning of template data must be performed with the file/LUN FlexClone operation. File/LUN FlexClone allows storage objects to be cloned within a volume, providing finer granularity.

This storage layout provides the following storage efficiency advantages from a short- and long-term perspective:

• Very high long-term storage efficiency savings: Long-term savings are achieved due to the deduplication-centric storage layout and deduplication returns.
• Short-term storage efficiency savings: Instant savings are provided when cloning an application instance, for example, template application data, through a file/LUN FlexClone operation.

In contrast to the volume-centric storage layout, application instances are bundled together in a matrix style because of their participation in a volume. This implies that the applications share major operational tasks and are managed as a bundle. From an SLA perspective, a diversification of service levels within the application instances cannot be implemented as easily as with a volume-centric layout. Achieving application-consistent Snapshot copies requires the iterative application of the file/LUN FlexClone functionality to all storage objects of the instance. This is slightly more difficult than cloning with a volume FlexClone operation.

TR-3505, NetApp Deduplication for FAS and V-Series Deployment and Implementation Guide, provides a deeper understanding of NetApp deduplication and its deployment.

Figure 13 illustrates dedupe-centric storage provisioning. Volumes are shared among several application instances to achieve cross-instance dedupe returns. Note that this construct is created within an aggregate; volumes can be assigned to different aggregates.
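Because deduplication shares identical blocks only within one volume, the benefit of grouping similar data can be illustrated by counting unique blocks per volume. The block "fingerprints" below are hypothetical stand-ins for block checksums; the guest layout is an assumed example:

```python
# Deduplication works per FlexVol volume: identical blocks are shared
# only within one volume. Grouping similar guest images into a single
# volume therefore yields higher savings than one volume per guest.
os_blocks = ["os1", "os2", "os3"]   # blocks common to every guest OS image
guests = [os_blocks + [f"data{i}"] for i in range(4)]   # 4 similar guests

# Volume-centric: one volume per guest, so no cross-instance sharing.
volume_centric = sum(len(set(g)) for g in guests)

# Dedupe-centric: all guests in one volume, so OS blocks are stored once.
dedupe_centric = len(set(b for g in guests for b in g))

print(volume_centric, dedupe_centric)   # 16 7
```

In this toy example, storing the four guests in one volume reduces the unique blocks from 16 to 7, which mirrors the cross-instance dedupe returns that the layout is designed for.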


Figure 13) Dedupe-centric storage provisioning. Application instances are aligned horizontally; volumes are aligned vertically.

[Figure body: a template and several instances (Instance 1, Instance 2, …) span multiple FlexVol volumes; each FlexVol contains one LUN/qtree per instance, with deduplication block sharing within each FlexVol.]

Impact on commitment and aggregate usage. When creating the FlexVol volumes for this layout, their individual sizes contribute to the commitment rate. The aggregate use grows with the provisioning and object use within the FlexVol volumes. Provisioning a new instance in this layout through a file/LUN FlexClone operation has no effect on the overcommitment rate; it affects only the deduplication value of the volume itself. Thus, NetApp recommends using the zero fat configuration for the volume and enabling autogrow.
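The commitment arithmetic described above can be illustrated with a small sketch. All names and figures are hypothetical examples; Operations Manager reports these metrics directly:

```python
# Sketch: commitment rate vs. actual block use of an aggregate.
# All sizes in GB; the figures are hypothetical.

def commitment_rate(volume_sizes, aggregate_size):
    """Committed space of all FlexVol volumes relative to the aggregate size.
    Values above 1.0 indicate overcommitment (thin provisioning)."""
    return sum(volume_sizes) / aggregate_size

aggregate_size = 10_000                  # usable aggregate capacity
volume_sizes = [4_000, 4_000, 4_000]     # nominal FlexVol sizes (committed)
used_blocks = [1_500, 1_200, 1_800]      # blocks actually consumed per volume

rate = commitment_rate(volume_sizes, aggregate_size)
use = sum(used_blocks) / aggregate_size

print(f"commitment rate: {rate:.2f}")    # above 1.0 -> overcommitted
print(f"aggregate use:   {use:.2f}")

# A file/LUN FlexClone of an instance inside an existing volume adds no new
# volume, so the commitment rate stays unchanged; only the use value grows
# as cloned blocks diverge.
```

A clone operation changes neither `volume_sizes` nor `aggregate_size`, which is why the overcommitment rate is unaffected.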

Best Practice

This layout is very attractive for applications using multiple but similar storage objects among service instances (for example, virtual disks in virtual machine hypervisors). These instances usually use similar operating systems and applications in dedicated virtual disks. Thus, grouping these storage objects leads to a very high degree of consolidation due to deduplication.

Quickly changing data, such as pages and swap files, should not be considered for inclusion in deduplicated volumes on primary storage. Deduplication savings are limited due to the high change rate and do not justify running the deduplication process. NetApp recommends not placing this type of data in the same volume as data that dedupes well.

We further recommend not performing client data realignments such as Windows disk defragmentation or database tablespace reorganizations. Because of the way NetApp storage controllers work, fragmented client data is served with no performance penalty.


3.3 SETTLED/NOMAD PROVISIONING FOR NETAPP DATA MOTION

Settled/nomad provisioning is a provisioning pattern that helps increase the utilization of NetApp shared storage. When the online migratability features of storage are exploited, response times to mitigate data growth scenarios are independent of application-specific planned downtime windows. Storage is classified into potential migration candidates that can be migrated away from a tight aggregate on one storage controller to another one while assuring accessibility. Thus, it is an elegant technology to relax the use of an aggregate outside the planned downtime windows of the served applications. NetApp MultiStore technology implements this feature using the vFiler abstraction, which NetApp recommends you consider in the provisioning process. Secure multi-tenancy environments implemented using MultiStore technology harmonize well with this provisioning approach.

The settled/nomad provisioning pattern is a fitting metaphor for reacting to data growth in an aggregate. Figure 14 illustrates the concept of settled/nomad provisioning in the aggregates of the storage controllers and the migration of a nomad out of its aggregate. The settled part describes data that does not move during its lifetime. It might use vFiler units to simplify operation and hardware maintenance of the storage controller, but there is no direct need. The nomad parts are considered moving parts and thus must make use of vFiler units. The ratio between the sizes of the settled and nomad parts depends on the growth rate and lifetime of the data in the settled part. Assuming that the aggregate size is constant over this period, the aggregate is filled with settled and nomad data. Over the data lifetime, more and more nomads are migrated away; at the end of the lifetime, the settled data is left.
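The size relations described above reduce to simple arithmetic. The following sketch uses hypothetical figures; the growth rate and lifetime must come from your own capacity planning:

```python
# Sketch: sizing the settled and nomad parts of an aggregate.
# All figures are hypothetical examples.

aggregate_size = 20_000    # GB, assumed constant over the data lifetime
settled_initial = 8_000    # GB of settled data at provisioning time
daily_growth = 10          # GB/day growth of the settled part
lifetime_days = 900        # expected lifetime of the settled data

total_growth = daily_growth * lifetime_days          # GB the settled part gains
settled_final = settled_initial + total_growth       # settled size at end of life
residual_capacity = aggregate_size - settled_final   # space nomads may keep forever

print(f"settled data at end of life: {settled_final} GB")
print(f"nomad GB to migrate away over the lifetime: {total_growth}")
print(f"nomad GB that can stay: {residual_capacity}")
```

In this example, 9,000 GB of nomads must be migratable so that the settled part can grow into their space; any nomads beyond that can remain on the aggregate.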

It is irrelevant whether the data growth happens in the settled or nomad part; when a nomad is migrated away, the resource situation on the aggregate is relaxed. It is preferable to provision several nomads of different sizes. This allows you to:

• React to different growth scenarios of the data
• Quickly migrate smaller nomads when time or the inter-storage-controller network is considered a limited resource
• Operate the aggregate in its operational sweet spot corridor over a long time frame; by slicing the migratable entities in the right way, you can be sure that the aggregate operates in a predefined use interval
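Selecting which nomad to migrate when the aggregate leaves its sweet spot corridor can be sketched as follows. The corridor bound and nomad sizes are hypothetical:

```python
# Sketch: pick the smallest nomad whose migration brings aggregate use
# back below the upper bound of the operational corridor.

def pick_nomad(nomads, used, aggregate_size, upper_bound=0.85):
    """nomads: dict of name -> size in GB. Candidates are tried
    smallest-first to keep migration time and traffic low."""
    for name, size in sorted(nomads.items(), key=lambda kv: kv[1]):
        if (used - size) / aggregate_size <= upper_bound:
            return name
    return None  # even the largest nomad is not enough

nomads = {"nomad_a": 500, "nomad_b": 1_500, "nomad_c": 3_000}
choice = pick_nomad(nomads, used=18_000, aggregate_size=20_000)
print(choice)
```

Trying smaller nomads first reflects the bullet above: a small migration is quicker when time or network bandwidth is the limited resource.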

Figure 14) Settled/nomad provisioning into an aggregate. In case of aggregate tightness, a nomad is migrated to a separate aggregate.


To summarize, the settled/nomad provisioning pattern is an elegant method to adjust the block use of an aggregate. The use of an aggregate can be controlled and kept within a desired corridor.


SLA-BASED ASSESSMENT FOR SETTLED/NOMAD

The goals of an SLA-based assessment are to optimize SLA fulfillment and to avoid or minimize penalty costs. The accessibility of the applications, described by their individual service levels, is used to assess instances as settled or nomad.

We use the introduced SLA metric of service disruption and map it to the stickiness of the settled/nomad instances. The vFiler entities allow online migration of NFS- and iSCSI-attached nomad instances without any changes on the client side. Fibre Channel-attached storage cannot be migrated online at the time of writing. Refer to TR-3881 for an understanding of DataMotion in an Oracle database and Microsoft Exchange environment.

Alignment by technical impact. For data belonging to applications with SLAs that fit perfectly into what is provided, a direct assignment can be made. For example, application instances with the lowest acceptable service disruption should be the last candidates to be migrated (settled); applications with the highest acceptable service disruptions should be considered nomads. However, there might be data of application instances that will likely be migrated during the application lifetime. You must take into account the business impact of migrating these instances.

Figure 15) Alignment by technical impact (sorted by negative impact in descending order).

[Figure body: instances Inst1 through InstN ordered by negative impact, from high (outside SLA; settled, e.g., all FC-attached instances) through medium to low (inside SLA; nomad).]

Alignment by business impact. An assessment of penalty costs is made for the data of the remaining applications. For vFiler migration, a very short negative impact on the performance of the service level must be taken into account during the migration. Thus, application data with the highest negative impact is considered the stickiest.

Figure 16) Alignment by business impact (sorted by negative impact in descending order).

[Figure body: instances ordered by penalty cost of migration, from high ($$; settled) through semi-settled to low ($; nomad).]

PERFORMANCE AND THROUGHPUT IMPACT OF MIGRATION

Migration of a nomad might be triggered by heavy storage consumption in an aggregate. It might also be triggered by performance limitations of the corresponding storage controller. Because the migration consumes additional resources on the network and the participating storage controllers, this consumption must be taken into account to avoid further intensifying the situation. Refer to TR-3881 for a quantitative evaluation of DataMotion.

NetApp recommends that the use of storage controllers in a high-availability configuration be adjusted in such a way that the remaining controller can master the load in the case of a failover. Doing so should leave enough resources to perform migrations.

Migrating vFiler entities relies mainly on SnapMirror® and MultiStore technology and thus requires these licenses on all participating storage controllers. TR-3814, NetApp Data Motion, provides a thorough presentation of migration using the NetApp Data Motion solution. It focuses on implementing and triggering the migration of vFiler entities using NetApp Provisioning Manager. Furthermore, vFiler units can be managed manually to allow for handy offline migration with a very short interruption of storage accessibility.

ENABLE SETTLED/NOMAD FOR ALREADY-PROVISIONED STORAGE

While NetApp recommends that you consider the settled/nomad setting initially, taking sizing and lifetime of storage into account, it is possible to implement it later in a planned downtime window. If NFS-attached storage should be migrated, existing volumes can be adopted by a vFiler entity. Because the vFiler entity has its own IP address, the clients attaching the storage need to remount it.

SETTLED/NOMAD-LIKE SETTING WITH SHORT-/LONG-TERM DATA PAIRING

In the previous section, the settled/nomad pattern was described as a way to mitigate organic data growth. The same effect can be achieved when storage is identified as belonging to instances that are going to be deprovisioned at their end of life. Taking into account the expected lifetime of provisioned storage allows you to plan deprovisioning situations in advance. This relaxes the dependence on aggregates outside planned downtime windows, without the technical requirements of a settled/nomad setting.

ONLINE MIGRATION IN VIRTUALIZED ENVIRONMENTS

Online migration features in a virtualization hypervisor provide a further alternative for implementing a responsive scheme to react to data growth scenarios. For example, VMware® Storage VMotion™ is capable of transferring a virtual machine, including its storage, when it is attached using a datastore. Storage of virtual machines served by a NetApp datastore can be migrated to another NetApp-served datastore by migrating each virtual machine. In such cases, a nomad can simply be implemented by a NAS- or SAN-attached datastore.

In contrast to a data transfer based on SnapMirror directly between NetApp storage controllers, the migration traffic flows through the hypervisor. This might have consequences for the execution of the virtual machines. Also, the NetApp storage efficiency savings cannot be exploited during the transfer; deduplication savings are regained by executing the deduplication process on the destination storage controller.


4 OPERATION

This section focuses on the operation and management of overcommitted storage. The goal of management is to fulfill the SLAs of the stored data while achieving a high level of efficiency. It addresses how to detect situations that need manual assistance, how to raise the awareness of the operational staff, and how to resolve situations that arise.

We first consider situations that put SLA fulfillment at risk. Then we focus on actions that can be taken to avoid further aggravation by making the situation evident and presenting mitigation alternatives.

Consider the operational process as a loop that monitors and evaluates the current situation and triggers the transition of a storage resource among phases:

• Provision storage.
• Leave room for organic growth. It might be desirable to still allow extending the storage of previously provisioned applications.
• Reduce storage use with mitigation alternatives such as deletion, data motion, and so on.

These transitions must occur within a specified time frame to optimize operational flexibility and to prevent endangering the SLAs. The point is to detect situations that will violate the SLAs in the future.

SITUATIONS PUTTING SLA FULFILLMENT AT RISK

Over time, more and more data is stored and processed by the provided applications. NetApp storage efficiency technologies compensate for this growth. To prevent running out of physical resources, usage must be managed within safe boundaries. This makes sure the operations team has enough time to react with the appropriate mitigation strategy.

The following list summarizes situations that are critical for service delivery:

• Running out of time. Some mitigation alternatives must be triggered in advance, and a passage of time might be needed for their effect to become evident. This time determines the number of mitigation alternatives that can be considered at a certain point.

• Running out of mitigation alternatives. Several mitigation alternatives exist to control the usage. However, some alternatives are one-time activities and some must be performed within a certain time frame. Depending on the situation, not all alternatives might be available for use.

• Running too tight on storage. Over time, applications use more and more of the blocks that were committed to them. This forces Data ONTAP to allocate from a pool of free blocks. Assuming data growth, the size of the free block pool directly translates into available time to react.

• Running out of storage completely. This must be prevented because it has a high impact on the availability of the service. Furthermore, data integrity can be at risk. Consider the following scenarios:

− An application wants to write to committed storage but fails (NAS and SAN). To the application, this looks like a storage failure and implies service disruption. Data integrity can be at risk.

− An application wants to allocate new storage but fails (NAS). The application is confronted with a "No space left on device" exception. Verify the application behavior on this exception; most applications can deal with this situation, and data integrity is not at risk.

Two cases need to be differentiated when mitigation is necessary to solve a situation of tight storage. Storage for an object, such as a LUN or a share, can be tight because of:

− insufficient space within the volume in which the storage object is contained
− insufficient free space within the aggregate in which the storage object and its volume are contained


The following sections focus on how to detect that a change is necessary and that a storage resource should be transitioned to another phase. We describe the different phases, how monitoring can support their detection, and how this information can be made known to operational groups.

4.1 PHASES AND TRANSITIONS

This section outlines the phases of a storage resource. Starting with an empty aggregate, storage is provisioned up to certain thresholds. After that, storage is left for organic growth. After certain thresholds are exceeded, further inspection or activities must be performed to mitigate storage tightness.

• Provisioning storage. When certain thresholds are within a defined range, storage is provisioned to the aggregates. Monitoring should support making a decision to transition to the next phase.

• Leave storage for organic growth. When certain thresholds are exceeded, provisioned storage is left for organic growth. Depending on the environment, storage of existing applications might still be extended, and a second threshold might signal that extensions are no longer possible. Monitoring should support making a decision to transition to the next or prior phase.

• Mitigate storage use. When certain thresholds are exceeded, this phase must make sure that committed storage can be delivered to store application data. The effect of a mitigation activity should be to put the storage resource back into the preferred operational corridor. Monitoring should support making a decision to transition back to the organic growth phase.
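The phase transitions above amount to a simple classifier over aggregate block use. The following sketch uses hypothetical threshold values; tune them to your environment:

```python
# Sketch: map aggregate block use to the operational phase described above.
# The two threshold values are hypothetical placeholders for your own settings.

def phase(use, provision_limit=0.60, mitigate_limit=0.85):
    """use: aggregate block use as a fraction of usable capacity."""
    if use < provision_limit:
        return "provision storage"
    if use < mitigate_limit:
        return "leave for organic growth"
    return "mitigate storage use"

for u in (0.40, 0.70, 0.90):
    print(f"{u:.0%}: {phase(u)}")
```

In practice, the thresholds map directly onto the Operations Manager aggregate nearly full and aggregate full settings discussed in the next section.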

4.2 MONITORING

NetApp Operations Manager delivers comprehensive monitoring and management for NetApp shared storage. It provides alerts, reports, performance monitoring, and configuration tools to keep the NetApp storage infrastructure in line with business requirements for maximum availability and efficiency. NetApp Operations Manager provides a single human interface and an application programming interface (API) for integration with third-party management and orchestration software.

Operations Manager monitors the NetApp shared storage infrastructure and is able to raise awareness of certain situations. Events can be set to trigger an action when operational parameters are within a certain range and indicate a relevant situation. When the event triggers, an alarm notification can be sent by e-mail, pager, Simple Network Management Protocol (SNMP), or customized scripts. To raise awareness of a certain situation, the event must be characterized using the metrics provided by Operations Manager. To communicate the event, an alarm must be set.

THRESHOLDS

Operations Manager monitors relevant parameters that indicate the presence of specified situations. Thresholds can be set to trigger actions, for example, to notify the operational team that an alarm situation exists. The thresholds can be set to notify in advance. Operations Manager also performs trending on operational parameters to express the urgency of a certain situation. This supports the decision making on how to react.

Within your Operations Manager instance, the thresholds can be verified and set by navigating to the Default Thresholds page following Setup → Options → Default Thresholds, or via the link http://opsmgrserver:port/dfm/edit/options. Figure 17 shows a sample configuration page.


Figure 17) Operations Manager screen to configure thresholds on operational metrics.

For aggregates, Operations Manager provides the set of thresholds described in the following list. They represent absolute limits. Operations Manager alarms can be used to notify operational staff and raise awareness of a specific situation.

Monitoring the aggregates is very important. They are the physical containers of preallocated and growable storage objects that host application data. If an aggregate of a storage controller runs at uncontrolled usage, there could be direct consequences for the applications for which it provides data.

The concrete settings for these thresholds depend on the time necessary to relax aggregate block usage. When no mitigation alternatives can or should be taken during the lifetime of the data, mitigation actions must be performed in scheduled downtime windows. Thus, threshold settings and actions tend to be more conservative to avoid SLA-endangering situations.

• Aggregate full threshold. This threshold on the aggregate block use metric allows triggering an alarm that notifies a person in charge.

• Aggregate nearly full threshold. This threshold is the counterpart of the aggregate full threshold but provides an earlier notification.

• Aggregate overcommitted threshold. This threshold on the committed storage metric allows triggering an alarm that notifies a person in charge. This metric refers to the amount of storage that is assigned to applications. It represents the level of consolidation and also the width and increase of the block use corridor.

• Aggregate nearly overcommitted threshold. This threshold is the counterpart of the aggregate overcommitted threshold but provides an earlier notification.

Operations Manager provides thresholds that can be used to alert operational staff when volumes are in a certain state:

• Volume full threshold. This event notifies a person in charge that the preset threshold on the volume use metric has been reached.

• Volume almost full threshold. This event is the counterpart of the volume full threshold but provides an earlier notification.

• Volume autosized. This event notifies a person in charge when a volume was extended using the autogrow functionality.


TRENDING

Operations Manager 4.0 supports a variety of trending features for certain storage objects. This is an important feature for all storage objects with a fixed size. It allows you to estimate the time frame within which a certain situation needs to be mitigated. The trend is calculated as a linear regression over up to 90 days in the past. For aggregates, Operations Manager calculates a trend on the daily growth rate. In your Operations Manager instance, use the link http://opsmgrserver:port/dfm/report/view/aggregates-growth-rates for trending of aggregate growth rates and the estimated remaining time until the storage object is full. Each aggregate can be drilled down, and you can select trending based on an interval of one day, one week, one month, three months, or one year. To see the effect of recent data activities, set the interval of a trend calculation to enclose this activity. Investigate whether growth rates calculated over different intervals deviate significantly.

Figure 18) Trending of data growth and days-to-full prediction in Operations Manager.

Note: The calculation basis of the time-to-full prediction is the usable aggregate capacity. This value is not calculated based on the aggregate full threshold setting.
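The trend calculation can be approximated with an ordinary least-squares fit over daily usage samples. This is a sketch with hypothetical sample data; the exact algorithm Operations Manager uses may differ:

```python
# Sketch: estimate the daily growth rate and days to full from daily usage
# samples via least-squares linear regression. Sample data is hypothetical.

def days_to_full(samples, capacity):
    """samples: used GB per day, oldest first; capacity: usable aggregate GB."""
    n = len(samples)
    xs = range(n)
    x_mean = sum(xs) / n
    y_mean = sum(samples) / n
    slope = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, samples)) \
            / sum((x - x_mean) ** 2 for x in xs)   # growth in GB per day
    if slope <= 0:
        return None  # no growth, no meaningful prediction
    return (capacity - samples[-1]) / slope

usage = [1_000, 1_010, 1_022, 1_030, 1_041, 1_050]  # GB used on six days
print(f"days to full: {days_to_full(usage, capacity=2_000):.1f}")
```

Note that, as in Operations Manager, the prediction is based on the usable capacity, not on the full threshold.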

The trending on the volume level is analogous to that on the aggregate level. In your Operations Manager instance, access the link http://opsmgrserver:port/dfm/report/view/volumes-growth-rates for trending of volume growth rates. NetApp recommends ordering the view by growth rate descending or time to full ascending in order to focus on the relevant candidates.

On the volume level, you can set an alarm to fire when the volume growth is outside the usual boundary:

• Abnormal volume growth. This event notifies when the growth rate of a volume exceeds a preset limit. It is helpful for signaling unusual behavior concerning storage consumption and pointing the operational staff to the right storage object.


INDIVIDUAL THRESHOLDS PER AGGREGATE OR VOLUME

For each aggregate or volume, the general default settings can be overwritten and made more specific. To do so, select your aggregate or volume of choice, for example, using the links already provided in this technical report. When selecting a concrete aggregate, it can be configured using the Edit Settings link and dialog. When selecting a concrete volume, its configuration can be adapted using the Edit Quota Settings link and dialog.

MONITORING STORAGE EFFICIENCY RETURNS

NetApp Operations Manager provides a dashboard to visualize storage efficiency returns in the NetApp shared storage infrastructure. This report lists important parameters, drilled down by utilization capacity, unused reserve capacity, storage efficiency, and efficiency return breakdown. It allows you to judge the effectiveness of the NetApp storage efficiency technologies. Figure 19 provides a sample screenshot of the storage efficiency dashboard in NetApp Operations Manager. Consult the NetApp Operations Manager Efficiency Dashboard Installation and User Guide for further information on the information provided by this dashboard.

Figure 19) Storage efficiency dashboard in Operations Manager.


4.3 NOTIFICATION

Operational staff must be notified when situations occur that require a transition of phases, especially situations with negative consequences for SLA fulfillment. Operations Manager provides alarms for notification. Alarms are bound to the metrics and thresholds explained in section 4.2 and notify operational staff, storage administrators, or storage capacity planners. Alarms are the instrument used to keep the management effort of the NetApp storage infrastructure low.

After being notified, the responsible person can evaluate the situation and decide which actions to take. Furthermore, the trends on operational parameters provided by Operations Manager simplify the decision-making process.

Depending on the organizational structure, the responsibilities to operate, plan, and administer the storage infrastructure can be separated into different groups, persons, or roles. Thus, we characterize the mitigation activities by required skill set and time to act. This allows an easy alignment with a given organizational structure.

Operations Manager supports different methods to send a notification. The notification methods can be used in combination; for example, a notification can be sent by both e-mail and SNMP.

NOTIFY BY E-MAIL

An alarm can be sent to multiple destinations by e-mail, and repeated notifications can be sent while the situation is not resolved. To set an alarm, access the alarm configuration page by following Setup → Alarms from the default Operations Manager dashboard. Clicking Advanced Version accesses an advanced version of this page; the direct link to the advanced version is http://opsmgrserver:port/dfm/edit/alarms-advanced. Figure 20 shows how to configure an alarm. Adjust the threshold as described in section 4.2.

NetApp recommends using distribution lists or aliases with meaningful names rather than the addresses of individual persons. If you follow this recommendation, changing responsibilities and roles does not require corresponding changes in Operations Manager.

NOTIFY BY SNMP

Operations Manager supports the notification of alarms using SNMP, a widely used standard supported by most orchestration frameworks and ticketing systems. Using SNMP, Operations Manager can be integrated into existing ticketing systems. Figure 20 shows how to set up an alarm firing based on the aggregate almost full threshold. The SNMP trap host is configured using a hostname or IP address and the port on which the SNMP agent is listening. The alarm can then be saved and tested.


Figure 20) Configuring an alarm based on the threshold aggregate almost full.

Note: The SNMP event must be routed to the responsible groups or persons in the ticketing system. Thus, the mapping between the detected situation and the responsible operational group must be implemented there.

NOTIFY BY SCRIPT

Operations Manager supports notifications in highly customized integration scenarios. A user-defined adapter can be executed that delivers the information to the infrastructure or system of choice. A script can be used to implement such an adapter and act as the glue between Operations Manager and the customer infrastructure. To set an alarm on the event aggregate almost full that starts a script, instrument Operations Manager on the command line:

dfm alarm create -s script_to_execute -h aggregate-almost-full
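A minimal adapter script might look like the following sketch. How Operations Manager hands the event details to the script (command-line arguments are assumed here) should be checked against your dfm version's documentation:

```python
#!/usr/bin/env python
# Sketch of a user-defined alarm adapter. The calling convention
# (script <event-name> <source-object>) is an assumption for illustration;
# adapt it to what your dfm version actually passes.

import sys

def format_ticket(event_name, source, severity="warning"):
    """Build a one-line ticket summary for the downstream ticketing system."""
    return f"[{severity.upper()}] {event_name} on {source}"

def main(argv):
    event_name = argv[1] if len(argv) > 1 else "unknown-event"
    source = argv[2] if len(argv) > 2 else "unknown-object"
    # Deliver to the infrastructure of choice; here we simply print the line,
    # e.g. into a spool file that the ticketing system polls.
    print(format_ticket(event_name, source))

if __name__ == "__main__":
    main(sys.argv)
```

The adapter only formats and forwards the event; the routing to the responsible group still happens in the receiving system, as with SNMP.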


4.4 MITIGATE STORAGE USE

Uncontrolled storage use can limit operational flexibility and might put SLA fulfillment at risk. This section focuses on mitigation activities that preserve flexibility by controlling use within its defined corridor; the effect of a mitigation activity should be to return the usage to that corridor.

Storage tightness might occur in aggregates or volumes, depending on their configuration. When all volumes in an aggregate are thin provisioned with the zero fat configuration, they use the shared pool of free blocks of the aggregate to deal with data growth. To solve this situation, a mitigation activity on the aggregate level is necessary.

When storage objects in a fixed-size volume cannot grow to the committed space, a mitigation activity on the volume level is necessary to solve upcoming volume tightness.

MITIGATION ACTIVITIES FOR AGGREGATES

Aggregates are the coarsest storage objects within a NetApp storage controller. Finer-grained storage objects, such as FlexVol volumes and their content, are usually thin provisioned using the zero fat configuration. They might grow on demand; however, because they live within an aggregate of physically limited size, the growth of the storage object itself is also limited. As described in the following list, providing usable space in the aggregate automatically allows contained storage objects to grow.

1. Increase the aggregate. You can add drives to aggregates during operation, and you can repeat this mitigation activity. The maximum aggregate size depends on the Data ONTAP version, the type of aggregate, and the type of storage controller. 64-bit aggregates, supported with Data ONTAP 8, have very high limits. Additional drives can be used immediately; however, their procurement needs to be taken into account. Rebalancing data between existing and new drives results in a uniformly distributed use of the drives.

2. Decrease the aggregate Snapshot copy reserve. This reserve is needed in MetroCluster and SyncMirror® configurations. In other configurations, you can decrease this reserve or set it to zero.

3. Shrink preallocated volumes. Volumes with preallocated space reserve available aggregate free space. When possible, these volumes can be shrunk, returning the freed space to the aggregate so that others can make use of it.

4. Enable deduplication and shrink the volume.

5. If available, migrate a nomad online to a different storage controller. Doing this on the NetApp storage controller level requires storage provisioning based on vFiler units and MultiStore and SnapMirror licenses. Adequate free space on the aggregates of the target storage controller is required. This mitigation activity is not limited in its applicability.

6. Migrate a volume from one aggregate to another, within the same or another storage controller. SnapMirror replicates the data while it is still served. To switch over to the replicated data, the client needs to detach from the source and reattach to the replica. After completion, the replica is considered the new source. This operation has an impact on client downtime. Typically, inter-data center bandwidth allows you to synchronize the source and the replica within a few minutes.

7. If none of the listed activities can be used, the application must be stopped to achieve a consistent state before its data is migrated.

The mitigation activities for aggregate tightness are summarized in Table 8. Note that Provisioning Manager can perform mitigation alternatives 3 to 6 online for secondary storage.


Table 8) Mitigation alternatives to control use within aggregates.

No. | Mitigation Activity | Repeatability | SLA Impact | Preparation Time | Time to Show Effect
1 | Increase aggregate capacity by adding disks | Data ONTAP 7.x: low limits; Data ONTAP 8: high limits | None | HW procurement | Immediate (+ rebalancing)
2 | Decrease the aggregate's Snapshot copy reserve, if possible | One time | None | None | Immediate
3 | Shrink other volumes in the aggregate if they have enough free space | One time | Low | None | Immediate
4 | Run deduplication and shrink volumes | Repeatable | Low | Time to execute dedupe | Immediate
5 | Migrate nomads (online) | Repeatable | Low | None | Minutes (vFiler migration time)
6 | Migrate volumes to a different aggregate (offline) | Repeatable | Med–high | Next planned downtime window | Minutes (volume switch-over time)
7 | Prevent application data loss and stop the application, then migrate (offline) | Repeatable | Low–high | Coordinate with app owner | Minutes (migration time)
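The decision logic implied by Table 8 can be sketched as a filter over the listed alternatives. The encoding of the rows below is a hypothetical simplification (alternative 1 is omitted because of its hardware procurement lead time):

```python
# Sketch: pick the first applicable mitigation from Table 8, given how much
# SLA impact is acceptable and whether a downtime window is available.

IMPACT = {"none": 0, "low": 1, "medium": 2, "high": 3}

# (number, activity, needs_downtime_window, sla_impact) -- simplified rows
ALTERNATIVES = [
    (2, "decrease aggregate Snapshot reserve", False, "none"),
    (3, "shrink other volumes", False, "low"),
    (4, "run deduplication and shrink volumes", False, "low"),
    (5, "migrate nomads online", False, "low"),
    (6, "migrate volumes offline", True, "high"),
    (7, "stop application and migrate", True, "high"),
]

def select(max_impact, downtime_window_available):
    """Return the first alternative within the acceptable SLA impact."""
    for no, activity, needs_window, impact in ALTERNATIVES:
        if needs_window and not downtime_window_available:
            continue
        if IMPACT[impact] <= IMPACT[max_impact]:
            return no, activity
    return None

print(select("low", downtime_window_available=False))
```

In a real environment the one-time alternatives would additionally be removed from the list once used, reflecting the "running out of mitigation alternatives" situation described in section 4.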

MITIGATION ACTIVITIES FOR VOLUME TIGHTNESS

Mitigation activities for volume tightness are relevant for volumes that are at risk because their storage objects cannot grow to the committed size. When it is not possible to enable growth for storage objects contained in volumes, you need to perform an aggregate mitigation activity.

Note: Some of these mitigation alternatives depend on, and affect, the used capacity in the aggregate.

Table 9) Mitigation activities for resource tightness within volumes.

No. | Mitigation Activity | Repeatability | SLA Impact | Preparation Time | Time to Show Effect
1 | Reduce the volume's Snapshot copy reserve (if configured and not used) | One time | Low | None | Immediate
2 | Increase the volume if there is free space in the aggregate (see Table 8) | One time | Low | None | Immediate
3 | Delete Snapshot copies that are not needed or were skipped by the autodelete function | Limited | Low | None | Immediate
4 | Activate FAS deduplication for the volume (requires proper space guarantees) | One time | Low (possible performance impact) | Wait for schedule | Hours
5 | If the volume contains more than a single LUN, migrate those objects to another volume or aggregate | Repeatable | High | Next planned downtime window | Minutes (volume migration time)
6 | Stop application and migrate data | Repeatable | High | Coordinate with app owner | Minutes (migration time)


5 REAL-LIFE SETTINGS

This section summarizes two different operational settings. The first does not make use of online data migration or the settled/nomad provisioning pattern; the second implements the settled/nomad provisioning pattern to maintain the flexibility for online data migrations.

The concrete threshold settings and approaches might be very customer and application specific. To exploit NetApp storage efficiency features in your own data center, NetApp recommends that you start conservatively. After you are familiar with the process, work toward the customer-specific optimum.

5.1 SAMPLE SETTING 1: REAL-LIFE SETTING

This section describes a real-life setting a customer started with. It makes use of a limited set of mitigation alternatives. This is especially beneficial when the installed storage capacity should remain constant over a long time frame or the physical systems are already fully equipped. A settled/nomad setting is not considered. Thus, the thresholds that signal a transition between phases are set lower and more conservatively for this customer. Because online data migration and aggregate extension are not available as mitigation alternatives, sufficient available space is required to safely reach the next planned downtime window, as shown in Figure 21. In practice, refer to the aggregate "days to full" trend value to get an idea of the available days to full based on past data growth.

• All storage is provisioned using the zero fat option with growable FlexVol volumes; only aggregate monitoring is used.
• Aggregate extension is not a mitigation alternative.
• Online migration is not a mitigation alternative.
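The "days to full" trend mentioned above can be approximated by a linear fit over past utilization samples. The sketch below is our own illustration of such an estimate, not the Operations Manager algorithm; all names in it are invented:

```python
def days_to_full(samples, capacity_gb):
    """Estimate days until an aggregate reaches 100% capacity.

    samples: list of (day_index, used_gb) measurements, oldest first.
    capacity_gb: total usable capacity of the aggregate.
    Returns the projected number of days from the last sample, or None
    if the trend is flat or negative (no fill-up in sight).
    """
    n = len(samples)
    mean_x = sum(d for d, _ in samples) / n
    mean_y = sum(u for _, u in samples) / n
    # Least-squares slope: data growth in GB per day.
    cov = sum((d - mean_x) * (u - mean_y) for d, u in samples)
    var = sum((d - mean_x) ** 2 for d, _ in samples)
    slope = cov / var
    if slope <= 0:
        return None
    _, last_used = samples[-1]
    return (capacity_gb - last_used) / slope
```

For example, an aggregate growing 10 GB per day with 100 GB of headroom left would report roughly 10 days to full.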

Figure 21) Storage to enable organic data growth between planned downtime windows.

[Figure content: data grows over time (in months) between two planned downtime windows; enough free capacity is reserved to absorb the growth until the next window.]

Note: Several months might pass between the planned downtime windows in which major mitigation alternatives can be performed.

The primary concern is preventing the critical situation in which aggregates reach a utilization level that is too high to enable organic growth during the period between agreed planned downtime windows. To prevent this situation, sufficient space must be reserved to enable data growth. Second, the level of data consolidation is monitored to manage accumulated growth rates safely.

Provisioning new data is stopped when one or both thresholds on the first and second metrics are reached. The operational teams are notified by an alarm on the Operations Manager event "aggregate nearly full" (configured to fire when the metric exceeds 50%) and the event "aggregate nearly overcommitted" (configured to fire when the metric exceeds 110%). These alarms stop the responsible entities from provisioning new storage; the aggregate is left for organic growth.

An assessment of the storage situation might then be performed. Depending on experience and knowledge of the application growth rates seen in the past, the thresholds may be adapted. After the upper threshold of the operational sweet spot corridor is exceeded, an alarm based on the "aggregate full" threshold (set initially to 65%) is sent to the storage administrators, who decide whether to migrate data in the next planned downtime window. In the meantime, organic growth can take place in the yellow-marked area shown in Figure 22. The metrics used are:
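The thresholds described above correspond to global options of the Operations Manager (DataFabric Manager) server that can be set from its CLI. The option and event names below are given from memory and should be verified against your release; the e-mail address is a placeholder:

```shell
# Operational sweet spot corridor for sample setting 1 (values in percent)
dfm options set aggrNearlyFullThreshold=50
dfm options set aggrFullThreshold=65
dfm options set aggrNearlyOvercommittedThreshold=110
dfm options set aggrOvercommittedThreshold=120

# Notify the storage operations team when the corridor is left
dfm alarm create -E aggregate-almost-full -m storage-ops@example.com
```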

• First metric: aggregate capacity used
• Second metric: aggregate space committed

Because all storage is provisioned using the zero fat option, no artificially limited storage container exists. Thus there is no need to consider a volume-based metric. Figure 22 shows the behavior depending on the metrics aggregate capacity used and aggregate committed space.

Figure 22) Transition of phases depending on the metrics aggregate capacity used and aggregate committed space.

[Figure content: inside the operational sweet spot corridor (aggregate capacity used 0-50%, aggregate space committed 0-110%), new storage is provisioned. Above 50% capacity used or 110% space committed, provisioning of new storage stops and capacity is assessed and thresholds are adapted. Above 65% capacity used or 120% space committed, mitigation starts.]


5.2 SAMPLE SETTING 2: SETTLED/NOMAD

This section describes a setting that takes the settled/nomad provisioning pattern into account and allows migrating nomad data flexibly and in a timely manner, thanks to vFiler technology. This setting requires storage space at alternative locations to which nomads might be migrated. It is seen more often in larger environments with an emphasis on NFS-attached storage. It allows operating the NetApp storage infrastructure at very high use and in narrower operational sweet spot corridors. Figure 23 visualizes the effect of a mitigation alternative that can be performed online.

Figure 23) Narrower corridors due to the ability to perform mitigation alternatives in hours instead of months.

[Figure content: settled data with several nomads (N); the need to act is detected, and the effect of mitigation (for example, migration) shows within hours.]

In this sample setting, as in sample setting 1, the critical situation to prevent is aggregates becoming too full. However, given the flexibility gained with online data migration, it is not necessary to take a further metric into account, for example, storage overcommitment.

• All storage is provisioned using the zero fat option with growable FlexVol volumes; only aggregate monitoring is used.
• Storage is provisioned using the settled/nomad pattern, with the ability to perform online migration.
• "Days to full" aggregate trending was more than 200 days on average. Note that this value depends on the individual situation and is calculated against 100% capacity used.

The sole metric in this setting is aggregate capacity used. Table 10 contains the thresholds describing the transition of phases.

Table 10) Phase transitions with the settled/nomad provisioning pattern and online migration as mitigation alternative.

| Detection Threshold | Notify | Mitigation |
|---|---|---|
| > 70% | Storage operations | Stop provisioning of new storage |
| > 85% | Storage operations | Stop extending provisioned storage |
| > 90% | Storage operations | Relax the resource situation and migrate a nomad |
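The transitions in Table 10 can be expressed as a small lookup. This is an illustrative sketch, not NetApp tooling:

```python
def phase_actions(aggr_used_pct):
    """Map aggregate capacity used (in percent) to the actions of Table 10.

    Returns the list of mitigation actions that apply at this utilization;
    an empty list means the aggregate is inside the sweet spot corridor.
    """
    actions = []
    if aggr_used_pct > 70:
        actions.append("stop provisioning new storage")
    if aggr_used_pct > 85:
        actions.append("stop extending provisioned storage")
    if aggr_used_pct > 90:
        actions.append("relax resource situation and migrate a nomad")
    return actions
```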


Figure 24) Visualization of phase transitions depending on the metric aggregate capacity used.

[Figure content: at 0-70% aggregate capacity used, new storage is provisioned and already provisioned storage can be extended; at 70-85%, provisioning of new storage stops; above 85%, extending already provisioned storage also stops; above 90%, utilization is relaxed by migrating a nomad with NetApp Data Motion.]

You can achieve very high data consolidation in this setting by using NetApp storage controllers. The amount of logical data served exceeds the physically usable capacity by factors.
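As a rough illustration of "exceeds the physically usable capacity by factors": the consolidation factor is the logical data served divided by the physical capacity consumed. This formula sketch is ours, not a NetApp metric definition:

```python
def consolidation_factor(logical_gb, physical_gb):
    """Ratio of logically served data to physically consumed capacity.

    With thin provisioning and deduplication, values well above 1.0 are
    achievable; for example, 30 TB of logical data held on 10 TB of disk
    corresponds to a factor of 3.
    """
    if physical_gb <= 0:
        raise ValueError("physical capacity must be positive")
    return logical_gb / physical_gb
```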


6 STORAGE EFFICIENCY COOKBOOK

To increase consolidation, we propose the following steps to exploit the advantages of NetApp storage efficiency technologies.

Figure 25) Typical picture of aggregate capacity metrics while turning to zero fat configurations and dedupe.

[Figure content: committed capacity and capacity used plotted over elapsed time, with an overall trend and a last-3-month trend line; markers 1, 2, and 3 correspond to the steps below, covering roughly 1 month of initial monitoring and 3 months of trend observation.]

As a general rule, we don't introduce artificially limited container types. They increase the monitoring effort and might prevent pooling unused space. For an existing landscape, proceed as follows:

1. Install and configure Operations Manager; the earlier the better. From day one, Operations Manager collects data, and the more information it collects, the better the predictions and trending. The diagrams provided by Operations Manager give a good idea of growth rates and their steadiness. Make sure all NetApp storage controllers are monitored. Wait for one month. Define which mitigation alternatives your operational team is comfortable with; check the boxes accompanying the provided list and identify the time your team needs to perform the actions. If you can perform online migrations for nomads, define the time needed to negotiate and approve a migration. For all other data, define the time to the next planned downtime window.

2. Change all volumes to a zero fat configuration with the autogrow feature set to on. Because there is no artificial space limitation for an autogrow volume, monitoring is restricted to aggregate monitoring. When using deduplication, set the volume to autogrow. During this period the capacity used diminishes, as shown in Figure 25, and usually each change in the volume configuration can be detected. So far only metadata has changed; the unused space in the volumes is now available from a common shared pool, and the aggregated free space remains available to the same applications storing the data. We recommend monitoring for three months to understand the growth rate of your environment.

3. Derive the growth trend of the aggregates. Note that the overall trend might still be negative. Use Operations Manager to help determine the trend. Make sure that it excludes the time frame when the volume configuration was changed to zero fat, and that it includes relevant operations of your applications, such as month-end and year-end closing of business applications or regular software maintenance updates (for example, in virtualized environments).


Work backward to determine the thresholds of the phases:
a. Define an aggregate use level at which your operational team is comfortable; at first, do not exceed 80%. Add an attention area (yellow) depending on the mitigation alternatives and their time to show effect.
b. Determine the maximum distance between planned downtimes, or the time needed to perform the intended mitigation alternatives.
c. Determine the growth rate. Operations Manager provides help in determining the trend of data growth.
d. Determine the minimum space required to comfortably allow organic growth in the period between agreed planned downtimes of the services provided. Operations Manager helps you understand the growth rate of the past.
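Steps b through d amount to simple arithmetic. The following is a hedged sketch of working backward from the comfort ceiling, growth rate, and downtime interval to a provisioning-stop threshold; the function name and example numbers are illustrative, not from the report:

```python
def provisioning_stop_threshold(capacity_gb, growth_gb_per_day,
                                days_between_downtimes, ceiling_pct=80.0):
    """Work backward from the comfort ceiling to a provisioning-stop threshold.

    The space that organic growth will consume between two planned downtime
    windows is reserved below the ceiling; provisioning of new storage must
    stop once utilization crosses the returned percentage.
    """
    growth_gb = growth_gb_per_day * days_between_downtimes
    threshold_pct = ceiling_pct - (growth_gb * 100.0) / capacity_gb
    return max(threshold_pct, 0.0)
```

For a 10 TB aggregate growing 10 GB per day with 180 days between downtime windows, 18% of headroom must stay free, so provisioning stops at 62% utilization.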

To provision storage, follow these steps:

1. Create big aggregates to enable shared storage in your data center. We recommend sizing them so that the aggregate can be extended once for eventual aggregate mitigation. Very few situations exist where a silo-centric approach with dedicated aggregates for applications makes sense. Free space and performance in an aggregate can be shared, and a few big aggregates reduce the monitoring effort. Also, build aggregates in a limited number of standardized configurations and sizes.

2. Create volumes in a zero fat configuration with the autogrow feature set to on. Because there is no artificial space limitation for an autogrow volume, monitoring is restricted to aggregate monitoring. When using deduplication, set the volume to autogrow. Whenever possible, use Provisioning Manager for convenience and for repeating configurations.
a. Classify your data and provision for flexibility. Give NFS a preference and make use of vFiler entities.
b. Turn on deduplication. Even in situations where deduplication rates are expected to be low, there is sometimes a big surprise. If you prefer to try deduplication on the storage controller first, create a clone of the intended volume and deduplicate it to estimate the effect. Use Performance Advisor to identify a repeating time frame of low activity to schedule the deduplication job, or use deduplication scheduled by change rate. Mind the maximum volume sizes, which depend on the storage controller.
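On a 7-mode storage controller, deduplication is controlled with the sis commands. A sketch with /vol/vol_a as a placeholder path; verify the schedule syntax against your Data ONTAP release:

```shell
# Enable deduplication on the volume
sis on /vol/vol_a

# Schedule it into a low-activity window (here: daily at 23:00)
sis config -s sun-sat@23 /vol/vol_a

# Alternatively, let the change rate trigger deduplication automatically
sis config -s auto /vol/vol_a

# Run once now, scanning existing data, and check progress
sis start -s /vol/vol_a
sis status /vol/vol_a
```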

c. Initially, size volumes to the expected size of the data you are going to store. Thus the aggregate overcommitment metric in Operations Manager represents the data consolidation more precisely.

d. Trim existing volumes provisioned in full fat or low fat to a zero fat configuration. Use the following commands at the console of the storage controller to configure zero fat without Snapshot autodelete for NAS environments:

   vol options <volume> guarantee none
   vol options <volume> try_first volume_grow
   vol autosize <volume> -m <maximum size> -i <increment size> on
   snap autodelete <volume> off

Use the following command sequence to configure zero fat with Snapshot autodelete for NAS environments:

   vol options <volume> guarantee none
   vol options <volume> try_first volume_grow
   vol autosize <volume> -m <maximum size> -i <increment size> on
   snap autodelete <volume> trigger volume
   snap autodelete <volume> delete_order oldest_first
   snap autodelete <volume> on


Use the following commands to configure zero fat without Snapshot autodelete for SAN environments:

   vol options <volume> guarantee none
   vol options <volume> try_first volume_grow
   vol autosize <volume> -m <maximum size> -i <increment size> on
   snap reserve -V <volume> 0
   snap autodelete <volume> off
   lun set reservation <lun> disable

Use the following command sequence to configure zero fat for SAN environments with autodelete set to on:

   vol options <volume> guarantee none
   vol options <volume> try_first volume_grow
   vol autosize <volume> -m <maximum size> -i <increment size> on
   snap reserve -V <volume> 0
   snap autodelete <volume> trigger volume
   snap autodelete <volume> delete_order oldest_first
   snap autodelete <volume> on
   lun set reservation <lun> disable

e. Identify storage holding inactive data. Such storage is most often perfectly suited to act as a nomad candidate that can be migrated.

f. Identify storage that is close to deprovisioning. Deprovisioning storage relaxes use and can act as a mitigation alternative.

g. Turn already provisioned volumes into a zero fat configuration.
3. Let Operations Manager monitor the landscape. Use the aggregate daily growth rates and "days to full" trending reported by Operations Manager to adapt the thresholds. Remember that "days to full" trending reports against 100% capacity used of the aggregate.


7 REFERENCES

• TR-3505, "NetApp Deduplication for FAS and V-Series Deployment and Implementation Guide": www.netapp.com/us/library/technical-reports/tr-3505.html
• TR-3563, "NetApp Thin Provisioning: Improving Storage Utilization and Reducing TCO": www.netapp.com/us/library/technical-reports/tr-3563.html
• TR-3710, "Operations Manager, Provisioning Manager, and Protection Manager Best Practices Guide": www.netapp.com/us/library/technical-reports/tr-3710.html
• TR-3786, "A Thorough Introduction to 64-Bit Aggregates": www.netapp.com/us/library/technical-reports/tr-3786.html
• TR-3814, "NetApp Data Motion": www.netapp.com/us/library/technical-reports/tr-3814.html
• TR-3827, "If You're Doing This, Then Your Storage Could Be Underutilized": www.netapp.com/us/library/technical-reports/tr-3827.html
• TR-3881, "DataMotion for Volumes for Enterprise Applications": www.netapp.com/us/library/technical-reports/tr-3881.html
• NetApp Operations Manager Efficiency Dashboard Installation and User Guide: now.netapp.com/NOW/download/tools/omsed_plugin/InstallUserGuide.pdf


8 ACKNOWLEDGMENTS

This report was developed in concert with the Field Centers for Innovation and covers field best practices and product group expertise. It would not have been possible without the input of many experts. Significant contributions were made by Matthew Agoni, Carlos Alvarez, Jeff Berks, Manfred Buchmann, Hans Deuerlein, Erik Dybwad, Niels Reker, Oliver Dziuba, Larry Freeman, Gary Garcia, Pretoom Goswami, Naveen Harsani, George John, Nigel Maddock, Andreas Martinovsky, Holger Niermann, Cesar Orosco, Christian Ott, Shiva Raja, Michael Reusch, Maurice Skubski, John Tyrrell, Oliver Walsdorf, and Allen Wang.

NetApp provides no representations or warranties regarding the accuracy, reliability, or serviceability of any information or recommendations provided in this publication, or with respect to any results that may be obtained by the use of the information or observance of any recommendations provided herein. The information in this document is distributed AS IS, and the use of this information or the implementation of any recommendations or techniques herein is a customer's responsibility and depends on the customer's ability to evaluate and integrate them into the customer's operational environment. This document and the information contained herein may be used solely in connection with the NetApp products discussed in this document.

© Copyright 2010 NetApp, Inc. All rights reserved. No portions of this document may be reproduced without prior written consent of NetApp, Inc. Specifications are subject to change without notice. NetApp, the NetApp logo, Go further, faster, Data ONTAP, FlexClone, FlexVol, MultiStore, RAID-DP, SnapDrive, SnapMirror, Snapshot, SyncMirror, and vFiler are trademarks or registered trademarks of NetApp, Inc. in the United States and/or other countries. Windows is a registered trademark of Microsoft Corporation. Oracle is a registered trademark of Oracle Corporation. VMware is a registered trademark and VMotion is a trademark of VMware, Inc. All other brands or products are trademarks or registered trademarks of their respective holders and should be treated as such. RA-0007-1010



2.4 SCENARIO

As a scenario, we consider a service provider who delivers a set of IT services. This service provider might serve internal or external customers at different service levels. The service levels provided are characterized by unplanned downtime, as exemplified in Figure 5. This characterization is useful for aligning service data with physical resources. In our example, the highest level of service availability is delivered for Platinum services. It is further assumed that the provided services have different lifetimes and dates of creation.

The service provider's major goal is to deliver the services within the specified service levels and to achieve maximum data consolidation through NetApp storage efficiency technologies. This directly translates into cost savings related to capital investments, floor space, cooling, maintenance, and operational expenses. However, storing data in a consolidated way using storage efficiency technologies needs to take into account the aggregated data growth rates of the applications.

Predicting data growth rates depends on several parameters, which are usually outside the control and knowledge of the service provider. These parameters include usage characteristics, number of users, and functionality used. To compensate for the deficiencies in precisely predicting data growth over a specific time frame, we propose a scheme that the service provider can use to achieve operational flexibility and adaptability to handle unpredictable growth rates.

Figure 5) Sample service levels ordered by service disruption and recovery time.

[Figure content: service levels ordered from lowest to highest allowed disruption and recovery time: Platinum (production, premium customers), Gold (production), Silver (production, low budget), Bronze (production), and Best Effort services (dev/test, cold/fill-up data, dynamic/short-term data).]

In this document the focus is on the operational aspects of storage efficiency technologies used to achieve data center consolidation and agility. Thus we take a seat next to the operational staff of our sample service provider to understand their questions regarding the technologies that make up NetApp storage efficiency.

We address the questions posed by the operational staff, such as: How do we set it up? How do we run this? How do we integrate the necessary procedures into our daily life?


A set of questions pertaining to the lifetime of a service instance and its storage arises. It starts with provisioning storage in a NetApp shared storage infrastructure, continues with the detection and monitoring of situations endangering the level of a service and the necessary response procedures, and aims at promoting a continuous and smooth delivery of services.

The questions are structured around a cycle that starts at provisioning storage and finishes at deprovisioning storage. Figure 6 shows important questions regarding storage efficiency from an operational point of view:

• Provisioning deals with the provisioning of storage. In this document, provisioning models are shown that achieve a high level of storage consolidation while preserving operational flexibility. For individual applications, NetApp provides a rich library of technical reports on how best to provision.
• Monitoring deals with defining the goal of the monitoring process and which metrics to use to decide when to stop (for example, the provisioning of storage). Relevant metrics provided by NetApp Operations Manager are described.
• Notifying deals with how to notify the people in charge when certain actions must be performed. The notification mechanisms within NetApp Operations Manager that deliver information in case of certain events are described.
• Mitigation deals with mechanisms to prevent uncontrolled utilization from affecting operational flexibility and service fulfillment.

Figure 6) Questions regarding storage efficiency from an operational point of view.

[Figure content: a cycle of Provision, Monitor, Notification, and Mitigate. Provision: how to provision best for storage efficiency (provisioning models, NetApp Data Motion awareness, from scratch or from template/clone), where to provision to, which SLA, what the defaults are. Monitor: which tools, what to monitor. Notification: who is in charge to react, how to notify. Mitigate: what is critical (when to stop provisioning, when to stop extending, when to relax tightness, how to detect), available options, implications on SLAs, when to act.]

Before discussing the details of this cycle, it is important to understand the NetApp technologies that achieve storage efficiency and flexibility, and their relevance in the provisioning and operational phases.


2.5 NETAPP TECHNOLOGIES FOR STORAGE EFFICIENCY AND FLEXIBILITY

The NetApp technologies listed in Table 1 are characterized by where they bring the most significant advantage and value. For example, FlexClone® technology provides significant time and space advantages during provisioning, but the space advantage might be reduced over time. In contrast, deduplication technology can achieve space savings over the entire storage lifetime.

Table 1) NetApp technologies for storage efficiency and flexibility.

| NetApp Technology | Benefit | During Provisioning | During Operation |
|---|---|---|---|
| FlexClone | Instantly creates thin-provisioned and space-efficient writable clones | X | |
| FlexVol® | Implements thin provisioning and consumes only the needed space rather than the requested space | X | X |
| Deduplication | Increases data consolidation by detecting and optimizing repeating patterns in primary and secondary storage | X | X |
| NetApp Data Motion | Provides flexibility for management and optimal load/capacity rebalancing in growing cloud environments without downtime | | X |
| Aggregate extensibility in Data ONTAP® | Data ONTAP is the foundation for all features listed in this table and provides flexibility in handling physical resources; it allows extending physical aggregates during operation | X | X |

Furthermore, NetApp RAID-DP®, SATA, and NetApp Flash Cache (formerly PAM II) are technologies that help to reduce the total cost of storage tremendously. It is assumed that these technologies are deployed according to the requirements of the use case.

NETAPP SHARED STORAGE INFRASTRUCTURE

To implement the practices outlined in this document, some prerequisites must be met. We assume a NetApp shared storage infrastructure implemented using large aggregates. This acts as a utility for delivering storage in a flexible manner to applications with different needs. It scales with demand and serves a variety of different service levels at the same time. NetApp Operations Manager monitors the NetApp shared storage infrastructure. This software acts as a central management station and consolidates information about the current status of all NetApp storage controllers. Based on this information, Operations Manager indicates the necessity to change the phases and behavior in the data center.

The NetApp shared storage infrastructure provides different ways for clients to consume its resources. It can provide a traditional view, in which storage resources are located at a specific controller. Using NetApp Provisioning Manager, the infrastructure can also provide a service-oriented resource view that abstracts resource consumption and management from the physical controllers. The abstractions of a storage service catalog, resource pools, and datasets provide easy manageability in the face of massive scale. If multi-tenancy is not required, this is the abstraction of choice.

Supported by the NetApp technologies MultiStore® (vFiler™) and NetApp Data Motion, storage can be provided in a utility-like fashion, independent of physical hardware. This makes high operational flexibility in the data center possible and allows building virtualized environments for multiple tenants with competing interests.


3 PROVISIONING

Some features, such as data deduplication, can be turned on or off at any time. However, to achieve maximum consolidation and storage efficiency, more strategies must be considered during the data-provisioning phase.

In addition, provisioning should take the flexibility of storage (for example, the ability to migrate data) into account. This allows administrators to easily move data off aggregates approaching capacity, without downtime. This is also an important aspect when planning to deliver services 24x7.

Thus, the design goals for storage layouts are storage efficiency and operational flexibility. In the following sections, we discuss three orthogonal dimensions of provisioning storage. Two of them focus on achieving data consolidation; the third focuses on achieving operational flexibility. All dimensions can be combined independently. Note that the achievable level of consolidation depends on the application and its data.

TR-3827, "If You're Doing This, Then Your Storage Could Be Underutilized," provides further understanding of storage efficiency and operational flexibility.

3.1 PROVISIONING FROM SCRATCH: FULL FAT TO ZERO FAT PROVISIONING

This section deals with the way data is provisioned and the consequences for storage efficiency. We recommend applying the so-called zero fat configurations.

We consider the storage setup for a single application instance. The presented configurations can be applied while provisioning storage from scratch and to already provisioned storage. When the technical dimensions of storage provisioning are categorized into primary data space and its Snapshot™ copy space, there are four theoretical combinations for both network-attached storage (NAS) and storage area network (SAN) environments. In practical applications, only two variants are relevant for NAS and three variants for SAN storage:

• Full fat
• Low fat
• Zero fat

According to NetApp best practices, we do not mix block and file data within a single volume, which allows us to consider NAS and SAN environments separately.

NAS

For NAS, two options are recommended: full fat and zero fat.

• Full fat: The primary data and Snapshot copy space are preallocated.
• Zero fat: Primary data and its Snapshot copy space are allocated on demand. This variant achieves the best ratio of storage efficiency when provisioning applications from scratch.


Figure 7) Provisioning model for NAS storage from scratch. Technically, only two of the four combinations are possible.

[Figure content: a matrix of primary data (files and directories) space allocation (fat or thin) against Snapshot copy space allocation (fat or thin). Fat/fat is the full fat option, thin/thin is the zero fat option; the two mixed combinations are not options.]

Note: Full fat is characterized slightly differently in NAS and SAN due to their technical properties.

FULL FAT PROVISIONING

Full fat provisioning of NAS is the traditional (default) way to implement NFS/CIFS shares. Volumes in a full fat configuration are characterized as follows:

• Volumes are created with a space guarantee.
• The size of the volume follows the formula X + Δ, where X is the size of the primary data (the sum of all user data, files and directories, within the volume) and Δ is the amount of space needed to hold Snapshot data.
• Because the space used for Snapshot copies might grow unexpectedly, the autosize function can be used to make space available when a certain volume threshold is reached. This also happens when the space reserved for user data gets low.
• Space reservation for Snapshot copies is used to hide the capacity used for Snapshot copies from the consumers (NAS clients).
• For volumes with deduplication enabled, volume autogrow is a mandatory option.
• Normally, using autodelete is not recommended in NAS environments. Keeping a certain number of Snapshot copies for file versioning and restores might be part of the SLAs defined for file services.

Note: Deleting Snapshot copies may be a reasonable approach when no other option for freeing up space is available, but this is a specific and individual decision.

Table 2) Full fat provisioning.

Volume options:

| Option | Recommended Value | Notes |
|---|---|---|
| guarantee | volume | |
| fractional_reserve | 100 | Leave at the default; mostly relevant for SAN environments. The default value up to Data ONTAP 7.3.3 is 100; for later releases, 0 is the default. |
| autosize | on | Turn autosize on. There is no artificially limited volume that needs to be monitored. Autosize makes sense to allow growth of user data beyond the guaranteed space limit. |
| autosize options | -m X -i Y | The business model drives the maximum value for the autosize configuration because it offers additional disk space to the consumer under its specific conditions. A reasonable resizing increment depends on various factors, such as the data growth rate in the particular volume and the volume size itself. |

Volume Snapshot options:

| Option | Recommended Value | Notes |
|---|---|---|
| reserve | yes | The value depends on the number of Snapshot copies and the change rate within the volume. |
| schedule | switched on | Automatic Snapshot technology schedules. |
| autodelete | off | Deleting Snapshot copies is not recommended in most NAS environments. |

ZERO FAT PROVISIONING

The zero fat method is the most efficient way to provision NAS volumes:

• Volumes are created without a space guarantee.

• The size of the volume still follows the formula X + Δ. X is the size of the primary data, that is, the sum of all user data (files and directories) within the volume. Δ is the amount of space needed to hold Snapshot data. Sizing the volume defines a container with a virtual size for the consumers; NAS users are familiar with fixed-size file shares.

• Space used for Snapshot copies can grow unexpectedly. You can use the autosize function to make space available when a certain volume threshold is reached. You can also use the autosize function when the space reserved for user data gets low.

• Space reserved for Snapshot copies is used to hide from the consumers (NAS clients) the capacity taken up by Snapshot copies.

• For volumes with deduplication enabled, volume autogrow is a mandatory option.

• Using autodelete is normally not recommended in NAS environments. Keeping a certain amount of space for Snapshot copies for file versioning/restores is part of the SLAs defined for file services.

Note: Deleting Snapshot copies may be a reasonable approach when no other option for freeing up space is available, but this is a specific and individual decision.
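The X + Δ sizing rule above can be sketched in code. The following is a minimal illustration only, assuming that Δ is estimated from the daily change rate and the Snapshot copy retention period; the function name and estimation approach are hypothetical, not a NetApp sizing tool:

```python
def nas_volume_size(primary_gb: float, daily_change_rate: float,
                    retained_snapshot_days: int) -> float:
    """Sketch of the X + delta sizing rule for a zero fat NAS volume.

    delta is estimated as primary data * daily change rate * number of
    days of retained Snapshot copies (a rough, hypothetical estimate).
    """
    delta = primary_gb * daily_change_rate * retained_snapshot_days
    return primary_gb + delta

# Example: 1000 GB of user data, 2% daily change, 14 retained daily
# Snapshot copies -> 1000 + 1000 * 0.02 * 14 = 1280 GB virtual size.
size = nas_volume_size(1000, 0.02, 14)
```

With autogrow enabled, the volume can start smaller than this estimate and grow on demand toward it.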

Table 3) Zero fat provisioning.

Option | Recommended Value | Notes

Volume Options
guarantee | none
fractional_reserve | 100 | Leave at the default; this is mostly relevant for SAN environments. The default value up to Data ONTAP 7.3.3 is 100; for later releases, 0 is the default.
autosize | on | Turn autosize on. There is no artificially limited volume that needs to be monitored. Autosize makes sense to allow growth of user data beyond the guaranteed space limit.
autosize options | -m X -i Y | The business model drives the maximum value for the autosize configuration because it offers additional disk space for the consumer under its specific conditions. A reasonable resizing increment depends on various factors, such as the data growth rate in the particular volume, the volume size itself, and so on.
try first | - | Autodelete is not recommended in most environments.

Volume Snapshot Options
reserve | yes/no | The value depends on the number of Snapshot copies and the change rate within the volume. Displaying only the committed usable space using an SLA is the preferred way to provision NAS storage; however, there might be situations in which the Snapshot copy reserve area is omitted (no).
schedule | switched on | Automatic Snapshot technology schedules.
autodelete | off | Deleting Snapshot copies is not recommended in most NAS environments.

SAN

For SAN, we consider three options:

• Full fat: Both the primary data and its Snapshot copy space are preallocated.

• Low fat: The primary data is preallocated; the Snapshot copy space is allocated on demand.

• Zero fat: The primary data and its Snapshot copy space are allocated on demand. This variant achieves the best storage efficiency ratio when provisioning applications from scratch.

Figure 8) Provisioning model for SAN storage from scratch.

                                 Primary Data (LUN) Space Allocation
                                 Fat                Thin
Snapshot Copy      Fat           Full Fat Option    No Option
Space Allocation   Thin          Low Fat Option     Zero Fat Option

FULL FAT PROVISIONING

This method can be treated as the historical way of provisioning block storage with Data ONTAP:

• Volumes are created with a space guarantee.

• A fractional (overwrite) reserve is used to guarantee that the primary data can be overwritten completely with Snapshot copies in place. If this space is not available, Snapshot copy creation will fail.

• The size of the volume follows the formula 2X + Δ. X is the size of the primary data, that is, the sum of all LUN capacities within the volume. Δ is the amount of space needed to hold Snapshot copy data.


The enhancements to the volume autosize capabilities (such as volume-size-dependent thresholds) and the robustness of the Snapshot autodelete implementation have made full fat provisioning more or less obsolete. As of today, the default settings for creating volumes/LUNs in Data ONTAP still correspond to full fat provisioning. See Provisioning from Scratch: Full Fat to Zero Fat Provisioning for a discussion of using tools such as Provisioning Manager.

Table 4) Full fat provisioning.

Option | Recommended Value | Notes

Volume Options
guarantee | volume
fractional_reserve | 100 | Even if technically possible, a fractional reserve below 100 carries the risk of running out of Snapshot copy overwrite space. This situation should be avoided.
autosize | off | Autosize could be used as an option to create the free space needed for Snapshot copy creation.

Volume Snapshot Options
reserve | 0
schedule | switched off
autodelete | off

LUN Options
reservation | enable

LOW FAT PROVISIONING

With low fat provisioning, we use a more space-efficient way to provision volumes:

• Volumes are created with a space guarantee.

• LUNs are created with a space guarantee as well. This setup does not benefit from unused blocks within a LUN. (During the lifetime of a LUN, the amount of free, unused blocks typically decreases. Without space reclamation techniques, allocated blocks on the storage system stay allocated.)

• The size of the volume follows the formula X + Δ. X is the size of the primary data, that is, the sum of all LUN capacities within the volume. Δ is the amount of space needed to hold Snapshot copy data.

• Because the space used for Snapshot copies might grow unexpectedly, the autosize and autodelete policies are used to make space available when a preset volume threshold is reached.

Table 5) Low fat provisioning.

Option | Recommended Value | Notes

Volume Options
guarantee | volume
fractional_reserve | 0 | Snapshot copy space is controlled by the autodelete and autosize options.
autosize | on | Turn autosize on.
autosize options | -m X -i Y | The business model drives the maximum value for the autosize configuration because it offers additional disk space for the consumer under its specific conditions. A reasonable resizing increment depends on various factors, such as the data growth rate in the particular volume, the volume size itself, and so on.
try first | volume_grow | Increasing the size of the volume does not destroy any data or information, so there is no reason not to increase it; the change can be reverted afterward if the volume's free space increases again. There might be configurations where automatic volume growth is not desired.

Volume Snapshot Options
reserve | 0 | For NAS volumes, setting a Snapshot copy reserve area and configuring Snapshot copy schedules is a common setup. For SAN volumes, this needs to be switched off according to NetApp best practices (see the Fibre Channel and iSCSI Configuration Guide).
schedule | switched off
autodelete | on | There might be Snapshot copies that are needed to fulfill certain SLAs, such as backup SLAs. Setting this policy needs to be negotiated against the business requirements; in the worst case, deleting Snapshot copies is not an option.
autodelete options | volume, oldest_first | There is a precedence order determining which Snapshot copies are candidates for deletion; oldest_first is the current default.

LUN Options
reservation | enable | Reserves space for the LUN during creation.

ZERO FAT PROVISIONING

Full and low fat provisioning use fully allocated volumes and LUNs. By default, space allocation happens within the boundaries of the LUN and the volume. Zero fat follows a 100% "allocate on demand" concept:

• Volumes are created without a space guarantee.

• LUNs are created without a space guarantee.

• The size of the volume follows the formula X − N + Δ. X is the size of the primary data, that is, the sum of all LUN capacities within the volume. Δ is the amount of space needed to hold Snapshot copy data. N is the amount of unused blocks within a given LUN.

Table 6) Zero fat provisioning.

Option | Recommended Value | Notes

Volume Options
guarantee | none | No space reservation for the volume at all.
fractional_reserve | 0 | With Data ONTAP 7.3.3, fractional_reserve can be modified even for volumes without a space guarantee of type volume. Prior to Data ONTAP 7.3.3, the value was fixed at 100.
autosize | on | Turn autosize on.
autosize options | -m X -i Y | The business model drives the maximum value for the autosize configuration because it offers additional disk space for the consumer under specific conditions. A reasonable resizing increment depends on various factors, such as the data growth rate in the particular volume, the volume size itself, and so on.
try first | volume_grow

Volume Snapshot Options
reserve | 0 | For NAS volumes, setting a Snapshot copy reserve area and configuring Snapshot copy schedules is a common setup. For SAN volumes, this needs to be switched off according to NetApp best practices (see the Fibre Channel and iSCSI Configuration Guide).
schedule | switched off
autodelete | off | Deleting Snapshot copies might be an option when the volume can no longer be resized because the maximum configured size has been reached or when the aggregate's free space becomes low.

LUN Options
reservation | disable | No preallocation of blocks for the LUN.
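The recommended SAN settings of Tables 4 through 6 lend themselves to being captured as data, for example when automating provisioning with scripts. The following is a hypothetical lookup, not a NetApp tool; the key names merely mirror the option names used in the tables:

```python
# Recommended volume/LUN options per SAN provisioning method,
# mirroring Tables 4 through 6 (hypothetical helper, not a NetApp API).
SAN_POLICIES = {
    "full_fat": {
        "guarantee": "volume", "fractional_reserve": 100,
        "autosize": "off",
        "snap_reserve": 0, "snap_schedule": "off", "snap_autodelete": "off",
        "lun_reservation": "enable",
    },
    "low_fat": {
        "guarantee": "volume", "fractional_reserve": 0,
        "autosize": "on", "try_first": "volume_grow",
        "snap_reserve": 0, "snap_schedule": "off", "snap_autodelete": "on",
        "lun_reservation": "enable",
    },
    "zero_fat": {
        "guarantee": "none", "fractional_reserve": 0,
        "autosize": "on", "try_first": "volume_grow",
        "snap_reserve": 0, "snap_schedule": "off", "snap_autodelete": "off",
        "lun_reservation": "disable",
    },
}

def recommended_options(method: str) -> dict:
    """Return the recommended option set for a SAN provisioning method."""
    return SAN_POLICIES[method]
```

Keeping the recommendations as data rather than prose makes it straightforward to apply them consistently from a provisioning script.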

SUMMARY OF PROVISIONING METHODS

There are good reasons for using any of the provisioning methods described; however, full fat provisioning for SAN environments should be avoided wherever possible because of its storage efficiency ratio. Even with a 100% block usage ratio on primary data, zero fat provisioning has many advantages and is the preferred method:

• The aggregate's free space is a global pool that can serve space to volumes. This gives more flexibility than volumes with their own dedicated free space.

• For SAN volumes, block consumption can be easily monitored.

• Deduplication savings go directly into the global pool of free space, which is the aggregate or the resource pool to which it belongs.

• Monitoring is needed only on the aggregate level; volumes grow on demand.

Table 7) Comparison of provisioning methods.

Characteristics | Full Fat | Low Fat | Zero Fat
Space consumption | 2X + Δ | X + Δ | X − N + Δ (2)
Space efficient | No | Partially, for Snapshot copies | Yes
Monitoring | Optional | Required on volume and aggregate level | Required on aggregate level
Notification/mitigation process required | No | Optional in most cases | Yes
Pool benefiting from dedupe savings | Volume fractional reserve area | Volume free space area | Aggregate free space area
Risk of an out-of-space condition on primary data | No | No, as long as autodelete is able to delete any Snapshot copies | Yes, when monitoring and notification processes are missing
Typical use cases | Small installations; no or few storage management skills (no monitoring infrastructure) | Large database environments | Shared storage infrastructure; test/dev environments; storage pools for virtualized servers

(2) N is the traditional thin provisioning impact, that is, the amount of blocks logically allocated but not used.
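The space consumption formulas in Table 7 can be expressed as a small helper, a sketch that assumes X, Δ, and N are given in the same capacity unit:

```python
def volume_space(method: str, x: float, delta: float, n: float = 0.0) -> float:
    """Space consumption per provisioning method (Table 7).

    x     -- primary data, X (sum of LUN capacities or user data)
    delta -- space needed to hold Snapshot copy data
    n     -- blocks logically allocated but not used (thin provisioning
             impact); only relevant for zero fat
    """
    if method == "full_fat":
        return 2 * x + delta      # 2X + delta
    if method == "low_fat":
        return x + delta          # X + delta
    if method == "zero_fat":
        return x - n + delta      # X - N + delta
    raise ValueError(f"unknown method: {method}")

# With 500 GB of LUNs, a 50 GB Snapshot delta, and 200 GB of unused
# blocks, zero fat consumes 350 GB where full fat would consume 1050 GB.
```

The comparison makes the efficiency gap between the methods concrete for a given workload.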

FULL/LOW/ZERO FAT PROVISIONING WITH PROVISIONING MANAGER

NetApp Provisioning Manager focuses on fast and efficient provisioning of storage resources in the NetApp storage infrastructure. It speeds up provisioning workflows and boosts capacity usage by using policy-based automation for the entire NetApp NAS and SAN infrastructure. These processes are:

• Faster than manually provisioning storage
• Easier to maintain than scripts
• Instrumental in minimizing the risk of data loss resulting from misconfigured storage

Provisioning Manager applies user-defined policies to consistently select the appropriate resources for each provisioning activity. This frees administrators from the headache of searching for available space to provision and allows more time for strategic issues. The use of a centralized management console allows administrators to monitor the status of their provisioned storage resources. Provisioning Manager can help improve business agility and capacity use, shrink provisioning time, and improve administrator productivity. Provisioning Manager's thin provisioning and deduplication capabilities provide a high level of storage efficiency from your NetApp storage investment.

A GUI allows you to implement the full/low/zero fat provisioning models within Provisioning Manager. See Figure 9 and Figure 10 for configuring storage efficiency in a provisioning policy for NAS and SAN. Whenever storage is provisioned using such a provisioning policy, the settings apply automatically. For more information, refer to TR-3710, Operations Manager, Provisioning Manager, and Protection Manager Best Practices Guide.

Provisioning Manager encapsulates technical details when provisioning storage and supports easy integration with existing management tools and orchestration frameworks. Policies and their use in so-called datasets and storage services allow you to exploit NetApp storage efficiency technologies without exposing a high level of technical detail to a higher level of management software.

Note: Provisioning Manager up to version 4.0 does not allow you to specify autosize and autodelete individually; the policy template determines whether these features are used and which options are selected. In order to implement the provisioning methods outlined, a customized "provisioning script" needs to be provided to set the autosize and autodelete parameters according to the recommendations for the full/low/zero fat methods. Postprovisioning scripts are standard with Provisioning Manager. Use caution when Provisioning Manager runs conformance checks; these revert individual settings.


Figure 9) Configuring a full/zero fat provisioning policy using Provisioning Manager for NAS. Select checkboxes as outlined. Provisioning Manager deviates from zero/full fat by first growing volumes with autosize and then allowing Snapshot copy autodelete.


Figure 10) Configuring a full/low/zero fat provisioning policy using Provisioning Manager for SAN storage. Select checkboxes as outlined. Provisioning Manager deviates by not turning on autosize for zero fat.

FULL/LOW/ZERO FAT PROVISIONING FOR STORAGE SERVICES

Storage services are an easy abstraction for provisioning storage in a utility-like fashion. A storage service describes all characteristic attributes for storage needed in a certain scenario. A storage service catalog lists the available templates and allows you to provision storage with these attributes on demand. Technically, storage services or datasets consist of one or more resource pools, a protection policy, and a provisioning policy. Full/low/zero fat provisioning for storage services is configured in the configuration wizard of the provisioning policy. Figure 11 shows the provisioning policies closest to full/low/zero fat configurations.

Figure 11) Full/low/zero fat provisioning policies for datasets and storage services.

Because this wizard is able to configure the deduplication feature, two policies are configured for the zero fat configuration: one with deduplication and one without.


HOW SHOULD A VOLUME BE SIZED

Because physical allocation of data within a zero-fat-provisioned volume is done on demand, the volume size can theoretically be set to a very high value that can easily hold all application data and Snapshot copies. Because the unallocated space in the volume is not exclusively reserved for the volume itself, all other applications can benefit from the shared pool of unallocated storage. However, NetApp recommends that you size the volume to the expected size of its containing objects and use the autogrow option to let it grow on demand. The important advantage is that the commitment rate acts as a metric for data consolidation.

Note: The commitment rate reflects the amount of logical data consolidation. This metric is suitable for deciding when data should be left for organic growth.

Additionally, the volume size limits when using deduplication should be taken into account, because the maximum sizes depend on the storage controllers.
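Assuming the commitment rate is defined as the sum of the provisioned (virtual) volume sizes divided by the aggregate capacity, a minimal sketch of the metric looks as follows (hypothetical helper, not a NetApp API):

```python
def commitment_rate(volume_sizes_gb, aggregate_capacity_gb):
    """Commitment rate of an aggregate: provisioned logical capacity
    relative to physical capacity. Values above 1.0 indicate
    overcommitment through thin provisioning."""
    return sum(volume_sizes_gb) / aggregate_capacity_gb

# Three 400 GB zero fat volumes on a 1000 GB aggregate: 1.2,
# that is, 120% committed.
rate = commitment_rate([400, 400, 400], 1000)
```

Tracking this value over time shows how much logical consolidation the aggregate has achieved and when to stop adding new volumes.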

APPLICATION RECOMMENDATIONS

Thin provisioning is most effective when applications use data that is committed to them step by step. When applications preformat data, the immediate effect of thin provisioning is lost, and only deduplication may reclaim sharable blocks. Because thin provisioning has no performance penalty, the general recommendation is to provision with the zero fat configuration.

For SAN-attached storage, NetApp recommends that you use file systems supporting space reclamation technologies such as the SCSI UNMAP and SCSI WRITE SAME commands. These pass the information through the storage stack that a particular block is no longer used and allow unused space to be reclaimed. On Windows® platforms, this can be configured in NetApp SnapDrive®.

For Oracle® database best practices, refer to WP-7084, Storage Efficiency in an Oracle Environment.

3.2 PROVISIONING FROM TEMPLATES: VOLUME AND DEDUPE-CENTRIC LAYOUTS

This section deals with provisioning storage for similar applications from a golden template. A valid use case is a hosting provider who offers and serves predefined application services in mass quantities. Instead of provisioning each application from scratch, the data of the application instance is provisioned by creating a copy of a preconfigured template, or golden copy, that is customized using a postprocessing procedure.

When applications are provisioned this way, NetApp cloning technologies generate virtual copies of the template data instantly and with efficient use of space. This achieves a high degree of data consolidation and cost savings.

The potential of NetApp cloning technologies also plays a central role in development and test environments, as well as in software maintenance scenarios. Testing and updates can be performed very easily because these cloning capabilities work instantly and with almost no overhead in terms of performance, CPU, and memory.

There are two ways to align application data to a NetApp shared storage infrastructure:

• Volume-centric storage layout
• Dedupe-centric storage layout

Depending on the data lifetime, suitability for deduplication, consistency, and tool constraints, one way of aligning application data is more appropriate than the other.

In both variants, the storage of the application template can be provisioned as either full, low, or zero fat. The cloning procedure inherits the attributes of the parent volume. To create space-efficient clones, the space guarantee must be set to none.


CONSEQUENCES FOR MONITORING

When using one of the following layouts, very high data consolidation can be achieved. Because this effect depends on the usage characteristics of the corresponding applications, monitoring the aggregate is key. In case a low fat volume acts as a template that is cloned preserving the original space guarantees, monitoring is necessary for the cloned volumes as well.

VOLUME-CENTRIC STORAGE LAYOUT

In a volume-centric storage layout, an application instance is organized into one or a few volumes to benefit from Data ONTAP volume-centric management and maintenance operations, such as instant cloning and volume-consistent Snapshot copies.

In addition to the convenient ways to manage volumes, volume-centric storage layouts have storage efficiency advantages in two dimensions:

• High instant storage efficiency savings: high instant savings when cloning the data of an application instance with FlexClone; savings might deteriorate over time.

• Long-term storage efficiency savings: medium long-term savings when deduplicating application data.

A volume-centric layout makes it easy to provision storage for another instance of an application: clone a consistent volume representing the template of the intended application and attach it to an instance where it is processed. This approach works for both NAS and SAN.

Figure 12 shows the data alignment of an application instance and its volume. An application instance organizes its data in one or more dedicated volumes. Note that the entire construct is created within one aggregate.

Because deduplication is performed at the volume level, long-term savings depend on the block-sharing rate within one instance of an application. Volume-centric layouts are preferred in the following cases:

• Simplicity of data management using volumes
• Individual control over the SLA of each application instance
• Application instances with a short duration
• No consideration of deduplication
• Management tools that require volume-centric layouts


Figure 12) Volume-centric storage provisioning. Application instances are aligned horizontally with their volumes. (The figure shows a template FlexVol volume and FlexVol volumes for instances 1 through n, each containing LUNs/qtrees. Deduplication provides block sharing within each FlexVol volume; FlexClone provides block sharing between the template and its clones.)

Impact on commitment and storage utilization: The impact of using FlexClone to clone a volume-centric storage layout to implement storage-template-based provisioning is visualized schematically. At clone creation, Data ONTAP creates metadata for the new instance of the data and allocates space for storing changes to the cloned copy or new data on request. Thus, the overcommitment of the aggregate containing the cloned data increases when the clone is created. However, this does not affect the space used in the aggregate. When data in the clone is changed and new data is added by the application, the aggregate usage grows.
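The behavior described above, clone creation raising overcommitment without consuming used space until new data is written, can be illustrated with a toy model. The classes and numbers are hypothetical and not Data ONTAP internals:

```python
class Aggregate:
    """Toy model: tracks provisioned (logical) vs. used (physical) space."""

    def __init__(self, capacity_gb: float):
        self.capacity = capacity_gb
        self.provisioned = 0.0  # sum of volume sizes (drives commitment)
        self.used = 0.0         # physically allocated blocks

    def provision_volume(self, size_gb: float, used_gb: float = 0.0):
        self.provisioned += size_gb
        self.used += used_gb

    def clone_volume(self, size_gb: float):
        # FlexClone: only metadata is created; used space is unchanged.
        self.provisioned += size_gb

    def write_new_data(self, gb: float):
        self.used += gb

    @property
    def commitment(self) -> float:
        return self.provisioned / self.capacity

aggr = Aggregate(1000)
aggr.provision_volume(300, used_gb=300)  # template volume, fully written
used_before = aggr.used
aggr.clone_volume(300)                   # commitment rises ...
assert aggr.used == used_before          # ... used space does not
aggr.write_new_data(20)                  # changes in the clone grow usage
```

The model mirrors the schematic: cloning is free in physical terms, and only changed or new data consumes aggregate space.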

Best Practice

A volume-centric layout implicitly implements a consistency group. It is preferable to align within it all application data that should be recovered at a certain point in time. Cloning can achieve significant savings when a FlexClone volume is created to provision data for a new service instance.

Client-side data realignment, such as disk defragmentation or database tablespace reorganization, has a counterproductive effect on FlexClone savings. This realignment also has a temporarily counterproductive effect on deduplication savings and requires the deduplication process to be executed again. If possible, the following actions on client data should be avoided:

• Reorganizing data, for example, database reorganization of tablespaces or defragmentation of virtual disks provisioned through cloning

• Preformatting data


DEDUPE-CENTRIC STORAGE LAYOUT

In a dedupe-centric storage layout, the goal is to achieve high storage efficiency returns from the deduplication feature. In contrast to the volume-centric storage layout, data of different application instances is grouped to achieve storage efficiency returns across a set of application instances. Figure 13 shows a sample dedupe-centric storage layout. Data of application instances is organized horizontally; individual data of each application is grouped vertically in a volume to implement deduplication.

This layout makes sense in virtualization scenarios where the images of the guest machines can be grouped easily. Grouping partitions containing boot images and commonly used programs is very effective because they share much of the same data. To implement template-based provisioning with such a layout, cloning of template data must be performed with the file/LUN FlexClone operation. File/LUN FlexClone allows storage objects to be cloned within a volume, providing finer granularity.

This storage layout provides the following storage efficiency advantages from a short- and long-term perspective:

• Very high long-term storage efficiency savings: long-term savings are achieved due to the deduplication-centric storage layout and deduplication returns.

• Short-term storage efficiency savings: instant savings are provided when cloning an application instance, for example template application data, through a file/LUN FlexClone operation.

In contrast to the volume-centric storage layout, application instances are bundled together in a matrix style because of their participation in a volume. This implies that the applications share major operational tasks and are managed as a bundle. From an SLA perspective, a diversification of service levels within the application instances cannot be implemented as easily as with a volume-centric layout. Achieving application-consistent Snapshot copies requires the iterative application of file/LUN FlexClone functionality to all storage objects of the instance. This is slightly more difficult than cloning with a volume FlexClone operation.

TR-3505, NetApp Deduplication for FAS and V-Series Deployment and Implementation Guide, provides a deeper understanding of NetApp deduplication and its deployment.

Figure 13 illustrates dedupe-centric storage provisioning. Volumes are shared among several application instances to achieve cross-instance dedupe returns. Note that this construct is created within an aggregate; volumes can be assigned to different aggregates.


Figure 13) Dedupe-centric storage provisioning. Application instances are aligned horizontally; volumes are aligned vertically. (The figure shows a template and instances 1 and 2, each contributing one LUN/qtree to every FlexVol volume. Deduplication provides block sharing within each FlexVol volume.)

Impact on commitment and aggregate usage: When the FlexVol volumes for this layout are created, their individual sizes contribute to the commitment rate. The aggregate usage grows with the provisioning and use of objects within the FlexVol volumes. Provisioning a new instance in this layout through a file/LUN FlexClone operation has no effect on the overcommitment rate; it affects only the deduplication savings of the volume itself. Thus, NetApp recommends using the zero fat configuration for the volume and enabling autogrow.

Best Practice

This layout is very attractive for applications using multiple but similar storage objects among service instances (for example, virtual disks in virtual machine hypervisors). They usually use similar operating systems and applications in dedicated virtual disks. Thus, grouping these storage objects leads to a very high degree of consolidation due to deduplication.

Quickly changing data, such as pages and swap files, should not be considered for inclusion in deduplicated volumes on primary storage. Deduplication savings are limited due to their high change rate and do not justify running the deduplication process. NetApp recommends that this type of data not be placed in the same volume as data that dedupes well.

We further recommend not performing client data realignments such as Windows disk defragmentation or database tablespace reorganizations. Because of the way that NetApp storage controllers work, client data is served without performance penalties even when fragmented, so such defragmentation is unnecessary.


3.3 SETTLED/NOMAD PROVISIONING FOR NETAPP DATA MOTION

Settled/nomad provisioning is a provisioning pattern that helps increase the utilization of NetApp shared storage. When the online migratability features of the storage are exploited, response times to mitigate data growth scenarios are independent of application-specific planned downtime windows. Storage is classified into potential migration candidates and can be migrated away from a tight aggregate at one storage controller to another while its accessibility is assured. Thus, it is an elegant technology to relax the use of an aggregate outside the planned downtime windows of the served applications. NetApp MultiStore technology implements this feature using the vFiler abstraction, which NetApp recommends you consider in the provisioning process. Secure multi-tenancy environments implemented using MultiStore technology harmonize well with this provisioning approach.

The settled/nomad provisioning pattern is a perfect metaphor for reacting to data growth in an aggregate. Figure 14 illustrates the concept of settled/nomad provisioning in the aggregates of the storage controllers and the migration of a nomad out of its aggregate. The settled part describes data that does not move during its lifetime. It might use vFiler units to simplify operation and hardware maintenance of the storage controller, but there is no direct need. The nomad parts are considered moving parts and thus must make use of vFiler units. The ratio between the sizes of the settled and nomad parts depends on the growth rate and lifetime of the data in the settled part. Assuming that the aggregate size is constant over this period, the aggregate is filled with settled and nomad data. Over the data's lifetime, more and more nomads are migrated away; at the end of the lifetime, only the settled data is left.

It is irrelevant whether the data growth happens in the settled or the nomad part: when a nomad is migrated away, the resource situation on the aggregate is relaxed. It is preferable to provision several nomads of different sizes. This allows you to:

• React to different growth scenarios of the data

• Quickly migrate smaller nomads when time or the inter-storage-controller network is considered a limited resource

• Operate the aggregate in its operational sweet-spot corridor over a long time frame; by slicing the migratable entities in the right way, you can be sure that the aggregate operates in a predefined use interval
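The effect of slicing migratable entities can be sketched with a toy simulation using hypothetical numbers: the settled data grows each period, and whenever aggregate utilization leaves the sweet-spot corridor, the smallest nomad whose migration restores it is migrated away.

```python
def simulate(capacity, settled, growth_per_period, nomads, high_mark, periods):
    """Toy settled/nomad simulation; returns utilization after each period.

    Whenever utilization exceeds high_mark, migrate away the smallest
    nomad that brings utilization back below the mark (or the largest
    nomad if none is sufficient on its own)."""
    nomads = sorted(nomads)  # smallest first
    history = []
    for _ in range(periods):
        settled += growth_per_period
        util = (settled + sum(nomads)) / capacity
        while util > high_mark and nomads:
            for i, n in enumerate(nomads):
                if (settled + sum(nomads) - n) / capacity <= high_mark:
                    del nomads[i]
                    break
            else:
                nomads.pop()  # no single nomad suffices; migrate the largest
            util = (settled + sum(nomads)) / capacity
        history.append(util)
    return history

# Hypothetical: 1000 GB aggregate, 500 GB settled data growing by
# 50 GB per period, nomads of 100, 150, and 250 GB, 90% ceiling.
usage = simulate(1000, 500, 50, [100, 150, 250], 0.90, periods=6)
```

In this example the aggregate stays within the corridor as long as nomads remain to migrate, illustrating why several nomads of different sizes give finer control than one large one.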

Figure 14) Settled/nomad provisioning into an aggregate. In case of aggregate tightness, a nomad is migrated to a separate aggregate.


To summarize, the settled/nomad provisioning pattern is an elegant method to adjust the block use of an aggregate. The use of an aggregate can be controlled and kept in a desired corridor.


SLA-BASED ASSESSMENT FOR SETTLED/NOMAD

The goals of an SLA-based assessment are to optimize SLA fulfillment and to avoid or minimize penalty costs. The accessibility of the applications, described by their individual service levels, is used for an assessment into settled and nomad instances.

We use the previously introduced SLA metric of service disruption and map it to the stickiness of the settled/nomad instances. The vFiler entities allow online migration of NFS- and iSCSI-attached nomad instances without any changes on the client side. Fibre Channel-attached storage cannot be migrated online at the time of writing. Refer to TR-3881 for an understanding of DataMotion in an Oracle database and Microsoft Exchange environment.

Alignment by technical impact: For data belonging to applications with SLAs that fit perfectly into what is provided, a direct assignment can be made. For example, application instances with the lowest acceptable service disruption should be the last candidates to be migrated (settled); applications with the highest acceptable service disruption should be considered nomads. However, there might be data of application instances that will likely be migrated during the application lifetime. You must take into account the business impact of migrating these instances.

Figure 15) Alignment by technical impact (sorted by negative impact in descending order). (Instances Inst1 to InstN range from high negative impact/outside SLA, assigned settled, e.g., all FC-attached, down to low impact/inside SLA, assigned nomad.)

Alignment by business impact. An assessment of penalty costs is made for the data of the remaining applications. For vFiler migration, a very short negative impact on the performance of the service level must be taken into account during the migration. Thus, application data with the highest negative impact is considered to be the stickiest.

Figure 16) Alignment by business impact (sorted by negative impact in descending order). (Instances with the highest penalty costs are settled; lower-cost instances are semi-settled or nomads.)
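The business-impact alignment can be sketched as a simple sort on penalty costs (hypothetical names; the fraction of instances that stays settled is an assumed policy knob, not something the report prescribes):

```python
def assess_by_business_impact(penalties, settled_fraction=0.5):
    """Order application instances by migration penalty cost (descending)
    and mark the stickiest ones as settled, the rest as nomads.
    `penalties` maps instance name -> penalty cost of migrating it."""
    ordered = sorted(penalties, key=penalties.get, reverse=True)
    cut = int(len(ordered) * settled_fraction)
    return {inst: ("settled" if i < cut else "nomad")
            for i, inst in enumerate(ordered)}
```

In practice the cut would not be a fixed fraction but would follow the assessed penalty costs themselves.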

PERFORMANCE AND THROUGHPUT IMPACT OF MIGRATION

Migration of a nomad might be triggered by heavy storage consumption in an aggregate. It might also be triggered by performance limitations of the corresponding storage controller. Because the migration consumes additional resources on the network and on the participating storage controllers, this consumption must be taken into account to avoid further intensifying the situation. Refer to TR-3881 for a quantitative evaluation of DataMotion.

NetApp recommends that the use of storage controllers in a high-availability configuration be adjusted in such a way that the remaining controller can handle the load in the case of a failover. Doing so should leave enough resources to perform migrations.

Migrating vFiler entities relies mainly on SnapMirror® and MultiStore® technology; thus, it requires these licenses on all participating storage controllers. TR-3814, NetApp Data Motion, provides a thorough presentation of migration using the NetApp Data Motion solution. It focuses on implementing and triggering the migration of vFiler entities using NetApp Provisioning Manager. Furthermore, vFiler units can be managed manually to allow for handy offline migration with a very short interruption of storage accessibility.

ENABLE SETTLED/NOMAD FOR ALREADY-PROVISIONED STORAGE

While NetApp recommends that you consider the settled/nomad setting initially, taking the sizing and lifetime of storage into account, it is also possible to implement it later in a planned downtime window. If NFS-attached storage should be migrated, existing volumes can be adopted by a vFiler entity. Because the vFiler entity has its own IP address, the clients attaching the storage need to remount it.

SETTLED/NOMAD-LIKE SETTING WITH SHORT/LONG-TERM DATA PAIRING

In the previous section, the settled/nomad pattern was described as a way to mitigate organic data growth. The same effect can be achieved when storage is identified as belonging to instances that are going to be deprovisioned at their end of life. Taking into account the expected lifetime of provisioned storage allows you to plan deprovisioning situations in advance. This relaxes the pressure on aggregates outside planned downtime windows and without the technical requirements of a settled/nomad setting.

ONLINE MIGRATION IN VIRTUALIZED ENVIRONMENTS

Online migration features in a virtualization hypervisor provide a further alternative for implementing a responsive scheme to react to data growth scenarios. For example, VMware® Storage VMotion™ is capable of transferring a virtual machine, including its storage, when it is attached using a datastore. Storage of virtual machines served by a NetApp datastore can be migrated to another NetApp-served datastore by migrating each virtual machine. In such cases, a nomad can simply be implemented by a NAS/SAN-attached datastore.

In contrast to a data transfer based on SnapMirror directly between NetApp storage controllers, the migration traffic flows through the hypervisor. This might have consequences for the execution of the virtual machines. Also, the NetApp storage efficiency savings cannot be exploited during the transfer. Deduplication savings are regained by executing the deduplication process on the destination storage controller.


4 OPERATION

This section focuses on the operation and management of overcommitted storage. The goal of management is to fulfill the SLAs of the stored data while achieving a high level of efficiency. It addresses questions of how to detect situations that need manual assistance, how to raise the awareness of the operational staff, and how to resolve situations that arise.

We first consider situations that put SLA fulfillment at risk. Then we focus on actions that can be taken to avoid further aggravation by making the situation evident and presenting mitigation alternatives.

Consider the operational process as a loop that monitors and evaluates the current situation and triggers the transition of a storage resource among the following phases:

• Provision storage.
• Leave room for organic growth. It might be desirable to still allow for extending storage of previously provisioned applications.
• Reduce storage use with mitigation alternatives, such as deletion, data motion, and so on.

These transitions must occur within a specified time frame to optimize operational flexibility and to prevent endangering the SLAs. The point is to detect situations that will violate the SLAs in the future.

SITUATIONS PUTTING SLA FULFILLMENT AT RISK

Over time, more and more data is stored and processed by the provided applications. NetApp storage efficiency technologies compensate for this growth. To prevent running out of physical resources, usage must be managed within safe boundaries. This makes sure the operations team has enough time to react with the appropriate mitigation strategy.

The following list summarizes situations that are critical for service delivery:

• Running out of time. Some mitigation alternatives must be triggered in advance, and a passage of time might be needed for their effect to become evident. This time determines the number of mitigation alternatives that can still be considered at a certain point.

• Running out of mitigation alternatives. Several mitigation alternatives exist to control the usage. However, some alternatives are one-time activities, and some must be performed within a certain time frame. Depending on the situation, not all alternatives might be available for use.

• Running too tight on storage. Over time, applications use more and more of the blocks of storage that were committed to them. This forces Data ONTAP to allocate from a pool of free blocks. Assuming data growth, the size of the free block pool directly translates into available time to react.

• Running out of storage completely. This must be prevented because it has a high impact on the availability of the service. Furthermore, data integrity can be at risk. Consider the following scenarios:
  − The application wants to write to committed storage but fails (NAS/SAN). For applications, this looks like a storage failure and implies service disruption. Data integrity can be at risk.
  − The application wants to allocate new storage but fails (NAS). The application is confronted with a "No space left on device" exception. Verify the application behavior on this exception; most applications can deal with this situation, and data integrity is not at risk.

Two cases need to be differentiated when mitigation is necessary to solve a situation of tight storage. Storage for an object such as a LUN or a share can be tight because of:
  − Insufficient space within the volume in which the storage object is contained
  − Insufficient free space within the aggregate in which the storage object and its volume are contained
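The "running too tight" point above can be made concrete with a back-of-the-envelope calculation: under an assumed constant growth rate, the free block pool translates directly into days available to react (a hypothetical helper, not part of Operations Manager):

```python
def days_to_react(free_pool_gb, daily_growth_gb):
    """Translate the free block pool of an aggregate into the time
    available to react, assuming a constant daily growth rate."""
    if daily_growth_gb <= 0:
        return float("inf")  # no growth: no immediate pressure to act
    return free_pool_gb / daily_growth_gb
```

For example, a 500 GB free pool consumed at 25 GB per day leaves roughly 20 days to trigger a mitigation alternative.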


The following sections focus on how to detect that a change is necessary and that a storage resource should be transitioned to another phase. We describe the different phases, how monitoring can support their detection, and how this information can be made known to operational groups.

4.1 PHASES AND TRANSITIONS

This section outlines the phases of a storage resource. Starting with an empty aggregate, storage is provisioned up to certain thresholds. After that, storage is left for organic growth. After certain thresholds are exceeded, further inspection or activities must be performed to mitigate storage tightness.

• Provisioning storage. When certain thresholds are within a defined range, storage is provisioned to the aggregates. Monitoring should support making a decision to transition to the next phase.

• Leave storage for organic growth. When certain thresholds are exceeded, provisioned storage is left for organic growth. Depending on the environment, storage of existing applications might still be extended, and a second threshold might signal that extensions are no longer possible. Monitoring should support making a decision to transition to the next or prior phase.

• Mitigate storage use. When certain thresholds are exceeded, this phase must make sure that committed storage can be delivered to store application data. The effect of a mitigation activity should be to put the storage resource back into the preferred operational corridor. Monitoring should support making a decision to transition back to the organic growth phase.

4.2 MONITORING

NetApp Operations Manager delivers comprehensive monitoring and management for NetApp shared storage. It provides alerts, reports, performance monitoring, and configuration tools to keep the NetApp storage infrastructure in line with business requirements for maximum availability and efficiency. NetApp Operations Manager provides a single human interface and an application programming interface (API) for integration with third-party management and orchestration software.

Operations Manager monitors the NetApp shared storage infrastructure and is able to raise awareness of certain situations. Events can be set to trigger an action when operational parameters are within a certain range and indicate a relevant situation. When the event triggers, an alarm notification can be sent by e-mail, pager, Simple Network Management Protocol (SNMP), or customized scripts. To raise awareness about a certain situation, the event must be characterized using the metrics provided by Operations Manager. To communicate the event, an alarm must be set.

THRESHOLDS

Operations Manager monitors relevant parameters that indicate the presence of specified situations. Thresholds can be set to trigger actions, for example, to notify the operational team that an alarm situation exists. The thresholds can be set to notify in advance. Operations Manager also performs trending on operational parameters to express the urgency of a certain situation. This supports decision making on how to react to a certain situation.

Within your Operations Manager instance, the thresholds can be verified and set by navigating to the Default Thresholds page, following Setup → Options → Default Thresholds, or using the link http://opsmgrserver:port/dfm/edit/options. Figure 17 shows a sample configuration page.


Figure 17) Operations Manager screen to configure thresholds on operational metrics

For aggregates, Operations Manager provides a set of thresholds, described in the following list. They represent absolute limits. Operations Manager alarms can be used to notify operational staff and raise awareness of a specific situation.

Monitoring the aggregates is very important. They are the physical containers of the preallocated and growable storage objects that host application data. If an aggregate of a storage controller runs at uncontrolled usage, there could be direct consequences for the applications for which it provides data.

The concrete settings for these thresholds depend on the time necessary to relax aggregate block usage. When no mitigation alternatives should or can be taken over the lifetime of the data, mitigation actions must be performed in scheduled downtime windows. Thus, threshold settings and actions tend to be more conservative to avoid SLA-endangering situations.

• Aggregate full threshold. This threshold on the metric aggregate block use allows triggering an alarm that notifies a person in charge.

• Aggregate nearly full threshold. This threshold is the counterpart of the aggregate full threshold but provides an earlier notification.

• Aggregate overcommitted threshold. This threshold on the metric of committed storage allows triggering an alarm that notifies a person in charge. This metric refers to the amount of storage that is assigned to applications. It represents the level of consolidation and also the width and increase of the block use corridor.

• Aggregate nearly overcommitted threshold. This threshold is the counterpart of the aggregate overcommitted threshold but provides an earlier notification.

Operations Manager also provides thresholds that can be used to alert operational staff when volumes are in a certain state:

• Volume full threshold. This event notifies a person in charge that the preset threshold on the metric volume use has been reached.

• Volume almost full threshold. This event is the counterpart of the volume full threshold but provides an earlier notification.

• Volume autosized. This event notifies a person in charge when a volume was extended using the autogrow functionality.


TRENDING

Operations Manager 4.0 supports a variety of trending features for certain storage objects. This is an important feature for all storage objects with a fixed size. It allows you to estimate the time frame within which a certain situation needs to be mitigated. The trend is calculated as a linear regression over up to 90 days in the past. For aggregates, Operations Manager calculates a trend on the daily growth rate. In your Operations Manager instance, use the link http://opsmgrserver:port/dfm/report/view/aggregates-growth-rates for trending of aggregate growth rates and the estimated remaining time until the storage object is full. Each aggregate can be drilled down, and you can select trending based on an interval of one day, one week, one month, three months, or one year. To see the effect of recent data activities, set the interval of a trend calculation to enclose this activity. Investigate whether growth rates calculated over different intervals deviate significantly.

Figure 18) Trending of data growth and days-to-full prediction in Operations Manager

Note: The calculation basis of time to full is the usable aggregate capacity. This value is not calculated based on the aggregate full threshold setting.
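A minimal sketch of the trending idea, assuming daily capacity-used samples and an ordinary least-squares fit like the linear regression Operations Manager applies (the function name and sampling are illustrative; as noted, the basis is usable capacity, not the aggregate full threshold):

```python
def days_to_full(used_gb_samples, usable_capacity_gb):
    """Estimate the days until full from daily capacity-used samples
    (oldest first) with an ordinary least-squares trend line."""
    n = len(used_gb_samples)
    mean_x = (n - 1) / 2.0
    mean_y = sum(used_gb_samples) / n
    cov = sum((x - mean_x) * (y - mean_y)
              for x, y in enumerate(used_gb_samples))
    var = sum((x - mean_x) ** 2 for x in range(n))
    slope = cov / var  # estimated growth in GB per day
    if slope <= 0:
        return float("inf")  # flat or shrinking usage never fills up
    return (usable_capacity_gb - used_gb_samples[-1]) / slope
```

For example, an aggregate growing linearly by 10 GB per day with 100 GB of headroom yields an estimate of roughly 10 days to full.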

Trending on the volume level is analogous to trending on the aggregate level. In your Operations Manager instance, access the link http://opsmgrserver:port/dfm/report/view/volumes-growth-rates for trending of volume growth rates. NetApp recommends that you order the view by growth rate descending or time to full ascending in order to focus on the relevant candidates.

On the volume level, you can set an alarm to fire when the volume growth is outside the usual boundary:

• Abnormal volume growth. This event notifies when the growth rate of a volume exceeds a preset limit. It is helpful for signaling unusual behavior concerning storage consumption and pointing the operational staff to the right storage object.


INDIVIDUAL THRESHOLDS PER AGGREGATE OR VOLUME

For each aggregate or volume, the general default settings can be overridden and made more specific. To do so, select the aggregate or volume of your choice; for example, you can use the links already provided in this technical report. A selected aggregate can be configured using the Edit Settings link and dialog. A selected volume's configuration can be adapted using the Edit Quota Settings link and dialog.

MONITORING STORAGE EFFICIENCY RETURNS

NetApp Operations Manager provides a dashboard to visualize storage efficiency returns in the NetApp shared storage infrastructure. This report lists important parameters, drilled down by utilization capacity, unused reserve capacity, storage efficiency, and efficiency return breakdown. It allows you to judge the effectiveness of the NetApp storage efficiency technologies. Figure 19 provides a sample screenshot of the storage efficiency dashboard in NetApp Operations Manager. Consult the NetApp Operations Manager Efficiency Dashboard Installation and User Guide for further details on the information provided by this dashboard.

Figure 19) Storage efficiency dashboard in Operations Manager


4.3 NOTIFICATION

Operational staff must be notified when situations occur that require a phase transition, especially situations with negative consequences for SLA fulfillment. Operations Manager provides alarms for notification. Alarms are bound to the metrics and thresholds explained in section 4.2 and notify operational staff, storage administrators, or storage capacity planners. Alarms are the instrument used to keep the management effort of the NetApp storage infrastructure low.

After being notified, the responsible person can evaluate the situation and decide which actions to take. Further, the trends on operational parameters provided by Operations Manager simplify the decision-making process.

Depending on the organizational structure, the responsibilities to operate, plan, and administer the storage infrastructure can be separated into different groups, persons, or roles. Thus, we characterize the mitigation activities by required skill set and time to act. This allows an easy alignment to a given organizational structure.

Operations Manager supports different methods to send a notification. The notification methods can be used in combination; for example, a notification can be sent by both e-mail and SNMP.

NOTIFY BY E-MAIL

An alarm can be sent by e-mail to multiple destinations. Repeated notifications can be sent while the situation is not resolved. To set an alarm, access the alarm configuration page by following Setup → Alarms from the default Operations Manager dashboard. Clicking Advanced Version accesses an advanced version of this page; the direct link for the advanced version is http://opsmgrserver:port/dfm/edit/alarms-advanced. Figure 20 shows how to configure an alarm. Adjust the threshold as described in section 4.2.

NetApp recommends using distribution lists or aliases with meaningful names rather than addresses of individual persons. If you follow this recommendation, changing responsibilities and roles does not require you to make corresponding changes to Operations Manager.

NOTIFY BY SNMP

Operations Manager supports the notification of alarms using SNMP, a widely used standard that is supported by most orchestration frameworks and ticketing systems. Using SNMP, Operations Manager can be integrated into existing ticketing systems. Figure 20 shows the setup of an alarm that fires based on the aggregate almost full threshold. The SNMP trap host is configured using a hostname or IP address and the port on which the SNMP agent is listening. The alarm can be saved and tested.


Figure 20) Configuring an alarm based on the threshold aggregate almost full

Note: The SNMP event must be routed to the responsible groups or persons in the ticketing system. Thus, the mapping between the detected situation and the responsible operational group must be implemented there.

NOTIFY BY SCRIPT

Operations Manager supports notifications in highly customized integration scenarios. A user-defined adapter can be executed, which delivers the information to the infrastructure or system of choice. A script can be used to implement such an adapter and act as the glue between Operations Manager and the customer infrastructure. To set an alarm on the event aggregate almost full that starts a script, instrument Operations Manager on the command line:

dfm alarm create -s script_to_execute -h aggregate-almost-full
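A minimal adapter script might look like the following sketch. The exact arguments Operations Manager passes to the script depend on your DFM version, so the argument handling here is an assumption; the script simply forwards whatever it receives as a one-line ticket summary.

```python
import sys

def format_ticket(event_args):
    """Condense the event details passed by the alarm into a one-line
    summary for a downstream ticketing system."""
    if not event_args:
        return "storage-alert: (no details)"
    return "storage-alert: " + " ".join(event_args)

if __name__ == "__main__":
    # The alarm invokes this script; forward whatever details it passes.
    print(format_ticket(sys.argv[1:]))
```

In a real integration, the print statement would be replaced by a call into the ticketing system's API or message queue.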


4.4 MITIGATE STORAGE USE

Uncontrolled storage use can limit operational flexibility and might put SLA fulfillment at risk. This section focuses on mitigation activities that preserve flexibility by controlling use within its defined corridor. The effect of a mitigation activity should be to return the usage to that corridor.

Storage tightness might occur in aggregates or volumes, depending on their configuration. When all volumes in an aggregate are thin provisioned with the zero fat configuration, they use the shared pool of free blocks of the aggregate to deal with data growth. To solve this situation, a mitigation activity on the aggregate level is necessary.

When storage objects in a fixed-size volume cannot grow to the committed space, a mitigation activity on the volume level is necessary to solve upcoming volume tightness.

MITIGATION ACTIVITIES FOR AGGREGATES

Aggregates are the coarsest storage objects within a NetApp storage controller. Finer-grained storage objects, such as FlexVol volumes and their content, are usually thin provisioned using the zero fat configuration. They might grow on demand; however, because they live within an aggregate of physically limited size, the growth of the storage objects themselves is also limited. As described in the following list, providing usable space in the aggregate automatically allows the contained storage objects to grow.

1. Increase the aggregate. You can add drives to aggregates during operation, and you can repeat this mitigation activity. The maximum aggregate size depends on the Data ONTAP version, the type of aggregate, and the type of storage controller. 64-bit aggregates, supported with Data ONTAP 8, have very high limits. Additional drives can be used immediately; however, their procurement time needs to be taken into account. Rebalancing data between existing and new drives results in a uniformly distributed use of the drives.

2. Decrease the aggregate Snapshot copy reserve. This reserve is needed in MetroCluster and SyncMirror® configurations. In other configurations, you can decrease this reserve or set it to zero.

3. Shrink preallocated volumes. Volumes with preallocated space reserve available aggregate free space. When possible, these volumes can be shrunk, returning the freed space to the aggregate to allow others to make use of it.

4. Enable deduplication and shrink the volume.
5. If available, migrate a nomad online to a different storage controller. Doing this on the NetApp storage controller level requires storage provisioning based on vFiler units and MultiStore and SnapMirror licenses. Adequate free space on the aggregates of the target storage controller is required. This mitigation activity is not limited in how often it can be applied.

6. Migrate a volume from one aggregate to another aggregate within the same or another storage controller. SnapMirror replicates the data while it is still being served. To switch over to the replicated data, the client needs to detach from the source and reattach to the replica. After completion, the replica is considered the new source. This operation incurs client downtime. Typically, inter-data center bandwidth allows you to synchronize the source and the replica within the range of a few minutes.

7. If none of the listed activities can be used, the application must be stopped to achieve a consistent state; the data can then be migrated offline.

The mitigation activities for aggregate tightness are summarized in Table 8. Note that Provisioning Manager performs mitigation alternatives 3 to 6 for secondary storage online.


Table 8) Mitigation alternatives to control use within aggregates.

| No. | Mitigation Activity | Repeatability | SLA Impact | Preparation Time | Time to Show Effect |
|-----|---------------------|---------------|------------|------------------|---------------------|
| 1 | Increase aggregate capacity by adding disks | Data ONTAP 7.x: low limits; Data ONTAP 8: high limits | None | HW procurement | Immediate (plus rebalancing) |
| 2 | Decrease the aggregate's Snapshot copy reserve area, if possible | One time | None | None | Immediate |
| 3 | Shrink other volumes in the aggregate if they have enough free space | One time | Low | None | Immediate |
| 4 | Run deduplication and shrink volumes | Repeatable | Low | Time to execute dedupe | Immediate |
| 5 | Migrate nomads (online) | Repeatable | Low | None | Minutes (vFiler migration time) |
| 6 | Migrate volumes to a different aggregate (offline) | Repeatable | Med-high | Next planned downtime window | Minutes (volume switch-over time) |
| 7 | Prevent application data loss and stop the application, then migrate (offline) | Repeatable | Low-high | Coordinate with app owner | Minutes (migration time) |

MITIGATION ACTIVITIES FOR VOLUME TIGHTNESS

Mitigation activities for volume tightness are relevant for volumes that are at risk because their storage objects cannot grow to the committed size. When it is not possible to enable growth for storage objects contained in volumes, you need to perform an aggregate-level mitigation activity.

Note: Some of these mitigation alternatives depend on and affect the used capacity in the aggregate.

Table 9) Mitigation activities for resource tightness within volumes.

| No. | Mitigation Activity | Repeatability | SLA Impact | Preparation Time | Time to Show Effect |
|-----|---------------------|---------------|------------|------------------|---------------------|
| 1 | Reduce the volume's Snapshot copy reserve (if configured and not used) | One time | Low | None | Immediate |
| 2 | Increase the volume if there is free space in the aggregate (see Table 8) | One time | Low | None | Immediate |
| 3 | Delete Snapshot copies that are not needed or were skipped by the autodelete function | Limited | Low | None | Immediate |
| 4 | Activate FAS deduplication for the volume (requires proper space guarantees) | One time | Low; possible performance impact | Wait for schedule | Hours |
| 5 | If the volume contains more than a single LUN, migrate those objects to another volume or aggregate | Repeatable | High | Next planned downtime window | Minutes (volume migration time) |
| 6 | Stop the application and migrate the data | Repeatable | High | Coordinate with app owner | Minutes (migration time) |


5 REAL-LIFE SETTINGS

This section summarizes two different operational settings. The first one does not make use of online data migration or the settled/nomad provisioning pattern; the second one implements a settled/nomad provisioning pattern to maintain the flexibility for online data migrations.

The concrete threshold settings and approaches might be very customer- and application-specific. To exploit NetApp storage efficiency features in your own data center, NetApp recommends that you start conservatively. After you are familiar with the process, work toward the customer-specific optimum.

5.1 SAMPLE SETTING 1: REAL-LIFE SETTING

This section describes a real-life setting that a customer started with. It makes use of a limited set of mitigation alternatives. This is especially beneficial when the installed storage capacity should stay constant over a long time frame or when physical systems are already fully equipped. A settled/nomad setting is not considered. Thus, the thresholds that signal a transition of the phases are set lower and more conservatively for this customer. Because online data migration and aggregate extension are not available as mitigation alternatives, sufficient available space is required to safely reach the next planned downtime window, as shown in Figure 21. In practice, refer to the aggregate days-to-full trend value to get an idea of the available days to full based on past data growth.

• All storage is provisioned using the zero fat option with growable FlexVol volumes. Only aggregate monitoring is used.
• Aggregate extension is not a mitigation alternative.
• Online migration is not a mitigation alternative.

Figure 21) Storage to enable organic data growth between planned downtime windows. (The figure shows data growth over months of time between two planned downtime windows.)

Note: Several months might pass between the planned downtime windows in which major mitigation alternatives can be performed.

The primary concern is preventing the critical situation where aggregates reach a utilization level that is too high to enable organic growth during the period between agreed planned downtime windows. To prevent this situation, sufficient space must be reserved to enable data growth. Second, the level of data consolidation is monitored to manage accumulated growth rates safely.

Provisioning new data is stopped when one or both thresholds on the first and second metrics are reached. The operational teams are notified using an alarm on the Operations Manager event aggregate nearly full threshold (configured to fire when the metric exceeds 50%) and the event aggregate nearly overcommitted threshold (configured to fire when the metric exceeds 110%). These alarms stop the responsible entities from provisioning new storage; the aggregate is left for organic growth.

An assessment of the storage situation might then be performed. Depending on experience and knowledge of the application growth rates seen in the past, the thresholds may be adapted. After the upper threshold of the operational sweet spot corridor is crossed, an alarm based on the aggregate full threshold (set initially to 65%) is sent to the storage administrators so that they can decide to migrate data in the next planned downtime window. In the meantime, organic growth can take place in the yellow-marked area shown in Figure 22. The metrics used are:

• First metric: aggregate capacity used
• Second metric: aggregate space committed

Because all storage is provisioned using the zero fat option, no artificially limited storage container exists. Thus, there is no need to consider a volume-based metric. Figure 22 shows the behavior depending on the metrics aggregate capacity used and aggregate space committed.

Figure 22) Transition of phases depending on the metrics aggregate capacity used and aggregate space committed. (New storage is provisioned while capacity used is within 0-50% and space committed is within 0-110%; in between, capacity is assessed and thresholds are adapted; mitigation starts beyond 65% capacity used or 120% space committed.)
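The transitions of sample setting 1 can be summarized as a small decision function (a sketch using the threshold values quoted in the text; the function and phase labels are hypothetical):

```python
def phase_setting_1(used_pct, committed_pct):
    """Map the two metrics of sample setting 1 to an operational phase.
    Thresholds follow the text: provisioning stops beyond 50% capacity
    used or 110% space committed; mitigation starts beyond 65% used
    or 120% committed."""
    if used_pct > 65 or committed_pct > 120:
        return "mitigate in next planned downtime window"
    if used_pct > 50 or committed_pct > 110:
        return "organic growth only; assess capacity"
    return "provision new storage"
```

Adapting the thresholds over time, as the text recommends, would simply mean changing the constants after each capacity assessment.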


5.2 SAMPLE SETTING 2: SETTLED/NOMAD

This section describes a setting that takes the settled/nomad provisioning pattern into account and allows migrating nomad data flexibly and in a timely manner thanks to vFiler technology. This setting requires storage space at alternative locations to which nomads might be migrated. It is seen more often in larger environments with an emphasis on NFS-attached storage. It allows operating the NetApp storage infrastructure at very high use and in narrower operational sweet spot corridors. Figure 23 visualizes the effect of a mitigation alternative that can be performed online.

Figure 23) Narrower corridors due to the ability to perform mitigation alternatives in hours instead of months. (The figure shows settled and nomad volumes over time, the point at which the need to act is detected, and the effect of mitigation, e.g., migration, within hours.)

In this sample setting, as in sample setting 1, the critical situation to prevent is aggregates becoming too full. However, thanks to the flexibility gained with online data migration, no further metric, such as storage overcommitment, needs to be taken into account.

• All storage is provisioned using the zero fat option with growable FlexVol volumes. Only aggregate monitoring is used.
• Storage is provisioned using the settled/nomad pattern, with the ability to perform online migration.
• Days-to-full aggregate trending was more than 200 days on average. Note that this value depends on the individual situation and is calculated against 100% capacity.

The sole metric in this setting is aggregate capacity used. Table 10 contains the thresholds describing the transition of phases.

Table 10) Phase transitions with the settled/nomad provisioning pattern and the online migration mitigation alternative.

Detection Threshold | Notify | Mitigation
> 70% | Storage operations | Stop provisioning of storage
> 85% | Storage operations | Stop extending provisioned storage
> 90% | Storage operations | Relax the resource situation and migrate a nomad
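The transitions in Table 10 can be sketched as a simple threshold lookup. The percentages are the ones from the table; the function and field names are illustrative only and are not part of any NetApp tooling:

```python
# Hypothetical sketch: map aggregate capacity used (%) to the allowed actions
# from Table 10. The threshold values come from the table; the function and
# field names are invented for this illustration.

def allowed_actions(capacity_used_pct: float) -> dict:
    """Return which operations are still permitted at a given fill level."""
    return {
        "provision_new_storage": capacity_used_pct <= 70,
        "extend_provisioned_storage": capacity_used_pct <= 85,
        "migrate_nomad_data": capacity_used_pct > 90,  # mitigation kicks in
    }

print(allowed_actions(72))
```

At 72% fill, for example, new provisioning has already stopped while extending existing storage is still permitted and no migration is triggered yet.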


Figure 24) Visualization of phase transitions depending on the metric aggregate capacity used.

[Figure 24 shows, for settled data growth against aggregate capacity, the operational sweet spot corridor and the phases at 0–70%, 70–85%, and > 90% aggregate capacity used: provisioning new storage, extending already provisioned storage, and relaxing utilization by moving a nomad with NetApp Data Motion.]

You can achieve very high data consolidation in this setting by using NetApp storage controllers. The amount of logical data served exceeds the physically usable capacity by factors.
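As a minimal illustration of the consolidation factor mentioned above (logical data served divided by physically usable capacity), with invented sample figures:

```python
# Minimal sketch: data consolidation expressed as an overcommitment factor,
# i.e., logical data served divided by physically usable capacity.
# The sample figures below are invented for illustration.

def consolidation_factor(logical_served_gb: float, physical_usable_gb: float) -> float:
    return logical_served_gb / physical_usable_gb

# 300 TB of logical data served from 100 TB of usable disk -> factor 3.0
print(consolidation_factor(300_000, 100_000))
```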


6 STORAGE EFFICIENCY COOKBOOK

To increase consolidation, we propose the following steps to exploit the advantages of NetApp storage efficiency technologies.

Figure 25) Typical picture of aggregate capacity metrics while turning to zero fat configurations and dedupe.

[Figure 25 plots committed capacity and capacity used against elapsed time (with 1-month and 3-month marks), showing the overall trend, the last 3-month trend, and the three steps described below.]

As a general rule, we don't introduce artificially limited container types. They increase monitoring effort and might prevent pooling unused space. For an existing landscape, proceed as follows:

1. Install and configure Operations Manager, the earlier the better. From day one, Operations Manager collects data; the more information it collects, the better the predictions and trending become. The diagrams provided by Operations Manager give a good idea of growth rates and their steadiness. Make sure all NetApp storage controllers are monitored. Wait for one month. Define which mitigation alternatives your operational team is comfortable with. Check the boxes accompanying the provided list and identify the time your team needs to perform the actions. If you can perform online migrations for nomads, define the time to negotiate and approve the migration. For all other data, define the time to the next planned downtime window.

2. Change all volumes to the zero fat configuration with the autogrow feature set to on. Because there is no artificial space limitation for an autogrow volume, monitoring is restricted to aggregate monitoring. When using deduplication, set the volume to autogrow. During this period, the capacity used diminishes, as shown in Figure 25. Usually, each change in the volume configuration can be detected. So far, only metadata has changed, and unused space in the volumes is now available from a common shared pool. The aggregated free space is available for the same applications storing the data. We recommend monitoring for three months to understand the growth rate of your environment.

3. Derive the growth trend of the aggregates. Note that the overall trend might still be negative. Use Operations Manager to help determine the trend. Make sure that it excludes the time frame when changing the volume configuration to zero fat and that it includes relevant operations of your applications, such as month-end and year-end closing of business applications or regular software maintenance updates (for example, in virtualized environments).
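The trend derivation in step 3 amounts to fitting a line to the capacity-used samples and projecting when 100% would be reached. The following sketch is illustrative only; Operations Manager computes this for you, and the function name here is not its API:

```python
# Illustrative sketch (not an Operations Manager API): estimate the daily
# growth rate of aggregate capacity used with a least-squares line and
# project the days until the aggregate reaches 100% of its capacity.

def days_to_full(samples, capacity_gb):
    """samples: list of (day, used_gb) measurements; capacity_gb: aggregate size."""
    n = len(samples)
    mean_x = sum(d for d, _ in samples) / n
    mean_y = sum(u for _, u in samples) / n
    slope = sum((d - mean_x) * (u - mean_y) for d, u in samples) / \
            sum((d - mean_x) ** 2 for d, _ in samples)  # GB per day
    today_used = samples[-1][1]
    if slope <= 0:
        return None  # shrinking or flat trend: no projected fill date
    return (capacity_gb - today_used) / slope

# 10 GB/day growth, 980 GB still free -> 98 days
print(days_to_full([(0, 4000), (1, 4010), (2, 4020)], 5000))
```

A negative trend (for example, right after turning on deduplication) yields no fill date at all, which matches the note above that the overall trend might still be negative.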


Work backward to determine the thresholds of the phases:
a. Define the aggregate use at a level where your operational team is comfortable. At first, do not exceed 80%. Add an attention area (yellow) depending on the mitigation alternatives and their time to show effect.
b. Determine the maximum distance between the planned downtimes, or the time to perform the intended mitigation alternatives.
c. Determine the growth rate. Operations Manager provides help in determining the trend of data growth.
d. Determine the minimum space required to comfortably allow organic growth in the period between agreed planned downtimes of the services provided. Operations Manager helps you to understand the growth rate of the past.
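Steps a through d above can be condensed into a small back-of-the-envelope calculation: given the comfort ceiling, the measured growth rate, and the lead time of a mitigation, derive the notification threshold. This is a hedged sketch with illustrative names, not a prescribed formula:

```python
# Hedged sketch of the "work backward" calculation above: given a measured
# growth rate and the lead time a mitigation needs, derive the capacity-used
# threshold (%) at which the team must be notified. Names are illustrative.

def notify_threshold_pct(capacity_gb, growth_gb_per_day,
                         mitigation_lead_days, ceiling_pct=80.0):
    """Percentage of aggregate capacity used at which to raise a notification
    so the mitigation completes before the comfort ceiling is reached."""
    headroom_gb = growth_gb_per_day * mitigation_lead_days  # consumed while mitigating
    return ceiling_pct - 100.0 * headroom_gb / capacity_gb

# 10 TB aggregate, 20 GB/day growth, 30 days to arrange a downtime window:
print(notify_threshold_pct(10240, 20, 30))  # -> 74.140625
```

Slower mitigations (planned downtime windows) push the threshold down; fast online migrations allow thresholds much closer to the ceiling, which is exactly the narrower corridor of sample setting 2.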

To provision storage, follow these steps:

1. Create big aggregates to enable shared storage in your data center. We recommend sizing them in such a way that the aggregate can be extended once for eventual aggregate mitigation. Very few situations exist where a silo-centric approach with dedicated aggregates for applications makes sense. Free space and performance in an aggregate can be shared. A few big aggregates reduce the monitoring effort. Also, build aggregates in a limited number of standardized configurations and sizes.

2. Create volumes in the zero fat configuration with the autogrow feature set to on. Because there is no artificial space limitation for an autogrow volume, monitoring is restricted to aggregate monitoring. When using deduplication, set the volume to autogrow. Whenever possible, use Provisioning Manager for convenience and for repeating configurations.
a. Classify your data and provision for flexibility. Give NFS a preference and make use of vFiler entities.
b. Turn on deduplication. Even in situations where deduplication rates are expected to be low, there is sometimes a big surprise. If you prefer to try deduplication on the storage controller, then create a clone of the intended volume and deduplicate it to estimate the effect. Use Performance Advisor to identify a repeating time frame of low activity to schedule the deduplication job. Also use deduplication scheduled by change rate. Mind the maximum sizes, which depend on the storage controller.
c. Initially size volumes to the expected size of the data you are going to store. Thus, the aggregate overcommitment metric in Operations Manager represents the data consolidation more precisely.
d. Trim existing volumes provisioned in full or low fat to the zero fat configuration. Use the following commands on the console of the storage controller to configure zero fat without Snapshot autodelete for NAS environments:

vol options <volume> guarantee none
vol options <volume> try_first volume_grow
vol autosize <volume> -m <maximum size> -i <increment size> on
snap autodelete <volume> off

Use the following command sequence to configure zero fat with Snapshot autodelete for NAS environments:

vol options <volume> guarantee none
vol options <volume> try_first volume_grow
vol autosize <volume> -m <maximum size> -i <increment size> on
snap autodelete <volume> trigger volume
snap autodelete <volume> delete_order oldest_first
snap autodelete <volume> on


Use the following commands to configure zero fat without Snapshot autodelete for SAN environments:

vol options <volume> guarantee none
vol options <volume> try_first volume_grow
vol autosize <volume> -m <maximum size> -i <increment size> on
snap reserve -V <volume> 0
snap autodelete <volume> off
lun set reservation <lun> disable

Use the following command sequence to configure zero fat for SAN environments with autodelete set to on:

vol options <volume> guarantee none
vol options <volume> try_first volume_grow
vol autosize <volume> -m <maximum size> -i <increment size> on
snap reserve -V <volume> 0
snap autodelete <volume> trigger volume
snap autodelete <volume> delete_order oldest_first
snap autodelete <volume> on
lun set reservation <lun> disable

e. Identify storage of inactive data. Storage keeping inactive data is most often perfectly suited to act as nomad candidates that could be migrated.

f. Identify storage that is close to deprovisioning. Deprovisioning of storage relaxes use and can act as a mitigation alternative.

g. Turn already provisioned volumes into the zero fat configuration.
3. Let Operations Manager monitor the landscape. Use the aggregate daily growth rates and the days to full trending reported by Operations Manager to adapt the thresholds. Remember that days to full trending reports against 100% capacity used of the aggregate.


7 REFERENCES

• TR-3505, "NetApp Deduplication for FAS and V-Series Deployment and Implementation Guide": www.netapp.com/us/library/technical-reports/tr-3505.html
• TR-3563, "NetApp Thin Provisioning: Improving Storage Utilization and Reducing TCO": www.netapp.com/us/library/technical-reports/tr-3563.html
• TR-3710, "Operations Manager, Provisioning Manager, and Protection Manager Best Practices Guide": www.netapp.com/us/library/technical-reports/tr-3710.html
• TR-3786, "A Thorough Introduction to 64-Bit Aggregates": www.netapp.com/us/library/technical-reports/tr-3786.html
• TR-3814, "NetApp Data Motion": www.netapp.com/us/library/technical-reports/tr-3814.html
• TR-3827, "If You're Doing This, Then Your Storage Could Be Underutilized": www.netapp.com/us/library/technical-reports/tr-3827.html
• TR-3881, "DataMotion for Volumes for Enterprise Applications": http://www.netapp.com/us/library/technical-reports/tr-3881.html
• NetApp Operations Manager Efficiency Dashboard Installation and User Guide: http://now.netapp.com/NOW/download/tools/omsed_plugin/InstallUserGuide.pdf


8 ACKNOWLEDGMENTS

This report was developed in concert with the Field Centers for Innovation and covers field best practices and product group expertise. It would not have been possible without the input of many experts. Significant contributions were made by Matthew Agoni, Carlos Alvarez, Jeff Berks, Manfred Buchmann, Hans Deuerlein, Erik Dybwad, Niels Reker, Oliver Dziuba, Larry Freeman, Gary Garcia, Pretoom Goswami, Naveen Harsani, George John, Nigel Maddock, Andreas Martinovsky, Holger Niermann, Cesar Orosco, Christian Ott, Shiva Raja, Michael Reusch, Maurice Skubski, John Tyrrell, Oliver Walsdorf, and Allen Wang.

NetApp provides no representations or warranties regarding the accuracy, reliability, or serviceability of any information or recommendations provided in this publication, or with respect to any results that may be obtained by the use of the information or observance of any recommendations provided herein. The information in this document is distributed AS IS, and the use of this information or the implementation of any recommendations or techniques herein is a customer's responsibility and depends on the customer's ability to evaluate and integrate them into the customer's operational environment. This document and the information contained herein may be used solely in connection with the NetApp products discussed in this document.

© Copyright 2010 NetApp, Inc. All rights reserved. No portions of this document may be reproduced without prior written consent of NetApp, Inc. Specifications are subject to change without notice. NetApp, the NetApp logo, Go further, faster, Data ONTAP, FlexClone, FlexVol, MultiStore, RAID-DP, SnapDrive, SnapMirror, Snapshot, SyncMirror, and vFiler are trademarks or registered trademarks of NetApp, Inc. in the United States and/or other countries. Windows is a registered trademark of Microsoft Corporation. Oracle is a registered trademark of Oracle Corporation. VMware is a registered trademark and VMotion is a trademark of VMware, Inc. All other brands or products are trademarks or registered trademarks of their respective holders and should be treated as such. RA-0007-1010


A set of questions arises pertaining to the lifetime of a service instance and its storage: provisioning storage in a NetApp shared storage infrastructure, detecting and monitoring situations that endanger the level of a service, the necessary response procedures, and promoting a continuous and smooth delivery of services.

The questions are structured around a cycle that starts at provisioning storage and finishes at deprovisioning storage. Figure 6 shows important questions regarding storage efficiency from an operational point of view.

• Provisioning deals with the provisioning of storage. In this document, provisioning models are shown to achieve a high level of storage consolidation while preserving operational flexibility. For individual applications, NetApp provides a rich library of technical reports on how best to provision.
• Monitoring deals with defining the goal of the monitoring process and which metrics to use to decide when to stop (for example, the provisioning of storage). The relevant metrics provided by NetApp Operations Manager are described.
• Notifying deals with how to notify the people in charge when certain actions must be performed. The notification mechanisms within NetApp Operations Manager are described to deliver information in case of certain events.
• Mitigation deals with mechanisms to prevent uncontrolled utilization from affecting operational flexibility and service fulfillment.

Figure 6) Questions regarding storage efficiency from an operational point of view.

[Figure 6 arranges the questions around the provision, monitor, notification, and mitigate cycle: how best to provision for storage efficiency (provisioning models, NetApp Data Motion awareness, from scratch or template/clone, where to provision, which SLA, what the defaults are); what to monitor and with which tools; who is in charge of reacting and how to notify; and which mitigation options are available, their implications on SLAs, and when to act, including what is critical, when to stop provisioning, when to stop extending, and when to relax tightness.]

Before discussing the details of this cycle, it is important to understand the NetApp technologies that achieve storage efficiency and flexibility, and to understand their relevance in the provisioning and operational phases.


2.5 NETAPP TECHNOLOGIES FOR STORAGE EFFICIENCY AND FLEXIBILITY

The NetApp technologies provided in Table 1 are characterized by how they bring the most significant advantage and value. For example, FlexClone® technology provides significant time and space advantages while provisioning, but the space advantage might be reduced over time. In contrast, deduplication technology can achieve space savings over the entire storage lifetime.

Table 1) NetApp technologies for storage efficiency and flexibility.

NetApp Technology | Benefit | During Provisioning | During Operation
FlexClone | Instantly creates thin-provisioned and space-efficient writable clones | X |
FlexVol® | Implements thin provisioning and consumes only the needed space rather than the requested space | X | X
Deduplication | Increases data consolidation by detecting and optimizing repeating patterns in primary and secondary storage | X | X
NetApp Data Motion | Provides flexibility for management and optimal load/capacity rebalancing in growing cloud environments without downtime | | X
Aggregate extensibility in Data ONTAP® | Data ONTAP is the foundation for all features listed in this table and provides flexibility in handling physical resources; it allows extending physical aggregates during operation | X | X

Furthermore, NetApp RAID-DP®, SATA, and NetApp Flash Cache (formerly PAM II) are technologies that help to reduce the total cost of storage tremendously. It is assumed that these technologies are deployed according to the requirements of the use case.

NETAPP SHARED STORAGE INFRASTRUCTURE

To implement the practices outlined in this document, some prerequisites must be met. We assume a NetApp shared storage infrastructure implemented using large aggregates. This acts as a utility for delivering storage in a flexible manner for applications with different needs. It scales with the demands and serves a variety of different service levels at the same time. NetApp Operations Manager monitors the NetApp shared storage infrastructure. This software acts as a central management station and consolidates information about the current status of all NetApp storage controllers. Based on this information, Operations Manager indicates the necessity to change the phases and behavior in the data center.

The NetApp shared storage infrastructure provides different ways for clients to consume its resources. It can provide a traditional view, where storage resources are located at a specific controller. Using NetApp Provisioning Manager, the infrastructure can also provide a service-oriented resource view that abstracts resource consumption and management from the physical controllers. The abstractions of a storage service catalog, resource pools, and datasets provide easy manageability in the face of massive scale. If multi-tenancy is not required, then this is the abstraction of choice.

Supported by the NetApp technologies MultiStore® (vFiler™) and NetApp Data Motion, storage can be provided in a utility-like fashion, independent of physical hardware. This makes possible high operational flexibility in the data center and allows building virtualized environments for multiple tenants with competing interests.


3 PROVISIONING

Some features, such as data deduplication, can be turned on or off at any time. However, to achieve maximum consolidation and storage efficiency, more strategies must be considered during the data-provisioning phase.

In addition, provisioning should take the flexibility of storage (for example, the ability to migrate data) into account. This allows administrators to easily move data off aggregates approaching capacity, without downtime. This is also an important aspect when planning to deliver services 24x7.

Thus, the design goals for storage layouts are storage efficiency and operational flexibility. In the following sections, we discuss three orthogonal dimensions of provisioning storage. Two of them focus on achieving data consolidation; the third focuses on achieving operational flexibility. All dimensions can be combined independently. Note that the achievable level of consolidation depends on the applications and their data.

TR-3827, "If You're Doing This, Then Your Storage Could Be Underutilized," provides further understanding of storage efficiency and operational flexibility.

3.1 PROVISIONING FROM SCRATCH: FULL FAT TO ZERO FAT PROVISIONING

This section deals with the way data is provisioned and the consequences for storage efficiency. We recommend applying the so-called zero fat configurations.

We consider the storage setup for a single application instance. The presented configurations can be applied while provisioning storage from scratch and to already provisioned storage. When the technical dimensions of storage provisioning are categorized into primary data space and its Snapshot™ copy space, there are four theoretical combinations for both network-attached storage (NAS) and storage area network (SAN) environments. In practical applications, only two variants are relevant to NAS, and three variants are relevant to SAN storage:

• Full fat
• Low fat
• Zero fat

According to NetApp best practices, we do not mix block and file data within a single volume, which allows us to consider NAS and SAN environments separately.

NAS

For NAS, two options are recommended: full fat and zero fat.

• Full fat: The primary data and Snapshot copy space are preallocated.
• Zero fat: Primary data and its Snapshot copy space are allocated on demand. This variant achieves the best ratio of storage efficiency when provisioning applications from scratch.


Figure 7) Provisioning model for NAS storage from scratch. Technically, only two out of four combinations are possible.

[Figure 7 shows a 2x2 matrix of primary data (files and directories) space allocation (fat or thin) against Snapshot copy space allocation (fat or thin): fat/fat is the full fat option, thin/thin is the zero fat option, and the two mixed combinations are not options.]

Note: Full fat is characterized slightly differently in NAS and SAN due to their technical properties.

FULL FAT PROVISIONING

Full fat provisioning for NAS is the traditional (default) way to implement NFS/CIFS shares. Volumes in a full fat configuration are characterized as follows:

• Volumes are created with a space guarantee.
• The size of the volume follows the formula X + Δ, where X is the size of the primary data (the sum of all user data, files and directories, within the volume) and Δ is the amount of space needed to hold Snapshot data.
• Because space used for Snapshot copies might grow unexpectedly, the autosize function can be used to make space available when reaching a certain volume threshold. This would also happen when the space reserved for user data gets low.
• Space reservation for Snapshot copies is used to hide the capacity used for Snapshot copies from the consumers (NAS clients).
• For volumes with deduplication enabled, volume autogrow is a mandatory option.
• Normally, using autodelete is not recommended in NAS environments. Keeping a certain number of Snapshot copies for file versioning and restores might be part of the SLAs defined for file services.

Note: Deleting Snapshot copies may be a reasonable approach when no other option for freeing up space is available, but this will be a specific and individual decision.

Table 2) Full fat provisioning.

Option | Recommended Value | Notes
Volume Options
guarantee | volume |
fractional_reserve | 100 | Leave at default; mostly relevant for SAN environments. The default value up to Data ONTAP 7.3.3 is 100; for later releases, 0 is the default.
autosize | on | Turn autosize on. There is no artificially limited volume that needs to be monitored. Autosize makes sense to allow growth of user data beyond the guaranteed space limit.
autosize options | -m X -i Y | The business model drives the maximum value for the autosize configuration because it offers additional disk space for the consumer under its specific conditions. A reasonable resizing increment depends on various factors, such as the data growth rate in the particular volume and the volume size itself.
Volume Snapshot Options
reserve | yes | The value depends on the number of Snapshot copies and the change rate within the volume.
schedule | switched on | Automatic Snapshot technology schedules.
autodelete | off | Deleting Snapshot copies is not recommended in most NAS environments.

ZERO FAT PROVISIONING

The zero fat method is the most efficient way to provision NAS volumes.

• Volumes are created without a space guarantee.
• The size of the volume still follows the formula X + Δ, where X is the size of the primary data (the sum of all user data, files and directories, within the volume) and Δ is the amount of space needed to hold Snapshot data. Sizing the volume defines a container with a virtual size for the consumers; NAS users are familiar with fixed-size file shares.
• Space used for Snapshot copies can grow unexpectedly. You can use the autosize function to make space available when reaching a certain volume threshold. You can also use the autosize function when the space reserved for user data gets low.
• Space reserved for Snapshot copies is used to hide from the consumers (NAS clients) the capacity taken up by Snapshot copies.
• For volumes with deduplication enabled, volume autogrow is a mandatory option.
• Using autodelete is normally not recommended in NAS environments. Keeping a certain amount of space for Snapshot copies for file versioning and restores is part of the SLAs defined for file services.

Note: Deleting Snapshot copies may be a reasonable approach when no other option for freeing up space is available, but this will be a specific and individual decision.

Table 3) Zero fat provisioning.

Option | Recommended Value | Notes
Volume Options
guarantee | none |
fractional_reserve | 100 | Leave at default; mostly relevant for SAN environments. The default value up to Data ONTAP 7.3.3 is 100; for later releases, 0 is the default.
autosize | on | Turn autosize on. There is no artificially limited volume that needs to be monitored. Autosize makes sense to allow growth of user data beyond the guaranteed space limit.
autosize options | -m X -i Y | The business model drives the maximum value for the autosize configuration because it offers additional disk space for the consumer under its specific conditions. A reasonable resizing increment depends on various factors, such as the data growth rate in the particular volume and the volume size itself.
try_first | - | Autodelete is not recommended in most environments.
Volume Snapshot Options
reserve | yes/no | The value depends on the number of Snapshot copies and the change rate within the volume. Displaying only the committed usable space using SLAs is the preferred way to provision NAS storage. However, there might be situations in which the Snapshot reserve area is omitted (no).
schedule | switched on | Automatic Snapshot technology schedules.
autodelete | off | Deleting Snapshot copies is not recommended in most NAS environments.

SAN

For SAN, we consider three options:

• Full fat: Both primary data and its Snapshot copy space are preallocated.
• Low fat: The primary data is preallocated; the Snapshot copy space is allocated on demand.
• Zero fat: Primary data and its Snapshot copy space are allocated on demand. This variant achieves the best ratio of storage efficiency when provisioning applications from scratch.

Figure 8) Provisioning model for SAN storage from scratch.

[Figure 8 shows a 2x2 matrix of primary data (LUN) space allocation (fat or thin) against Snapshot copy space allocation (fat or thin): fat/fat is the full fat option, fat primary with thin Snapshot space is the low fat option, thin/thin is the zero fat option, and thin primary with fat Snapshot space is not an option.]

FULL FAT PROVISIONING

This method can be treated as the historical way of provisioning block storage with Data ONTAP.

• Volumes are created with a space guarantee.
• A fractional (overwrite) reserve is used to guarantee that the primary data can be overwritten completely with Snapshot copies in place. If this space is not available, Snapshot copy creation will fail.
• The size of the volume follows the formula 2X + Δ, where X is the size of the primary data (the sum of all LUN capacities within the volume) and Δ is the amount of space needed to hold Snapshot copy data.


The enhancements to the volume autosize capabilities (such as volume-size-dependent thresholds) and the robustness of the Snapshot autodelete implementation have made full fat provisioning more or less obsolete. As of today, the default settings for creating volumes and LUNs in Data ONTAP still correspond to these settings. See Provisioning from Scratch: Full Fat to Zero Fat Provisioning for a discussion of using tools such as Provisioning Manager.

Table 4) Full fat provisioning.

Option | Recommended Value | Notes
Volume Options
guarantee | volume |
fractional_reserve | 100 | Even though technically possible, a fractional reserve below 100 incorporates a potential risk of running out of Snapshot copy overwrite space. This situation should be avoided.
autosize | off | Autosize could be used as an option to create free space needed for Snapshot copy creation.
Volume Snapshot Options
reserve | 0 |
schedule | switched off |
autodelete | off |
LUN Options
reservation | enable |

LOW FAT PROVISIONING

With low fat provisioning, we use a more space-efficient way to provision volumes.

• Volumes are created with a space guarantee.
• LUNs are created with a space guarantee as well. This setup does not benefit from unused blocks within a LUN. (During the lifetime of a LUN, the amount of free, unused blocks typically decreases. Without space reclamation techniques, allocated blocks on the storage system stay allocated.)
• The size of the volume follows the formula X + Δ, where X is the size of the primary data (the sum of all LUN capacities within the volume) and Δ is the amount of space needed to hold Snapshot copy data.
• Because space used for Snapshot copies might grow unexpectedly, the autosize and autodelete policies are used to make space available when reaching a preset volume threshold.

Table 5) Low fat provisioning.

Option | Recommended Value | Notes
Volume Options
guarantee | volume |
fractional_reserve | 0 | Snapshot copy space is controlled by the autodelete and autosize options.
autosize | on | Turn autosize on.
autosize options | -m X -i Y | The business model drives the maximum value for the autosize configuration because it offers additional disk space for the consumer under its specific conditions. A reasonable resizing increment depends on various factors, such as the data growth rate in the particular volume and the volume size itself.
try_first | volume_grow | Increasing the size of the volume does not destroy any data or information. There is no reason not to increase the size of the volume. It can be reverted afterward if the volume free space increases again. There might be configurations where automatic volume growth is not desired.
Volume Snapshot Options
reserve | 0 | For NAS volumes, setting a Snapshot copy reserve area and configuring Snapshot copy schedules is a common setup. For SAN volumes, this needs to be switched off according to NetApp best practices (see the Fibre Channel and iSCSI Configuration Guide).
schedule | switched off |
autodelete | on | There might be Snapshot copies that are needed to fulfill certain SLAs, such as backup SLAs. Setting this policy needs to be negotiated with the business requirements. In the worst-case scenario, deleting Snapshot copies is not an option.
autodelete options | volume, oldest_first | This sets the precedence for which Snapshot copies are candidates for deletion; oldest_first is the current default.
LUN Options
reservation | enable | Reserves space for the LUN during creation.

ZERO FAT PROVISIONING

Full and low fat provisioning use fully allocated volumes and LUNs. By default, space allocation happens within the boundaries of the LUN and the volume. Zero fat follows a 100% allocate-on-demand concept.

• Volumes are created without a space guarantee.
• LUNs are created without a space guarantee.
• The size of the volume follows the formula X − N + Δ, where X is the size of the primary data (the sum of all LUN capacities within the volume), Δ is the amount of space needed to hold Snapshot copy data, and N is the amount of unused blocks within a given LUN.
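The three SAN sizing formulas (2X + Δ for full fat, X + Δ for low fat, and X − N + Δ for zero fat) can be compared with a quick calculation; the sample numbers below are invented for illustration:

```python
# Illustrative comparison of the SAN provisioning formulas from this section.
# X = sum of LUN capacities, delta = Snapshot copy space, N = unused blocks.
# The sample values below are invented for the example.

def full_fat(x, delta):      # 2X + delta
    return 2 * x + delta

def low_fat(x, delta):       # X + delta
    return x + delta

def zero_fat(x, n, delta):   # X - N + delta
    return x - n + delta

x, n, delta = 1000, 400, 150  # GB: LUN capacity, unused blocks, Snapshot space
print(full_fat(x, delta), low_fat(x, delta), zero_fat(x, n, delta))
# prints: 2150 1150 750
```

The difference between low fat and zero fat is exactly N, the blocks that are logically allocated but never written; this is the "traditional thin provisioning impact" referred to in Table 7.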

Table 6) Zero fat provisioning

Option Recommended Value Notes

Volume Options

guarantee none No space reservation for volume at all

fractional_reserve 0 With Data ONTAP 733 fractional_reserve can be modified even for volumes without a space guarantee of type volume Prior to Data ONTAP 733 the value was fixed at 100

autosize on Turn autosize on

18 Storage Efficiency Every Day How to Achieve and Manage Best-in-Class Storage Use

Option Recommended Value Notes

autosize options -m X -i Y The business model drives the maximum value for the autosize configuration because it offers additional disk space for the consumer under specific conditions A reasonable resizing increment depends on various factors such as data growth rate in the particular volume the volume size itself and so on

try_first | volume_grow | Grow the volume before deleting Snapshot copies.

Volume Snapshot Options

reserve | 0 | For NAS volumes, setting a Snapshot copy reserve area and configuring Snapshot copy schedules is a common setup. For SAN volumes, this needs to be switched off according to NetApp best practices (see the Fibre Channel and iSCSI Configuration Guide).

schedule | switched off |

autodelete | off | Deleting Snapshot copies might become an option when the volume can no longer be resized because the maximum configured size has been reached, or when the aggregate's free space becomes low.

LUN Options

reservation | disable | No preallocation of blocks for the LUN.

SUMMARY OF PROVISIONING METHODS

There are good reasons for using any of the provisioning methods described; however, full fat provisioning for SAN environments should be avoided wherever possible because of its poor storage efficiency ratio. Even with a 100% block usage ratio on primary data, zero fat provisioning has many advantages and is the preferred method:

• The aggregate's free space is a global pool that can serve space for volumes. This gives more flexibility than volumes with their own dedicated free space.
• For SAN volumes, the block consumption can be easily monitored.
• Deduplication savings go directly into the global pool of free space, which is the aggregate or the resource pool in which it belongs.
• Monitoring is needed only on the aggregate level. Volumes grow on demand.

Table 7) Comparison of provisioning methods

Characteristic | Full Fat | Low Fat | Zero Fat
Space consumption | 2X + Δ | X + Δ | X − N + Δ (2)
Space efficient | No | Partially, for Snapshot copies | Yes
Monitoring | Optional | Required on volume and aggregate level | Required on aggregate level
Notification/mitigation process required | No | Optional in most cases | Yes

(2) N is the traditional thin provisioning impact: the amount of blocks logically allocated but not used.


Characteristic | Full Fat | Low Fat | Zero Fat
Pool benefiting from dedupe savings | Volume fractional reserve area | Volume free space area | Aggregate free space area
Risk of an out-of-space condition on primary data | No | No, as long as autodelete is able to delete Snapshot copies | Yes, when monitoring and notification processes are missing
Typical use cases | Small installations; no or few storage management skills (no monitoring infrastructure) | Large database environments | Shared storage infrastructure; test/dev environments; storage pools for virtualized servers
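The space consumption column of Table 7 can be expressed as a small helper for what-if comparisons (illustrative only; the function and its names are ours):

```python
def space_consumption_gb(method, x, n=0, delta=0):
    """Physical space consumed per provisioning method (Table 7):
    full fat reserves 2X + delta, low fat X + delta, and zero fat
    allocates only X - N + delta on demand."""
    formulas = {
        "full_fat": 2 * x + delta,   # data plus 100% fractional reserve
        "low_fat": x + delta,        # data plus Snapshot delta
        "zero_fat": x - n + delta,   # only blocks actually in use
    }
    return formulas[method]
```

For 100 GB of primary data with 30 GB unused and 20 GB of Snapshot data, the three methods consume 220, 120, and 90 GB respectively, which is why zero fat is preferred.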

FULL/LOW/ZERO FAT PROVISIONING WITH PROVISIONING MANAGER

NetApp Provisioning Manager focuses on fast and efficient provisioning of storage resources in the NetApp storage infrastructure. It speeds up provisioning workflows and boosts capacity usage by using policy-based automation for the entire NetApp NAS and SAN infrastructure. These processes are:

• Faster than manually provisioning storage
• Easier to maintain than scripts
• Instrumental in minimizing the risk of data loss resulting from misconfigured storage

Provisioning Manager applies user-defined policies to consistently select the appropriate resources for each provisioning activity. This frees administrators from the headache of searching for available space to provision and allows more time for strategic issues. The use of a centralized management console allows administrators to monitor the status of their provisioned storage resources. Provisioning Manager can help improve business agility and capacity use, shrink provisioning time, and improve administrator productivity. Provisioning Manager's thin provisioning and deduplication capabilities provide a high level of storage efficiency from your NetApp storage investment.

A GUI allows you to implement the full/low/zero fat provisioning models within Provisioning Manager. See Figure 9 and Figure 10 for configuring storage efficiency in a provisioning policy for NAS and SAN. Whenever storage is provisioned using such a provisioning policy, the settings apply automatically. For more information, refer to TR-3710, Operations Manager, Provisioning Manager, and Protection Manager Best Practices Guide.

Provisioning Manager encapsulates technical details when provisioning storage and supports easy integration with existing management tools and orchestration frameworks. Policies and their use in so-called datasets and storage services allow you to exploit NetApp storage efficiency technologies without exposing a high level of technical detail to a higher level of management software.

Note: Provisioning Manager up to version 4.0 does not allow you to specify autosize and autodelete individually; the policy template determines whether these features are used and which options are selected. To implement the provisioning methods outlined above, a customized post-provisioning script needs to be provided to set the autosize and autodelete parameters according to the recommendations for the full/low/zero fat methods. Post-provisioning scripts are standard with Provisioning Manager. Use caution when Provisioning Manager runs conformance checks; these revert individual settings.
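Such a post-provisioning script boils down to a handful of Data ONTAP 7-Mode commands. The sketch below only assembles the command strings; the volume name, sizes, and exact command syntax are assumptions that should be verified against your Data ONTAP release and TR-3710:

```python
def zero_fat_post_provision_commands(volume, max_size, increment):
    """Build the Data ONTAP 7-Mode commands a post-provisioning script
    would issue to apply the zero fat autosize/autodelete
    recommendations (syntax assumed; verify on the target release)."""
    return [
        f"vol autosize {volume} -m {max_size} -i {increment} on",
        f"vol options {volume} try_first volume_grow",
        f"snap autodelete {volume} off",
    ]
```

A real script would execute these over SSH or the Manage ONTAP API and log the results for the conformance-check caveat noted above.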


Figure 9) Configuring a full/zero fat provisioning policy using Provisioning Manager for NAS. Select checkboxes as outlined. Provisioning Manager deviates from zero/full fat by first growing volumes with autosize and then allowing Snapshot autodelete.


Figure 10) Configuring a full/low/zero fat provisioning policy using Provisioning Manager for SAN storage. Select checkboxes as outlined. Provisioning Manager deviates by not turning on autosize for zero fat.

FULL/LOW/ZERO FAT PROVISIONING FOR STORAGE SERVICES

Storage services are an easy abstraction for provisioning storage in a utility-like fashion. A storage service describes all characteristic attributes for storage needed in a certain scenario. A storage service catalog lists the available templates and allows you to provision storage with these attributes on demand. Technically, storage services (or datasets) consist of one or more resource pools, a protection policy, and a provisioning policy. Full/low/zero fat provisioning for storage services is configured in the configuration wizard of the provisioning policy. Figure 11 shows the provisioning policies closest to full/low/zero fat configurations.

Figure 11) Full/low/zero fat provisioning policies for datasets and storage services.

Because this wizard is able to configure the deduplication feature, two policies are configured for the zero fat configuration: one with deduplication and one without.


HOW SHOULD A VOLUME BE SIZED?

Because physical allocation of data within a zero fat provisioned volume is done on demand, the volume size can theoretically be set to a very high value that can easily hold all application data and Snapshot copies. Because the unallocated space in the volume is not exclusively reserved for the volume itself, all other applications can benefit from the shared pool of unallocated storage. However, NetApp recommends that you size the volume to the expected size of its containing objects and use the autogrow option to let it grow on demand. The important advantage is that the commitment rate then acts as a metric for data consolidation.

Note: The commitment rate reflects the amount of logical data consolidation. This metric is suitable for deciding when data should be left for organic growth.

Additionally, the volume size limits for deduplication should be taken into account, because the maximum sizes depend on the storage controllers.
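As a hedged sketch of how the commitment-rate metric can be computed (Operations Manager reports it directly; this function is purely illustrative):

```python
def commitment_rate(committed_gb, physical_gb):
    """Commitment rate of an aggregate: logical space promised to
    applications relative to physical capacity. Values above 1.0 mean
    the aggregate is overcommitted, i.e., thin provisioned."""
    return committed_gb / physical_gb
```

An aggregate with 100 GB of disks backing 300 GB of promised volumes has a commitment rate of 3.0, a direct measure of the data consolidation achieved.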

APPLICATION RECOMMENDATIONS

Thin provisioning is most effective when applications use data that is committed to them step by step. When applications preformat data, the immediate effect of thin provisioning is lost, and only deduplication may reclaim sharable blocks. Because thin provisioning has no performance penalty, the general recommendation is to provision with the zero fat configuration.

For SAN-attached storage, NetApp recommends that you use file systems supporting space reclamation technologies such as the SCSI UNMAP and SCSI WRITE SAME commands. These pass the information through the storage stack that a particular block is no longer used and allow unused space to be reclaimed. On Windows® platforms, this can be configured in NetApp SnapDrive®.

For Oracle® database best practices, refer to WP-7084, Storage Efficiency in an Oracle Environment.

3.2 PROVISIONING FROM TEMPLATES: VOLUME AND DEDUPE-CENTRIC LAYOUTS

This section deals with provisioning storage for similar applications from a golden template. A valid use case is a hosting provider who offers and serves predefined application services in mass quantities. Instead of provisioning each application from scratch, the data of the application instance is provisioned by creating a copy of a preconfigured template, or golden copy, that is customized using a postprocessing procedure.

When applications are provisioned this way, NetApp cloning technologies generate virtual copies of the template data instantly and with efficient use of space. This achieves a high degree of data consolidation and cost savings.

The potential of NetApp cloning technologies also plays a central role in development and test environments, as well as in software maintenance scenarios. Testing and updates can be performed very easily because these cloning capabilities work instantly and with almost no overhead in terms of performance, CPU, and memory.

There are two ways to align application data to a NetApp shared storage infrastructure:

• Volume-centric storage layout
• Dedupe-centric storage layout

Depending on the data lifetime, suitability for deduplication, consistency, and tool constraints, one way of aligning application data is more appropriate than the other.

In both variants, the storage of the application template can be provisioned as either full, low, or zero fat. The cloning procedure inherits the attributes of the parent volume. To create space-efficient clones, the space guarantee must be set to none.


CONSEQUENCES FOR MONITORING

When using one of the following layouts, very high data consolidation can be achieved. Because this effect depends on the usage characteristics of the corresponding applications, monitoring the aggregate is key. In case a low fat volume acts as a template that is cloned while preserving the original space guarantees, monitoring is necessary for the cloned volumes as well.

VOLUME-CENTRIC STORAGE LAYOUT

In a volume-centric storage layout, an application instance is organized into one or a few volumes to benefit from the Data ONTAP volume-centric management and maintenance operations, such as instant cloning and volume-consistent Snapshot copies.

In addition to the convenient ways to manage volumes, volume-centric storage layouts have storage efficiency advantages in two dimensions:

• High instant storage efficiency savings: high instant savings when cloning data of an application instance with FlexClone; savings might deteriorate over time.
• Long-term storage efficiency savings: medium long-term savings when deduplicating application data.

A volume-centric layout makes it easy to provision storage for another instance of an application by cloning a consistent volume representing the template of the intended application and attaching it to an instance where it is processed. This approach works for both NAS and SAN.

Figure 12 shows the data alignment of an application instance and its volume. An application instance organizes its data in one or more dedicated volumes. Note that the entire construct is created within one aggregate.

Because deduplication is performed at the volume level, long-term savings depend on the block-sharing rate within one instance of an application. Volume-centric layouts are preferred in the following cases:

• Simplicity of data management using volumes
• Individual control over the SLA of each application instance
• Application instances with a short duration
• No consideration of deduplication
• Management tools that require volume-centric layouts


Figure 12) Volume-centric storage provisioning. Application instances are aligned horizontally with their volumes: a template FlexVol volume and the instance FlexVol volumes each hold their LUNs/qtrees with deduplication block sharing inside the volume, and FlexClone block sharing links the template to the instances.


Impact on commitment and storage utilization: The impact of using FlexClone to clone a volume-centric storage layout to implement template-based provisioning can be visualized schematically. At clone creation, Data ONTAP creates metadata for the new instance of the data and allocates space for storing changes to the cloned copy or new data on request. Thus, the overcommitment of the aggregate containing the cloned data increases when creating the clone; however, this does not affect the space used in the aggregate. When data in the clone is changed or new data is added by the application, the aggregate use grows.
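Schematically, this is a toy model of the clone-creation effect (clone metadata overhead is ignored; names are ours):

```python
def clone_effect(aggr_committed_gb, aggr_used_gb, template_volume_gb):
    """Creating a FlexClone of a template volume raises the aggregate's
    committed space by the clone's logical size but leaves used space
    almost unchanged: only clone metadata is allocated at creation."""
    return {
        "committed_gb": aggr_committed_gb + template_volume_gb,  # overcommitment grows
        "used_gb": aggr_used_gb,  # physical use unchanged until the clone diverges
    }
```

Only as the application writes into the clone does `used_gb` start to climb toward the committed value.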

Best Practice

A volume-centric layout implicitly implements a consistency group. It is preferable to align within it all application data that should be recovered at a certain point. Cloning can achieve significant savings when a FlexClone volume is created to provision data for a new service instance.

Client-side data realignment, such as disk defragmentation or database table space reorganization, has a counterproductive effect on FlexClone savings. It also has a temporarily counterproductive effect on deduplication savings, which requires re-executing the deduplication process to recover. If possible, the following actions on client data should be avoided:

• Reorganizing data, for example, database reorganization of table spaces or defragmentation of virtual disks provisioned through cloning
• Preformatting data


DEDUPE-CENTRIC STORAGE LAYOUT

In a dedupe-centric storage layout, the goal is to achieve high storage efficiency returns from the deduplication feature. In contrast to the volume-centric storage layout, data of different application instances is grouped to achieve storage efficiency returns across a set of application instances. Figure 13 shows a sample dedupe-centric storage layout. Data of application instances is organized horizontally; individual data of each application is grouped vertically in a volume to implement deduplication.

This layout makes sense in virtualization scenarios where the images of the guest machines can be grouped easily. Grouping partitions containing boot images and commonly used programs is very effective because they share much of the same data. To implement template-based provisioning with such a layout, cloning of template data must be performed with the file/LUN FlexClone operation. File/LUN FlexClone allows storage objects to be cloned within a volume, providing finer granularity.

This storage layout provides the following storage efficiency advantages from a short- and long-term perspective:

• Very high long-term storage efficiency savings: long-term savings are achieved due to the deduplication-centric storage layout and deduplication returns.
• Short-term storage efficiency savings: instant savings are provided when cloning an application instance, for example template application data, through a file/LUN FlexClone operation.

In contrast to the volume-centric storage layout, application instances are bundled together in a matrix style because of their participation in a volume. This implies that the applications share major operational tasks and are managed as a bundle. From an SLA perspective, a diversification of service levels within the application instances cannot be implemented as easily as with a volume-centric layout. Achieving application-consistent Snapshot copies requires the iterative application of file/LUN FlexClone functionality to all storage objects of the instance. This is slightly more difficult than cloning with a volume FlexClone operation.

TR-3505, NetApp Deduplication for FAS and V-Series Deployment and Implementation Guide, provides a deeper understanding of NetApp deduplication and its deployment.

Figure 13 illustrates dedupe-centric storage provisioning. Volumes are shared among several application instances to achieve cross-instance dedupe returns. Note that each volume is created within an aggregate, but different volumes can be assigned to different aggregates.


Figure 13) Dedupe-centric storage provisioning. Application instances (template, instance 1, instance 2, ...) are aligned horizontally; volumes are aligned vertically, each a FlexVol volume holding LUNs/qtrees with deduplication block sharing within the volume.


Impact on commitment and aggregate usage: When creating the FlexVol volumes for this layout, their individual sizes contribute to the commitment rate. The aggregate use grows with the provisioning and object use within the FlexVol volumes. Provisioning a new instance in this layout through a file/LUN FlexClone operation has no effect on the overcommitment rate; it affects only the deduplication savings of the volume itself. Thus, NetApp recommends using the zero fat configuration for the volume and enabling autogrow.

Best Practice

This layout is very attractive for applications using multiple but similar storage objects among service instances (for example, virtual disks in virtual machine hypervisors). These usually use similar operating systems and applications in dedicated virtual disks. Thus, grouping these storage objects leads to a very high degree of consolidation due to deduplication.

Quickly changing data, such as pages and swap files, should not be considered for inclusion in deduplicated volumes on primary storage. Deduplication savings are limited due to their high change rate and do not justify running the deduplication process. NetApp recommends that this type of data not be placed in the same volume as data that dedupes well.

We further recommend not performing client data realignments such as Windows disk defragmentation or database table space reorganizations. Because of the way NetApp storage controllers work, fragmented client data is served without performance penalties, so such realignment is unnecessary.


3.3 SETTLED/NOMAD PROVISIONING FOR NETAPP DATA MOTION

Settled/nomad provisioning is a provisioning pattern that helps increase the utilization of NetApp shared storage. When the online migratability features of storage are exploited, response times to mitigate data growth scenarios become independent of application-specific planned downtime windows. Storage is classified into potential migration candidates that can be migrated away from a tight aggregate on one storage controller to another while assuring accessibility. Thus, it is an elegant technique to relax the use of an aggregate outside the planned downtime windows of the served applications. NetApp MultiStore technology implements this feature using the vFiler abstraction, which NetApp recommends you consider in the provisioning process. Secure multi-tenancy environments implemented using MultiStore technology harmonize well with this provisioning approach.

The settled/nomad provisioning pattern is a perfect metaphor for reacting to data growth in an aggregate. Figure 14 illustrates the concept of settled/nomad provisioning in the aggregates of the storage controllers and the migration of a nomad out of its aggregate. The settled part describes data that does not move during its lifetime. It might use vFiler units to simplify operation and hardware maintenance of the storage controller, but there is no direct need. The nomad parts are considered moving parts and thus must make use of vFiler units. The ratio between the sizes of the settled and nomad parts depends on the growth rate and lifetime of the data in the settled part. Assuming that the aggregate size is constant over this period, the aggregate is filled with settled and nomad data. Over the data lifetime, more and more nomads are migrated away; at the end of the lifetime, only the settled data is left.

It is irrelevant whether the data growth happens in the settled or nomad part; when a nomad is migrated away, the resource situation on the aggregate is relaxed. It is preferable to provision several nomads of different sizes. This allows you to:

• React to different growth scenarios of the data
• Quickly migrate smaller nomads when time or the inter-storage-controller network is considered a limited resource
• Operate the aggregate in its operational sweet-spot corridor over a long time frame; by slicing the migratable entities in the right way, you can be sure that the aggregate operates in a predefined use interval
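The slicing logic can be sketched as a toy calculation; the 80% target corridor, the selection rule, and the function name are our assumptions, not a NetApp recommendation:

```python
def nomad_to_migrate(aggregate_gb, used_gb, nomad_sizes_gb, target_use=0.8):
    """Pick the smallest nomad whose migration brings aggregate use back
    below the target corridor; return None if usage is already fine."""
    if used_gb / aggregate_gb <= target_use:
        return None  # aggregate still inside its sweet-spot corridor
    excess = used_gb - target_use * aggregate_gb
    # Smallest nomad that covers the excess; otherwise the biggest one.
    candidates = sorted(s for s in nomad_sizes_gb if s >= excess)
    return candidates[0] if candidates else max(nomad_sizes_gb)
```

Provisioning nomads of 50, 100, and 200 GB in a 1 TB aggregate, for example, lets the operator respond proportionally to small or large growth spikes.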

Figure 14) Settled/nomad provisioning into an aggregate. In case of aggregate tightness, a nomad is migrated to a separate aggregate.


To summarize, the settled/nomad provisioning pattern is an elegant method of adjusting the block use of an aggregate. The use of an aggregate can be controlled and kept within a desired corridor.


SLA-BASED ASSESSMENT FOR SETTLED/NOMAD

The goals of an SLA-based assessment are to optimize SLA fulfillment and to avoid or minimize penalty costs. The accessibility of the applications, described by their individual service levels, is used for an assessment into settled and nomad instances.

We use the previously introduced SLA metric of service disruption and map it to the stickiness of the settled/nomad instances. The vFiler entities allow online migration of NFS- and iSCSI-attached nomad instances without any changes on the client side. Fibre Channel-attached storage cannot be migrated online at the time of writing. Refer to TR-3881 for an understanding of DataMotion in an Oracle database and Microsoft Exchange environment.

Alignment by technical impact: For data belonging to applications with SLAs that fit perfectly into what is provided, a direct assignment can be made. For example, application instances with the lowest acceptable service disruption should be the last candidates to be migrated (settled); applications with the highest acceptable service disruption should be considered nomads. However, there might be data of application instances that will likely be migrated during the application lifetime. You must take into account the business impact of migrating these instances.

Figure 15) Alignment by technical impact (sorted by negative impact in descending order)


Alignment by business impact: An assessment of penalty costs is made for the data of the remaining applications. For vFiler migration, a very short negative impact on the performance of the service level must be taken into account during the migration. Thus, application data with the highest negative impact is considered the stickiest.

Figure 16) Alignment by business impact (sorted by negative impact in descending order)

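As an illustration only, the assessment can be thought of as sorting instances by negative migration impact and cutting the list; the data structure and names are invented for this sketch:

```python
def classify_instances(instances, nomad_count):
    """Sort application instances by negative migration impact
    (descending) and mark the least impacted ones as nomads;
    the remainder stays settled."""
    ordered = sorted(instances, key=lambda i: i["neg_impact"], reverse=True)
    cut = len(ordered) - nomad_count  # stickiest instances stay settled
    settled = [i["name"] for i in ordered[:cut]]
    nomads = [i["name"] for i in ordered[cut:]]
    return settled, nomads
```

In practice the cut is not a fixed count but is driven by the penalty-cost assessment described above; semi-settled instances sit near the boundary.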

PERFORMANCE AND THROUGHPUT IMPACT OF MIGRATION

Migration of a nomad might be triggered by heavy storage consumption in an aggregate. It might also be triggered by performance limitations of the corresponding storage controller. Because the migration process consumes additional resources on the network and on the participating storage controllers,


this consumption must be taken into account to avoid further intensifying the situation. Refer to TR-3881 for a quantitative evaluation of DataMotion.

NetApp recommends that the use of storage controllers in a high-availability configuration be adjusted in such a way that the remaining controller can master the load in the case of a failover. Doing so should leave enough resources to perform migrations.

Migrating vFiler entities relies mainly on SnapMirror® and MultiStore technology and thus requires these licenses on all participating storage controllers. TR-3814, NetApp Data Motion, provides a thorough presentation of migration using the NetApp Data Motion solution. It focuses on implementing and triggering the migration of vFiler entities using NetApp Provisioning Manager. Furthermore, vFiler units can be managed manually to allow for handy offline migration with a very short interruption of storage accessibility.

ENABLE SETTLED/NOMAD FOR ALREADY-PROVISIONED STORAGE

While NetApp recommends that you consider the settled/nomad setting initially and take the sizing and lifetime of storage into account, it is possible to implement it later in a planned downtime window. If NFS-attached storage is to be migrated, existing volumes can be adopted by a vFiler entity. Because the vFiler entity has its own IP address, the clients attaching the storage need to remount it.

SETTLED/NOMAD-LIKE SETTING WITH SHORT/LONG-TERM DATA PAIRING

In the previous section, the settled/nomad pattern was described as a way to mitigate organic data growth. The same effect can be achieved when storage is identified as belonging to instances that are going to be deprovisioned at their end of life. Taking into account the expected lifetime of provisioned storage allows you to plan deprovisioning situations in advance. This relaxes the dependence on aggregates outside planned downtime windows, and without the technical requirements of a settled/nomad setting.

ONLINE MIGRATION IN VIRTUALIZED ENVIRONMENTS

Online migration features in a virtualization hypervisor provide a further alternative for implementing a responsive scheme to react to data growth scenarios. For example, VMware® Storage VMotion™ is capable of transferring a virtual machine, including its storage, when it is attached using a datastore. Storage of virtual machines served by a NetApp datastore can be migrated to another NetApp-served datastore by migrating each virtual machine. In such cases, a nomad can simply be implemented by a NAS/SAN-attached datastore.

In contrast to a data transfer based on SnapMirror directly between NetApp storage controllers, the migration traffic flows through the hypervisor. This might have consequences for the execution of the virtual machines. Also, the NetApp storage efficiency savings cannot be exploited during the transfer; deduplication savings are regained by executing the deduplication process on the destination storage controller.


4 OPERATION

This section focuses on the operation and management of overcommitted storage. The goal of management is to fulfill the SLAs of the stored data while achieving a high level of efficiency. It addresses questions of how to detect situations that need manual assistance, how to raise the awareness of the operational staff, and how to resolve situations that arise.

We first consider situations that put SLA fulfillment at risk. Then we focus on actions that can be taken to avoid further aggravation by making the situation evident and presenting mitigation alternatives.

Consider the operational process as a loop that monitors and evaluates the current situation and triggers the transition of a storage resource among the following phases:

• Provision storage.
• Leave room for organic growth. It might be desirable to still allow for extending storage of previously provisioned applications.
• Reduce storage use with mitigation alternatives such as deletion, data motion, and so on.

These transitions must occur within a specified time frame to optimize operational flexibility and to prevent endangering the SLAs. The point is to detect situations that will violate the SLAs in the future.

SITUATIONS PUTTING SLA FULFILLMENT AT RISK

Over time, more and more data is stored and processed by the provided applications, and NetApp storage efficiency technologies compensate for this growth. To prevent running out of physical resources, usage must be managed within safe boundaries. This makes sure the operations team has enough time to react with the appropriate mitigation strategy.

The following list summarizes situations that are critical for service delivery:

• Running out of time: Some mitigation alternatives must be triggered in advance, and a passage of time might be needed for their effect to become evident. This time determines the number of mitigation alternatives that can be considered at a certain point.

• Running out of mitigation alternatives: Several mitigation alternatives exist to control usage. However, some alternatives are one-time activities, and some must be performed within a certain time frame. Depending on the situation, not all alternatives might be available for use.

• Running too tight on storage: Over time, applications use more and more of the blocks committed to them. This forces Data ONTAP to allocate from a pool of free blocks. Assuming data growth, the size of the free block pool directly translates into the available time to react.

• Running out of storage completely: This must be prevented because it has a high impact on the availability of the service. Furthermore, data integrity can be at risk. Consider the following scenarios:
  - An application wants to write to committed storage but fails (NAS/SAN). To the application, this looks like a storage failure and implies service disruption; data integrity can be at risk.
  - An application wants to allocate new storage but fails (NAS). The application is confronted with a "No space left on device" exception. Verify the application's behavior on this exception; most applications can deal with the situation, and data integrity is not at risk.

Two cases need to be differentiated when mitigation is necessary to resolve a situation of tight storage. Storage for an object such as a LUN or a share can be tight because of:
  - Insufficient space within the volume in which the storage object is contained
  - Insufficient free space within the aggregate in which the storage object and its volume are contained
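The relationship between the free block pool and reaction time can be captured in a one-line calculation, assuming a constant growth rate (which real trending data refines):

```python
def days_until_full(free_gb, daily_growth_gb):
    """Translate an aggregate's free block pool into reaction time,
    assuming a constant daily data growth rate."""
    if daily_growth_gb <= 0:
        return float("inf")  # no growth: no deadline to react
    return free_gb / daily_growth_gb
```

An aggregate with 300 GB free growing at 10 GB/day leaves the operations team roughly 30 days to provision, migrate, or mitigate.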


The following sections focus on how to detect that a change is necessary and that a storage resource should be transitioned to another phase. We describe the different phases, how monitoring can support their detection, and how this information can be made known to the operational groups.

4.1 PHASES AND TRANSITIONS

This section outlines the phases of a storage resource. Starting with an empty aggregate, storage is provisioned up to certain thresholds. After that, storage is left for organic growth. Once certain thresholds are exceeded, further inspection or activities must be performed to mitigate storage tightness:

• Provisioning storage: When certain thresholds are within a defined range, storage is provisioned to the aggregates. Monitoring should support making a decision to transition to the next phase.

• Leaving storage for organic growth: When certain thresholds are exceeded, provisioned storage is left for organic growth. Depending on the environment, storage of existing applications might still be extended, and a second threshold might signal that extensions are no longer possible. Monitoring should support making a decision to transition to the next or the prior phase.

bull Mitigate storage use When certain thresholds are exceeded this phase must make sure that committed storage can be delivered to store applications data The effect of a mitigation activity should be to put storage resource back in the preferred operational corridor Monitoring should support making a decision to transition back to the organic growth phase

4.2 MONITORING

NetApp Operations Manager delivers comprehensive monitoring and management for NetApp shared storage. It provides alerts, reports, performance monitoring, and configuration tools to keep the NetApp storage infrastructure in line with business requirements for maximum availability and efficiency. NetApp Operations Manager provides a single human interface and an application programming interface (API) for integration with software from third-party management and orchestration vendors.

Operations Manager monitors the NetApp shared storage infrastructure and can raise awareness of certain situations. Events can be set to trigger an action when operational parameters are within a certain range and indicate a relevant situation. When the event triggers, an alarm notification can be sent by e-mail, pager, Simple Network Management Protocol (SNMP), or customized scripts. To raise awareness of a certain situation, the event must be characterized using the metrics provided by Operations Manager. To communicate the event, an alarm must be set.

THRESHOLDS

Operations Manager monitors relevant parameters that indicate the presence of specified situations. Thresholds can be set to trigger actions, for example, to notify the operational team that an alarm situation exists. The thresholds can be set to notify in advance. Operations Manager also performs trending on operational parameters to express the urgency of a certain situation. This supports the decision making on how to react to a certain situation.

Within your Operations Manager instance, the thresholds can be verified and set by navigating to the Default Thresholds page, following Setup→Options→Default Thresholds, or using the link http://opsmgrserver:port/dfm/edit/options. Figure 17 shows a sample configuration page.


Figure 17) Operations Manager screen to configure thresholds on operational metrics

For aggregates, Operations Manager provides a set of thresholds, described in the following list. They represent absolute limits. Operations Manager alarms can be used to notify operational staff and raise awareness of a specific situation.

Monitoring the aggregates is very important. They are the physical containers of preallocated and growable storage objects that host application data. If an aggregate of a storage controller runs at uncontrolled usage, there could be direct consequences for the applications to which it provides data.

The concrete settings for these thresholds depend on the time necessary to relax aggregate block usage. When no mitigation alternatives can or should be taken over the lifetime of the data, mitigation actions must be performed in scheduled downtime windows. Thus, threshold settings and actions tend to be more conservative to avoid SLA-endangering situations.

• Aggregate full threshold. This threshold on the metric aggregate block use allows triggering an alarm that notifies a person in charge.

• Aggregate nearly full threshold. This threshold is the counterpart of the aggregate full threshold but provides an earlier notification.

• Aggregate overcommitted threshold. This threshold on the metric committed storage allows triggering an alarm that notifies a person in charge. This metric refers to the amount of storage that is assigned to applications. It represents the level of consolidation and also the width and increase of the block use corridor.

• Aggregate nearly overcommitted threshold. This threshold is the counterpart of the aggregate overcommitted threshold but provides an earlier notification.

Operations Manager also provides thresholds that can be used to alert operational staff when volumes are in a certain state:

• Volume full threshold. This event notifies a person in charge that the preset threshold on the volume metric has been reached.

• Volume almost full threshold. This event is the counterpart of the volume full threshold but provides an earlier notification.

• Volume autosized. This event notifies a person in charge when a volume was extended using the autogrow functionality.


TRENDING

Operations Manager 4.0 supports a variety of trending features for certain storage objects. This is an important feature for all storage objects with a fixed size, because it allows you to estimate the time frame within which a certain situation needs to be mitigated. The trend is calculated as a linear regression over up to 90 days in the past. For aggregates, Operations Manager calculates a trend on the daily growth rate. In your Operations Manager instance, use the link http://opsmgrserver:port/dfm/report/view/aggregates-growth-rates for trending of aggregate growth rates and the estimated remaining time until the storage object is full. Each aggregate can be drilled down, and you can select trending based on an interval of one day, one week, one month, three months, or one year. To see the effect of recent data activities, set the interval of the trend calculation to enclose this activity. Investigate whether growth rates calculated over different intervals deviate significantly.

Figure 18) Trending of data growth and days-to-full prediction in Operations Manager

Note: The calculation basis of time to full is the usable aggregate capacity. This value is not calculated based on the aggregate full threshold setting.
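The trend described above can be sketched as a simple linear regression over daily capacity samples, extrapolated to the usable capacity. The following is an illustrative reimplementation, not Operations Manager code; function and parameter names are ours.

```python
# Illustrative sketch of a days-to-full estimate: fit a linear trend to daily
# capacity-used samples and extrapolate to the usable aggregate capacity
# (not to the aggregate full threshold). Not Operations Manager code.

def days_to_full(daily_used_gb, usable_capacity_gb):
    """Estimate days until full, or None if usage is not growing."""
    n = len(daily_used_gb)
    xs = range(n)
    mean_x = sum(xs) / float(n)
    mean_y = sum(daily_used_gb) / float(n)
    # least-squares slope: GB of growth per day
    slope = sum((x - mean_x) * (y - mean_y)
                for x, y in zip(xs, daily_used_gb)) / \
            sum((x - mean_x) ** 2 for x in xs)
    if slope <= 0:
        return None  # flat or shrinking: no meaningful prediction
    return (usable_capacity_gb - daily_used_gb[-1]) / slope
```

For example, with daily samples of 100, 110, 120, and 130 GB and 200 GB of usable capacity, the fitted growth is 10 GB per day and the estimate is 7 days to full.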

The trending on the volume level is analogous to the trending on the aggregate level. In your Operations Manager instance, access the link http://opsmgrserver:port/dfm/report/view/volumes-growth-rates for trending of volume growth rates. NetApp recommends that you order the view by growth rate (descending) or time to full (increasing) in order to focus on the relevant candidates.

On the volume level, you can set an alarm to fire when the volume growth is outside the usual boundary:

• Abnormal volume growth. This event notifies when the growth rate of a volume exceeds a preset limit. It is helpful to signal unusual behavior concerning storage consumption and to point the operational staff to the right storage object.


INDIVIDUAL THRESHOLDS PER AGGREGATE OR VOLUME

For each aggregate or volume, the general default settings can be overwritten and made more specific. To do so, select your aggregate or volume of choice, for example, by using the links already provided in this technical report. When selecting a concrete aggregate, it can be configured using the Edit Settings link and dialog. When selecting a concrete volume, its configuration can be adapted using the Edit Quota Settings link and dialog.

MONITORING STORAGE EFFICIENCY RETURNS

NetApp Operations Manager provides a dashboard to visualize storage efficiency returns in the NetApp shared storage infrastructure. This report lists important parameters, drilled down by utilization, capacity, unused reserve capacity, storage efficiency, and efficiency return breakdown. It allows you to judge the effectiveness of the NetApp storage efficiency technologies. Figure 19 provides a sample screenshot of the storage efficiency dashboard in NetApp Operations Manager. Consult the NetApp Operations Manager Efficiency Dashboard Installation and User Guide for further information on the information provided by this dashboard.

Figure 19) Storage efficiency dashboard in Operations Manager


4.3 NOTIFICATION

Operational staff must be notified when situations occur that require a transition of phases, especially situations with negative consequences for SLA fulfillment. Operations Manager provides alarms for notification. Alarms are bound to the metrics and thresholds explained in section 4.2 and notify operational staff, storage administrators, or storage capacity planners. Alarms are the instrument used to keep the management effort of the NetApp storage infrastructure low.

After being notified, the responsible person can evaluate the situation and decide which actions to take. Further, the trends on operational parameters provided by Operations Manager simplify the decision-making process.

Depending on the organizational structure, the responsibilities to operate, plan, and administer the storage infrastructure can be separated into different groups, persons, or roles. Thus, we characterize the mitigation activities by required skill set and time to act. This allows easy alignment to a given organizational structure.

Operations Manager supports different methods to send a notification. The notification methods can be used in combination; for example, a notification can be sent by both e-mail and SNMP.

NOTIFY BY E-MAIL

An alarm can be sent to multiple destinations by e-mail. Repeated notifications can be sent while the situation is not resolved. To set an alarm, access the alarm configuration page by following Setup→Alarms from the default Operations Manager dashboard. Clicking Advanced Version accesses an advanced version of this page; the direct link for the advanced version is http://opsmgrserver:port/dfm/edit/alarms-advanced. Figure 20 shows how to configure an alarm. Adjust the threshold as described in section 4.2.

NetApp recommends using distribution lists or aliases with meaningful names rather than the addresses of individual persons. If you follow this recommendation, changing responsibilities and roles does not require you to make corresponding changes to Operations Manager.

NOTIFY BY SNMP

Operations Manager supports the notification of alarms using SNMP, a widely used standard that is supported by most orchestration frameworks and ticketing systems. Using SNMP, Operations Manager can be integrated into existing ticketing systems. Figure 20 shows the setup of an alarm that fires based on the aggregate almost full threshold. The SNMP trap host is configured using a hostname or IP address and the port on which the SNMP agent is listening. The alarm can be saved and tested.


Figure 20) Configuring an alarm based on the threshold aggregate almost full

Note: The SNMP event must be routed to the responsible groups or persons in the ticketing system. Thus, the mapping between the detected situation and the responsible operational group must be implemented there.

NOTIFY BY SCRIPT

Operations Manager supports notifications in highly customized integration scenarios. A user-defined adapter can be executed, which delivers the information to the infrastructure or system of choice. A script can be used to implement such an adapter and act as the glue between Operations Manager and the customer infrastructure. To set an alarm on the event aggregate almost full that starts a script, instrument Operations Manager on the command line:

dfm alarm create -s script_to_execute -h aggregate-almost-full
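Such an adapter can be as simple as a script that reformats the event for the receiving system. The sketch below is hypothetical: how Operations Manager hands over event details (arguments vs. environment) is deployment specific, so here the event name and affected object are simply assumed to arrive as command-line arguments.

```python
#!/usr/bin/env python
# Hypothetical adapter ("glue") script: reformat an Operations Manager alarm
# into one log line that a ticketing system could ingest. The calling
# convention (event name and object as argv[1] and argv[2]) is an assumption,
# not the documented Operations Manager interface.
import sys
import time

def format_ticket_line(event_name, source_object):
    stamp = time.strftime("%Y-%m-%dT%H:%M:%S")
    return "%s OPSMGR %s on %s" % (stamp, event_name, source_object)

if __name__ == "__main__" and len(sys.argv) >= 3:
    print(format_ticket_line(sys.argv[1], sys.argv[2]))
```

The script only formats and prints; forwarding the line into a concrete ticketing system would replace the print call.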


4.4 MITIGATE STORAGE USE

Uncontrolled storage use can limit operational flexibility and might put SLA fulfillment at risk. This section focuses on mitigation activities that preserve flexibility by keeping use within its defined corridor; the effect of a mitigation activity should be to return usage to that corridor.

Storage tightness might occur in aggregates or volumes, depending on their configuration. When all volumes in an aggregate are thin provisioned with the zero fat configuration, they use the shared pool of free blocks of the aggregate to deal with data growth. To solve this situation, a mitigation activity on the aggregate level is necessary.

When storage objects in a fixed-size volume cannot grow to the committed space, a mitigation activity on the volume level is necessary to solve upcoming volume tightness.

MITIGATION ACTIVITIES FOR AGGREGATES

Aggregates are the coarsest storage objects within a NetApp storage controller. Finer-grained storage objects, such as FlexVol volumes and their content, are usually thin provisioned using the zero fat configuration. They might grow on demand; however, because they live within an aggregate of physically limited size, the growth of the storage object itself is also limited. As described in the following list, providing usable space in the aggregate automatically allows contained storage objects to grow.

1. Increase the aggregate. You can add drives to aggregates during operation, and you can repeat this mitigation activity. The maximum aggregate size depends on the Data ONTAP version, the type of aggregate, and the type of storage controller. 64-bit aggregates, supported with Data ONTAP 8, have very high limits. Additional drives can be used immediately; however, their procurement needs to be taken into account. Rebalancing data between existing and new drives results in a uniformly distributed use of the drives.

2. Decrease the aggregate Snapshot copy reserve. This reserve is needed in MetroCluster and SyncMirror® configurations. In other configurations, you can decrease this reserve or set it to zero.

3. Shrink preallocated volumes. Volumes with preallocated space reserve available aggregate free space. When possible, these volumes can be shrunk, returning the freed space to the aggregate so that others can make use of the previously preallocated space.

4. Enable deduplication and shrink the volume.
5. If available, migrate a nomad online to a different storage controller. Doing this on the NetApp storage controller level requires storage provisioning based on vFiler units and a MultiStore and SnapMirror license. Adequate free space on the aggregates of the target storage controller is required. This mitigation activity is not limited in its applicability.

6. Migrate a volume from one aggregate to another, within the same or another storage controller. SnapMirror replicates the data while it is still being served. To switch over to the replicated data, the client needs to detach from the source and reattach to the replica. After completion, the replica is considered the new source. This operation has an impact on client downtime. Typically, inter-data center bandwidth allows you to synchronize the source and the replica within the range of a few minutes.

7. If none of the listed activities can be used, stop the application to achieve a consistent state and then migrate the data offline.

The mitigation activities for aggregate tightness are summarized in Table 8. Note that Provisioning Manager performs mitigation alternatives 3 to 6 online for secondary storage.


Table 8) Mitigation alternatives to control use within aggregates

| No. | Mitigation Activity | Repeatability | SLA Impact | Preparation Time | Time to Show Effect |
|-----|--------------------|---------------|------------|------------------|---------------------|
| 1 | Increase aggregate capacity by adding disks | Data ONTAP 7.x: low limits; Data ONTAP 8: high limits | None | HW procurement | Immediate (+ rebalancing) |
| 2 | Decrease the aggregate's Snapshot copy reserve, if possible | One time | None | None | Immediate |
| 3 | Shrink other volumes in the aggregate if they have enough free space | One time | Low | None | Immediate |
| 4 | Run deduplication and shrink volumes | Repeatable | Low | Time to execute dedupe | Immediate |
| 5 | Migrate nomads (online) | Repeatable | Low | None | Minutes (vFiler migration time) |
| 6 | Migrate volumes to a different aggregate (offline) | Repeatable | Med–high | Next planned downtime window | Minutes (volume switch-over time) |
| 7 | Prevent application data loss and stop the application, then migrate (offline) | Repeatable | Low–high | Coordinate with app owner | Minutes (migration time) |

MITIGATION ACTIVITIES FOR VOLUME TIGHTNESS

Mitigation activities for volume tightness are relevant for volumes that are at risk because their storage objects cannot grow to the committed size. When it is not possible to enable growth for storage objects contained in volumes, you need to perform an aggregate mitigation activity.

Note: Some of these mitigation alternatives depend on, and affect, used capacity in the aggregate.

Table 9) Mitigation activities for resource tightness within volumes

| No. | Mitigation Activity | Repeatability | SLA Impact | Prep Time | Time to Show Effect |
|-----|--------------------|---------------|------------|-----------|---------------------|
| 1 | Reduce the volume's Snapshot copy reserve (if configured and not used) | One time | Low | None | Immediate |
| 2 | Increase the volume if there is free space in the aggregate (see Table 8) | One time | Low | None | Immediate |
| 3 | Delete Snapshot copies that are not needed or were skipped by the autodelete function | Limited | Low | None | Immediate |
| 4 | Activate FAS deduplication for the volume (requires proper space guarantees) | One time | Low; possible performance impact | Wait for schedule | Hours |
| 5 | If the volume contains more than a single LUN, migrate those objects to another volume or aggregate | Repeatable | High | Next planned downtime window | Minutes (volume migration time) |
| 6 | Stop application and migrate data | Repeatable | High | Coordinate with app owner | Minutes (migration time) |


5 REAL-LIFE SETTINGS

This section summarizes two different operational settings. The first one does not make use of online data migration or the settled/nomad provisioning pattern; the second setting implements a settled/nomad provisioning pattern to maintain the flexibility for online data migrations.

The concrete threshold settings and approaches might be very customer and application specific. To exploit NetApp storage efficiency features in your own data center, NetApp recommends that you start conservatively. After you are familiar with the process, work toward the customer-specific optimum.

5.1 SAMPLE SETTING 1: REAL-LIFE SETTING

This section describes a real-life setting that a customer started with. It makes use of a limited set of mitigation alternatives. This is especially beneficial when the installed storage capacity should be constant over a long time frame or when physical systems are already fully equipped. A settled/nomad setting is not considered. Thus, the thresholds that signal a transition of phases are set lower and more conservatively for this customer. Because online data migration and aggregate extension are not available as mitigation alternatives, sufficient available space is required to safely reach the next planned downtime window, as shown in Figure 21. In practice, refer to the aggregate days-to-full trend value to get an idea of the available days to full based on past data growth.

• All storage is provisioned using the zero fat option with growable FlexVol volumes. Only aggregate monitoring is used.
• Aggregate extension is not a mitigation alternative.
• Online migration is not a mitigation alternative.

Figure 21) Storage to enable organic data growth between planned downtime windows

[Figure: data grows over several months between two planned downtime windows]

Note: Several months might fall between the planned downtime windows in which major mitigation alternatives can be performed.

The primary concern is preventing the critical situation in which aggregates reach a utilization level that is too high to enable organic growth during the period between agreed planned downtime windows. To prevent this situation, sufficient space must be reserved to enable data growth. Second, the level of data consolidation is monitored to manage accumulated growth rates safely.

Provisioning new data is stopped when one or both thresholds on the first and second metrics are reached. The operational teams are notified using an alarm on the Operations Manager event aggregate nearly full threshold (configured to fire when the metric exceeds 50%) and the event aggregate nearly overcommitted threshold (configured to fire when the metric exceeds 110%). These alarms stop the responsible entities from provisioning new storage; the aggregate is left for organic growth.

An assessment of the storage situation might then be performed. Depending on experience and knowledge of the application growth rates seen in the past, the thresholds may be adapted. After the upper threshold of the operational sweet spot corridor is crossed, an alarm based on the aggregate full threshold (set initially to 65%) is sent to the storage administrators to make the decision for migrating data in the next planned downtime window. In the meantime, organic growth can take place in the yellow-marked area shown in Figure 22. The metrics used are:

• First metric: aggregate capacity used
• Second metric: aggregate space committed

Because all storage is provisioned using the zero fat option, no artificially limited storage container exists. Thus, there is no need to consider a volume-based metric. Figure 22 shows the behavior depending on the metrics aggregate capacity used and aggregate committed space.
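The decision logic of this setting can be summarized in a few lines. The sketch below simply encodes the thresholds quoted above (50% and 65% for aggregate capacity used, 110% and 120% for committed space); it is illustrative only, and these values were this customer's starting point, not a general recommendation.

```python
# Sketch of the sample setting 1 decision logic: two metrics drive the phase.
# Threshold values are the customer's initial settings quoted in the text.

def phase_action(used_pct, committed_pct):
    if used_pct > 65 or committed_pct > 120:
        return "mitigate"        # plan data migration for the next downtime window
    if used_pct > 50 or committed_pct > 110:
        return "organic-growth"  # stop provisioning new storage, assess capacity
    return "provision"           # keep provisioning new storage
```

Note that either metric alone can trigger the transition, which matches the rule that provisioning stops when one or both thresholds are reached.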

Figure 22) Transition of changes depending on the metrics aggregate capacity used and aggregate committed space

[Figure: decision matrix over the operational sweet spot corridor. Aggregate capacity used 0–50% and space committed 0–110%: provision new storage; beyond these values: assess capacity and adapt thresholds; aggregate capacity used > 65% or space committed > 120%: mitigate]


5.2 SAMPLE SETTING 2: SETTLED/NOMAD

This section describes a setting that takes the settled/nomad provisioning pattern into account and allows migrating nomad data flexibly and in a timely manner thanks to vFiler technology. This setting requires storage space at alternative locations to which nomads might be migrated. It is seen more often in larger environments with an emphasis on NFS-attached storage. It allows operating the NetApp storage infrastructure at very high use and in narrower operational sweet spot corridors. Figure 23 visualizes the effect of a mitigation alternative that can be performed online.

Figure 23) Narrower corridors due to the ability to perform mitigation alternatives in hours instead of months

[Figure: settled (S) and nomad (N) data in an aggregate; the need to act is detected and the effect of mitigation (e.g., migration of a nomad) shows within hours]

In this sample setting, as in sample setting 1, the critical situation to prevent is aggregates becoming too full. However, with the flexibility gained through online data migration, it is not necessary to take a further metric into account, for example, storage overcommitment.

• All storage is provisioned using the zero fat option with growable FlexVol volumes. Only aggregate monitoring is used.
• Storage is provisioned using the settled/nomad pattern, with the ability to perform online migration.
• The days-to-full aggregate trend was more than 200 days on average. Note that this value depends on the individual situation and is calculated against 100% capacity.

The sole metric in this setting is aggregate capacity used. Table 10 contains the thresholds describing the transition of phases.

Table 10) Phase transitions with settled/nomad provisioning pattern and online migration as mitigation alternative

| Detection Threshold | Notify | Mitigation |
|---------------------|--------|------------|
| > 70% | Storage operations | Stop provisioning of storage |
| > 85% | Storage operations | Stop extending provisioned storage |
| > 90% | Storage operations | Relax the resource situation and migrate a nomad |


Figure 24) Visualization of phase transitions depending on metric aggregate capacity used

[Figure: aggregate capacity used 0–70%: provision new storage; 70–85%: extend already provisioned storage only; > 90%: relax utilization by migrating a nomad with NetApp Data Motion]

You can achieve very high data consolidation in this setting by using NetApp storage controllers. The served amount of logical data can exceed the physically usable capacity severalfold.


6 STORAGE EFFICIENCY COOKBOOK

To increase consolidation, we propose the following steps to exploit the advantages of NetApp storage efficiency technologies.

Figure 25) Typical picture of aggregate capacity metrics while turning to zero fat configurations and dedupe

[Chart: committed capacity and capacity used over elapsed time (1 month, 3 months); capacity used drops after the change to zero fat configurations and dedupe, then resumes growth; overall trend vs. last three-month trend shown, with markers at steps 1, 2, and 3]

As a general rule, we don't introduce artificially limited container types. They increase the monitoring effort and might prevent pooling unused space. For an existing landscape, proceed as follows.

1. Install and configure Operations Manager; the earlier, the better. From day one, Operations Manager collects data, and the more information it collects, the better the predictions and trending. The diagrams provided by Operations Manager give a good idea of growth rates and their steadiness. Make sure all NetApp storage controllers are monitored. Wait for one month. Define which mitigation alternatives your operational team is comfortable with. Check the boxes accompanying the provided list and identify the time your team needs to perform the actions. If you can perform online migrations for nomads, define the time to negotiate and approve the migration. For all other data, define the time to the next planned downtime window.

2. Change all volumes to the zero fat configuration with the autogrow feature set to on. Because there is no artificial space limitation for an autogrow volume, monitoring is restricted to aggregate monitoring. When using deduplication, set the volume to autogrow. During this period, the capacity used diminishes, as shown in Figure 25. Usually, each change in the volume configuration can be detected. So far, only metadata has changed, and unused space in the volumes is now available from a common shared pool. The aggregated free space is available to the same applications storing the data. We recommend monitoring for three months to understand the growth rate of your environment.

3. Derive the growth trend of the aggregates. Note that the overall trend might still be negative. Use Operations Manager to help determine the trend. Make sure that it excludes the time frame when the volume configuration was changed to zero fat and that it includes relevant operations of your applications, such as month- and year-end closing of business applications or regular software maintenance updates (for example, in virtualized environments).


4. Work backward to determine the thresholds of the phases:
a. Define the aggregate use at a level at which your operational team is comfortable. At first, do not exceed 80%. Add an attention area (yellow), depending on the mitigation alternatives and their time to show effect.
b. Determine the maximum distance between the planned downtimes, or the time to perform the intended mitigation alternatives.
c. Determine the growth rate. Operations Manager provides help in determining the trend of data growth.
d. Determine the minimum space required to comfortably allow organic growth in the period between the agreed planned downtimes of the services provided. Operations Manager helps you to understand the growth rate of the past.
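Steps (b) through (d) combine into a back-of-the-envelope calculation: the free space an aggregate must retain is the growth expected between two planned downtime windows, plus a margin. The 20% margin in the sketch below is our assumption, not a NetApp guideline.

```python
# Minimum free space to comfortably allow organic growth between planned
# downtime windows: growth rate (step c) x downtime interval (step b),
# padded by an assumed safety margin (20% here, an illustrative choice).

def required_free_space_gb(daily_growth_gb, days_between_downtimes, margin=0.2):
    return daily_growth_gb * days_between_downtimes * (1.0 + margin)
```

For example, 2 GB per day of growth and 90 days between downtime windows call for roughly 216 GB of reserved free space.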

To provision storage, follow these steps:

1. Create big aggregates to enable shared storage in your data center. We recommend sizing them in such a way that the aggregate can be extended once for eventual aggregate mitigation. Very few situations exist where a silo-centric approach with dedicated aggregates for applications makes sense. Free space and performance in an aggregate can be shared. A few big aggregates reduce the monitoring effort. Also, build aggregates in a limited number of standardized configurations and sizes.

2. Create volumes in the zero fat configuration with the autogrow feature set to on. Because there is no artificial space limitation for an autogrow volume, monitoring is restricted to aggregate monitoring. When using deduplication, set the volume to autogrow. Whenever possible, use Provisioning Manager for convenience and for repeating configurations.
a. Classify your data and provision for flexibility. Give NFS a preference and make use of vFiler entities.
b. Turn on deduplication. Even in situations where deduplication rates are expected to be low, there is sometimes a big surprise. If you prefer to try deduplication on the storage controller first, create a clone of the intended volume and deduplicate it to estimate the effect. Use Performance Advisor to identify a repeating time frame of low activity to schedule the deduplication job, or use deduplication scheduled by change rate. Mind the maximum sizes, which depend on the storage controller.
c. Initially size volumes to the expected size of the data you are going to store. Thus, the aggregate overcommitment metric in Operations Manager represents the data consolidation more precisely.
d. Trim existing volumes provisioned in full or low fat to the zero fat configuration. Use the following commands on the console of the storage controller to configure zero fat without Snapshot autodelete for NAS environments:

vol options <volume> guarantee none
vol options <volume> try_first volume_grow
vol autosize <volume> -m <maximum size> -i <increment size> on
snap autodelete <volume> off

Use the following command sequence to configure zero fat with Snapshot autodelete for NAS environments:

vol options <volume> guarantee none
vol options <volume> try_first volume_grow
vol autosize <volume> -m <maximum size> -i <increment size> on
snap autodelete <volume> trigger volume
snap autodelete <volume> delete_order oldest_first
snap autodelete <volume> on


Use the following commands to configure zero fat without Snapshot autodelete for SAN environments:

vol options <volume> guarantee none
vol options <volume> try_first volume_grow
vol autosize <volume> -m <maximum size> -i <increment size> on
snap reserve -V <volume> 0
snap autodelete <volume> off
lun set reservation <lun> disable

Use the following command sequence to configure zero fat for SAN environments with autodelete set to on:

vol options <volume> guarantee none
vol options <volume> try_first volume_grow
vol autosize <volume> -m <maximum size> -i <increment size> on
snap reserve -V <volume> 0
snap autodelete <volume> trigger volume
snap autodelete <volume> delete_order oldest_first
snap autodelete <volume> on
lun set reservation <lun> disable

e. Identify storage of inactive data. Storage keeping inactive data is most often perfectly suited to act as a nomad candidate that could be migrated.

f. Identify storage that is close to deprovisioning. Deprovisioning of storage relaxes use and can act as a mitigation alternative.

g. Turn already provisioned volumes into the zero fat configuration.
3. Let Operations Manager monitor the landscape. Use the aggregate daily growth rates and days-to-full trending reported by Operations Manager to adapt the thresholds. Remember that days-to-full trending reports against 100% capacity used of the aggregate.


7 REFERENCES

• TR-3505, "NetApp Deduplication for FAS and V-Series Deployment and Implementation Guide": www.netapp.com/us/library/technical-reports/tr-3505.html
• TR-3563, "NetApp Thin Provisioning: Improving Storage Utilization and Reducing TCO": www.netapp.com/us/library/technical-reports/tr-3563.html
• TR-3710, "Operations Manager, Provisioning Manager, and Protection Manager Best Practices Guide": www.netapp.com/us/library/technical-reports/tr-3710.html
• TR-3786, "A Thorough Introduction to 64-Bit Aggregates": www.netapp.com/us/library/technical-reports/tr-3786.html
• TR-3814, "NetApp Data Motion": www.netapp.com/us/library/technical-reports/tr-3814.html
• TR-3827, "If You're Doing This, Then Your Storage Could Be Underutilized": www.netapp.com/us/library/technical-reports/tr-3827.html
• TR-3881, "DataMotion for Volumes for Enterprise Applications": www.netapp.com/us/library/technical-reports/tr-3881.html
• NetApp Operations Manager Efficiency Dashboard Installation and User Guide: now.netapp.com/NOW/download/tools/omsed_plugin/InstallUserGuide.pdf


8 ACKNOWLEDGMENTS

This report was developed in concert with the Field Centers for Innovation and covers field best practices and product group expertise. It would not have been possible without the input of many experts. Significant contributions were made by Matthew Agoni, Carlos Alvarez, Jeff Berks, Manfred Buchmann, Hans Deuerlein, Erik Dybwad, Niels Reker, Oliver Dziuba, Larry Freeman, Gary Garcia, Pretoom Goswami, Naveen Harsani, George John, Nigel Maddock, Andreas Martinovsky, Holger Niermann, Cesar Orosco, Christian Ott, Shiva Raja, Michael Reusch, Maurice Skubski, John Tyrrell, Oliver Walsdorf, and Allen Wang.

NetApp provides no representations or warranties regarding the accuracy, reliability, or serviceability of any information or recommendations provided in this publication, or with respect to any results that may be obtained by the use of the information or observance of any recommendations provided herein. The information in this document is distributed AS IS, and the use of this information or the implementation of any recommendations or techniques herein is a customer's responsibility and depends on the customer's ability to evaluate and integrate them into the customer's operational environment. This document and the information contained herein may be used solely in connection with the NetApp products discussed in this document.

© Copyright 2010 NetApp, Inc. All rights reserved. No portions of this document may be reproduced without prior written consent of NetApp, Inc. Specifications are subject to change without notice. NetApp, the NetApp logo, Go further, faster, Data ONTAP, FlexClone, FlexVol, MultiStore, RAID-DP, SnapDrive, SnapMirror, Snapshot, SyncMirror, and vFiler are trademarks or registered trademarks of NetApp, Inc. in the United States and/or other countries. Windows is a registered trademark of Microsoft Corporation. Oracle is a registered trademark of Oracle Corporation. VMware is a registered trademark and VMotion is a trademark of VMware, Inc. All other brands or products are trademarks or registered trademarks of their respective holders and should be treated as such. RA-0007-1010



2.5 NETAPP TECHNOLOGIES FOR STORAGE EFFICIENCY AND FLEXIBILITY

The NetApp technologies provided in Table 1 are characterized by how they bring the most significant advantage and value. For example, FlexClone® technology provides significant time and space advantages while provisioning, but the space advantage might be reduced over time. In contrast, deduplication technology can achieve space savings over the entire storage lifetime.

Table 1) NetApp technologies for storage efficiency and flexibility.

NetApp Technology | Benefit | During Provisioning | During Operation
FlexClone | Instantly creates thin-provisioned and space-efficient writable clones | X |
FlexVol® | Implements thin provisioning and consumes only the needed space rather than the requested space | X | X
Deduplication | Increases data consolidation by detecting and optimizing repeating patterns in primary and secondary storage | X | X
NetApp Data Motion | Provides flexibility for management and optimal load/capacity rebalancing in growing cloud environments without downtime | | X
Aggregate extensibility in Data ONTAP® | Data ONTAP is the foundation for all features listed in this table and provides flexibility in handling physical resources; it allows extending physical aggregates during operation | X | X

Furthermore, NetApp RAID-DP®, SATA, and NetApp Flash Cache (formerly PAM II) are technologies that help to reduce the total cost of storage tremendously. It is assumed that these technologies are deployed according to the requirements of the use case.

NETAPP SHARED STORAGE INFRASTRUCTURE

To implement the practices outlined in this document, some prerequisites must be met. We assume a NetApp shared storage infrastructure implemented using large aggregates. This acts as a utility for delivering storage in a flexible manner for applications with different needs. It scales with the demands and serves a variety of different service levels at the same time. NetApp Operations Manager monitors the NetApp shared storage infrastructure. This software acts as a central management station and consolidates information about the current status of all NetApp storage controllers. Based on this information, Operations Manager indicates the necessity to change the phases and behavior in the data center.

The NetApp shared storage infrastructure provides different ways for clients to consume its resources. It can provide a traditional view, where storage resources are located at a specific controller. Using NetApp Provisioning Manager, the infrastructure can also provide a service-oriented resource view that abstracts resource consumption and management from the physical controllers. The abstractions of a storage service catalog, resource pools, and datasets provide easy manageability in the face of massive scale. If multi-tenancy is not required, then this is the abstraction of choice.

Supported by the NetApp technologies MultiStore® (vFiler™) and NetApp Data Motion, storage can be provided in a utility-like fashion, independent of physical hardware. This makes high operational flexibility in the data center possible and allows building virtualized environments for multiple tenants with competing interests.


3 PROVISIONING

Some features, such as data deduplication, can be turned on or off at any time. However, to achieve the maximum consolidation and storage efficiency, more strategies must be considered during the data-provisioning phase.

In addition, provisioning should take the flexibility of storage (for example, the ability to migrate data) into account. This allows administrators to easily move data off aggregates approaching capacity, without downtime. This is also an important aspect when planning to deliver services 24x7.

Thus, the design goals for storage layouts are storage efficiency and operational flexibility. In the following sections, we discuss three orthogonal dimensions of provisioning storage. Two of them focus on achieving data consolidation; the third focuses on achieving operational flexibility. All dimensions can be combined independently. Note that the achievable level of consolidation depends on the application and its data.

TR-3827, "If You're Doing This, Then Your Storage Could Be Underutilized," provides further understanding of storage efficiency and operational flexibility.

3.1 PROVISIONING FROM SCRATCH: FULL FAT TO ZERO FAT PROVISIONING

This section deals with the way data is provisioned and the consequences for storage efficiency. We recommend applying the so-called zero fat configurations.

We consider the storage setup for a single application instance. The presented configurations can be applied both while provisioning storage from scratch and to already provisioned storage. When the technical dimensions of storage provisioning are categorized into primary data space and Snapshot™ copy space, there are four theoretical combinations for both network-attached storage (NAS) and storage area network (SAN) environments. In practical applications, only two variants are relevant to NAS and three variants are relevant to SAN storage:

• Full fat
• Low fat
• Zero fat

According to NetApp best practices, we do not mix block and file data within a single volume, which allows us to consider NAS and SAN environments separately.

NAS

For NAS, two options are recommended: full fat and zero fat.

• Full fat: The primary data and Snapshot copy space are preallocated.
• Zero fat: Primary data and its Snapshot copy space are allocated on demand. This variant achieves the best ratio of storage efficiency when provisioning applications from scratch.


Figure 7) Provisioning model for NAS storage from scratch. Technically, only two out of four combinations are possible. Primary data (files and directories) space allocation is shown in the columns; Snapshot copy space allocation in the rows.

Snapshot Copy Space \ Primary Data | Fat | Thin
Fat | Full Fat option | No option
Thin | No option | Zero Fat option

Note: Full fat is characterized slightly differently in NAS and SAN due to their technical properties.

FULL FAT PROVISIONING

Full fat provisioning for NAS is the traditional (default) way to implement NFS/CIFS shares. Volumes in a full fat configuration are characterized as follows:

• Volumes are created with a space guarantee.
• The size of the volume follows the formula X + Δ, where X is the size of the primary data (the sum of all user data, files and directories, within the volume) and Δ is the amount of space needed to hold Snapshot data.

• Because space used for Snapshot copies might grow unexpectedly, the autosize function can be used to make space available when a certain volume threshold is reached. This would also happen when the space reserved for user data gets low.
• Space reservation for Snapshot copies is used to hide the capacity used for Snapshot copies from the consumers (NAS clients).
• For volumes with deduplication enabled, volume autogrow is a mandatory option.
• Normally, using autodelete is not recommended in NAS environments. Keeping a certain number of Snapshot copies for file versioning/restores might be part of the SLAs defined for file services.

Note: Deleting Snapshot copies may be a reasonable approach when no other option for freeing up space is available, but this will be a specific and individual decision.

Table 2) Full fat provisioning.

Option | Recommended Value | Notes

Volume Options
guarantee | volume |
fractional_reserve | 100 | Leave at default; mostly relevant for SAN environments. The default value up to Data ONTAP 7.3.3 is 100; for later releases, 0 is the default.
autosize | on | Turn autosize on. There is no artificially limited volume that needs to be monitored. Autosize makes sense to allow growth of user data beyond the guaranteed space limit.


autosize options | -m X -i Y | The business model drives the maximum value for the autosize configuration because it offers additional disk space for the consumer under its specific conditions. A reasonable resizing increment depends on various factors, such as the data growth rate in the particular volume, the volume size itself, and so on.

Volume Snapshot Options
reserve | yes | The value depends on the number of Snapshot copies and the change rate within the volume.
schedule | switched on | Automatic Snapshot technology schedules.
autodelete | off | Deleting Snapshot copies is not recommended in most NAS environments.

ZERO FAT PROVISIONING

The zero fat method is the most efficient way to provision NAS volumes:

• Volumes are created without a space guarantee.
• The size of the volume still follows the formula X + Δ, where X is the size of the primary data (the sum of all user data, files and directories, within the volume) and Δ is the amount of space needed to hold Snapshot data. Sizing the volume defines a container with a virtual size for the consumers; NAS users are familiar with fixed-size file shares.

• Space used for Snapshot copies can grow unexpectedly. You can use the autosize function to make space available when a certain volume threshold is reached. You can also use the autosize function when the space reserved for user data gets low.
• Space reserved for Snapshot copies is used to hide from the consumers (NAS clients) the capacity taken up by Snapshot copies.
• For volumes with deduplication enabled, volume autogrow is a mandatory option.
• Using autodelete is normally not recommended in NAS environments. Keeping a certain amount of space for Snapshot copies for file versioning/restores is part of the SLAs defined for file services.

Note: Deleting Snapshot copies may be a reasonable approach when no other option for freeing up space is available, but this will be a specific and individual decision.

Table 3) Zero fat provisioning.

Option | Recommended Value | Notes

Volume Options
guarantee | none |
fractional_reserve | 100 | Leave at default; mostly relevant for SAN environments. The default value up to Data ONTAP 7.3.3 is 100; for later releases, 0 is the default.
autosize | on | Turn autosize on. There is no artificially limited volume that needs to be monitored. Autosize makes sense to allow growth of user data beyond the guaranteed space limit.


autosize options | -m X -i Y | The business model drives the maximum value for the autosize configuration because it offers additional disk space for the consumer under its specific conditions. A reasonable resizing increment depends on various factors, such as the data growth rate in the particular volume, the volume size itself, and so on.
try_first | - | Autodelete is not recommended in most environments.

Volume Snapshot Options
reserve | yes/no | The value depends on the number of Snapshot copies and the change rate within the volume. Displaying only the committed usable space per the SLA is the preferred way to provision NAS storage. However, there might be situations in which the Snapshot reserve area is omitted (no).
schedule | switched on | Automatic Snapshot technology schedules.
autodelete | off | Deleting Snapshot copies is not recommended in most NAS environments.

SAN

For SAN, we consider three options:

• Full fat: Both primary data and its Snapshot copy space are preallocated.
• Low fat: The primary data is preallocated; the Snapshot copy space is allocated on demand.
• Zero fat: Primary data and its Snapshot copy space are allocated on demand. This variant achieves the best ratio of storage efficiency when provisioning applications from scratch.

Figure 8) Provisioning model for SAN storage from scratch. Primary data (LUN) space allocation is shown in the columns; Snapshot copy space allocation in the rows.

Snapshot Copy Space \ Primary Data | Fat | Thin
Fat | Full Fat option | No option
Thin | Low Fat option | Zero Fat option

FULL FAT PROVISIONING

This method can be treated as the historical way of provisioning block storage with Data ONTAP:

• Volumes are created with a space guarantee.
• A fractional (overwrite) reserve is used to guarantee that the primary data can be overwritten completely with Snapshot copies in place. If this space is not available, Snapshot copy creation will fail.
• The size of the volume follows the formula 2X + Δ, where X is the size of the primary data (the sum of all LUN capacities within the volume) and Δ is the amount of space needed to hold Snapshot copy data.


The enhancements to the volume autosize capabilities (such as volume-size-dependent thresholds) and the robustness of the Snapshot autodelete implementation have made full fat provisioning more or less obsolete. As of today, the default settings for creating volumes/LUNs in Data ONTAP still correspond to these settings. See "Provisioning from Scratch: Full Fat to Zero Fat Provisioning" for a discussion of using tools such as Provisioning Manager.

Table 4) Full fat provisioning.

Option | Recommended Value | Notes

Volume Options
guarantee | volume |
fractional_reserve | 100 | Even though technically possible, a fractional reserve below 100 incorporates a potential risk of running out of Snapshot copy overwrite space. This situation should be avoided.
autosize | off | Autosize could be used as an option to create the free space needed for Snapshot copy creation.

Volume Snapshot Options
reserve | 0 |
schedule | switched off |
autodelete | off |

LUN Options
reservation | enable |

LOW FAT PROVISIONING

With low fat provisioning, we use a more space-efficient way to provision volumes:

• Volumes are created with a space guarantee.
• LUNs are created with a space guarantee as well. This setup does not benefit from unused blocks within a LUN. (During the lifetime of a LUN, the amount of free, unused blocks typically decreases. Without space reclamation techniques, allocated blocks on the storage system stay allocated.)
• The size of the volume follows the formula X + Δ, where X is the size of the primary data (the sum of all LUN capacities within the volume) and Δ is the amount of space needed to hold Snapshot copy data.
• Because space used for Snapshot copies might grow unexpectedly, the autosize and autodelete policies are used to make space available when a preset volume threshold is reached.

Table 5) Low fat provisioning.

Option | Recommended Value | Notes

Volume Options
guarantee | volume |
fractional_reserve | 0 | Snapshot copy space is controlled by the autodelete and autosize options.
autosize | on | Turn autosize on.


autosize options | -m X -i Y | The business model drives the maximum value for the autosize configuration because it offers additional disk space for the consumer under its specific conditions. A reasonable resizing increment depends on various factors, such as the data growth rate in the particular volume, the volume size itself, and so on.
try_first | volume_grow | Increasing the size of the volume does not destroy any data or information. There is no reason not to increase the size of the volume; it can be reverted afterward if the volume's free space increases again. There might be configurations where automatic volume growth is not desired.

Volume Snapshot Options
reserve | 0 | For NAS volumes, setting a Snapshot copy reserve area and configuring Snapshot copy schedules is a common setup. For SAN volumes, this needs to be switched off according to NetApp best practices (see the Fibre Channel and iSCSI Configuration Guide).
schedule | switched off |
autodelete | on | There might be Snapshot copies that are needed to fulfill certain SLAs, such as backup SLAs. Setting this policy needs to be negotiated with the business requirements. In the worst-case scenario, deleting Snapshot copies is not an option.
autodelete options | volume, oldest_first | There is a precedence for which Snapshot copies become deletion candidates; oldest_first is the current default.

LUN Options
reservation | enable | Reserves space for the LUN during creation.

ZERO FAT PROVISIONING

Full and low fat provisioning use fully allocated volumes and LUNs; by default, space allocation happens within the boundaries of the LUN and the volume. Zero fat follows a 100% "allocate on demand" concept:

• Volumes are created without a space guarantee.
• LUNs are created without a space guarantee.
• The size of the volume follows the formula X − N + Δ, where X is the size of the primary data (the sum of all LUN capacities within the volume), Δ is the amount of space needed to hold Snapshot copy data, and N is the amount of unused blocks within a given LUN.

Table 6) Zero fat provisioning.

Option | Recommended Value | Notes

Volume Options
guarantee | none | No space reservation for the volume at all.
fractional_reserve | 0 | Since Data ONTAP 7.3.3, fractional_reserve can be modified even for volumes without a space guarantee of type volume. Prior to Data ONTAP 7.3.3, the value was fixed at 100.
autosize | on | Turn autosize on.


autosize options | -m X -i Y | The business model drives the maximum value for the autosize configuration because it offers additional disk space for the consumer under its specific conditions. A reasonable resizing increment depends on various factors, such as the data growth rate in the particular volume, the volume size itself, and so on.
try_first | volume_grow |

Volume Snapshot Options
reserve | 0 | For NAS volumes, setting a Snapshot copy reserve area and configuring Snapshot copy schedules is a common setup. For SAN volumes, this needs to be switched off according to NetApp best practices (see the Fibre Channel and iSCSI Configuration Guide).
schedule | switched off |
autodelete | off | Deleting Snapshot copies might be an option when the volume can no longer be resized because the maximum configured size has been reached, or when the aggregate's free space becomes low.

LUN Options
reservation | disable | No preallocation of blocks for the LUN.

SUMMARY OF PROVISIONING METHODS

There are good reasons for using any of the provisioning methods described; however, full fat provisioning for SAN environments should be avoided wherever possible because of its storage efficiency ratio. Even with a 100% block usage ratio on primary data, zero fat provisioning has many advantages and is the preferred method:

• The aggregate's free space is a global pool that can serve space to volumes. This gives more flexibility than volumes with their own dedicated free space.
• For SAN volumes, the block consumption can be easily monitored.
• Deduplication savings go directly into the global pool of free space, which is the aggregate or the resource pool to which it belongs.
• Monitoring is needed only on the aggregate level. Volumes will grow on demand.

Table 7) Comparison of provisioning methods.

Characteristic | Full Fat | Low Fat | Zero Fat
Space consumption | 2X + Δ | X + Δ | X − N + Δ (2)
Space efficient | No | Partially, for Snapshot copies | Yes
Monitoring | Optional | Required on volume and aggregate level | Required on aggregate level
Notification/mitigation process required | No | Optional in most cases | Yes

(2) N is the traditional thin provisioning impact: the amount of blocks logically allocated but not used.


Characteristic | Full Fat | Low Fat | Zero Fat
Pool benefiting from dedupe savings | Volume fractional reserve area | Volume free space area | Aggregate free space area
Risk of an out-of-space condition on primary data | No | No, as long as autodelete is able to delete any Snapshot copies | Yes, when monitoring and notification processes are missing
Typical use cases | Small installations; none or few storage management skills (no monitoring infrastructure) | Large database environments | Shared storage infrastructure; test/dev environments; storage pools for virtualized servers
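The space consumption formulas in Table 7 can be compared with a small worked example. This is an illustrative sketch only; the function names and the GB figures are our own assumptions.

```python
# Space needed per provisioning method, following Table 7 (sizes in GB).
def full_fat(x, delta):
    return 2 * x + delta       # primary data plus full overwrite reserve plus Snapshot space

def low_fat(x, delta):
    return x + delta           # primary data preallocated; Snapshot space on demand

def zero_fat(x, n, delta):
    return x - n + delta       # only written blocks count; N unused LUN blocks stay free

# 1 TB of LUNs (X), 400 GB of unused blocks (N), 100 GB of Snapshot data (Δ):
x, n, delta = 1000, 400, 100
print(full_fat(x, delta), low_fat(x, delta), zero_fat(x, n, delta))  # 2100 1100 700
```

With these example figures, zero fat needs only a third of the physical space of full fat for the same primary data, which is why the out-of-space risk must be covered by monitoring instead of preallocation.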

FULL/LOW/ZERO FAT PROVISIONING WITH PROVISIONING MANAGER

NetApp Provisioning Manager focuses on fast and efficient provisioning of storage resources in the NetApp storage infrastructure. It speeds up provisioning workflows and boosts capacity usage by using policy-based automation for the entire NetApp NAS and SAN infrastructure. These processes are:

• Faster than manually provisioning storage
• Easier to maintain than scripts
• Instrumental in minimizing the risk of data loss resulting from misconfigured storage

Provisioning Manager applies user-defined policies to consistently select the appropriate resources for each provisioning activity. This frees administrators from the headache of searching for available space to provision and allows more time for strategic issues. The use of a centralized management console allows administrators to monitor the status of their provisioned storage resources. Provisioning Manager can help improve business agility and capacity use, shrink provisioning time, and improve administrator productivity. Provisioning Manager's thin provisioning and deduplication capabilities provide a high level of storage efficiency from your NetApp storage investment.

A GUI allows you to implement the full/low/zero fat provisioning models within Provisioning Manager. See Figure 9 and Figure 10 for configuring storage efficiency in a provisioning policy for NAS and SAN. Whenever storage is provisioned using such a provisioning policy, the settings apply automatically. For more information, refer to TR-3710, "Operations Manager, Provisioning Manager, and Protection Manager Best Practices Guide."

Provisioning Manager encapsulates technical details when provisioning storage and supports easy integration with existing management tools and orchestration frameworks. Policies and their use in so-called datasets and storage services allow you to exploit NetApp storage efficiency technologies without exposing a high level of technical detail to a higher level of management software.

Note: Provisioning Manager up to version 4.0 does not allow you to specify autosize and autodelete individually; the policy template determines whether these features are used and which options are selected. In order to implement the provisioning methods outlined, a customized provisioning script needs to be provided to set the autosize and autodelete parameters according to the recommendations for the full/low/zero fat methods. Post-provisioning scripts are standard with Provisioning Manager. Use caution when Provisioning Manager runs conformance checks; these revert individual settings.


Figure 9) Configuring a full/zero fat provisioning policy using Provisioning Manager for NAS. Select checkboxes as outlined. Provisioning Manager deviates from zero/full fat by first growing volumes with autosize and then allowing Snapshot autodelete.


Figure 10) Configuring a full/low/zero fat provisioning policy using Provisioning Manager for SAN storage. Select checkboxes as outlined. Provisioning Manager deviates by not turning on autosize for zero fat.

FULL/LOW/ZERO FAT PROVISIONING FOR STORAGE SERVICES

Storage services are an easy abstraction for provisioning storage in a utility-like fashion. A storage service describes all characteristic attributes of the storage needed in a certain scenario. A storage service catalog lists the available templates and allows you to provision storage with these attributes on demand. Technically, storage services or datasets consist of one or more resource pools, a protection policy, and a provisioning policy. Full/low/zero fat provisioning for storage services is configured in the configuration wizard of the provisioning policy. Figure 11 shows the provisioning policies closest to the full/low/zero fat configurations.

Figure 11) Full/low/zero fat provisioning policies for datasets and storage services.

Because this wizard is able to configure the deduplication feature, two policies are configured for the zero fat configuration: one with deduplication and one without.


HOW SHOULD A VOLUME BE SIZED?

Because physical allocation of data within a zero fat-provisioned volume is done on demand, theoretically the volume size can be set to a very high value that can easily hold all application data and Snapshot copies. Because the unallocated space in the volume is not exclusively reserved for the volume itself, all other applications can benefit from the shared pool of unallocated storage. However, NetApp recommends that you size the volume to the expected size of its containing objects and use the autogrow option to let it grow on demand. The important advantage is that the commitment rate acts as a metric for data consolidation.
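As a sketch of the commitment rate as a consolidation metric: this is an illustrative calculation, not a Data ONTAP command or API, and the function name and sizes are our own assumptions.

```python
def commitment_rate(volume_sizes_gb, aggregate_capacity_gb):
    """Provisioned (logical) volume capacity as a percentage of physical capacity."""
    return 100.0 * sum(volume_sizes_gb) / aggregate_capacity_gb

# Three 4 TB zero fat volumes on a 10 TB aggregate are 120% committed,
# which is only possible because their space is allocated on demand.
print(commitment_rate([4096, 4096, 4096], 10240))  # 120.0
```

A rate above 100% indicates thin-provisioned overcommitment; sizing volumes to their expected content keeps this number meaningful as a data consolidation metric.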

Note: The commitment rate reflects the amount of logical data consolidation. This metric is suitable for deciding when data should be left for organic growth.

Additionally, the volume size limits when using deduplication should be taken into account, because the maximum sizes depend on the storage controllers.

APPLICATION RECOMMENDATIONS

Thin provisioning is most effective when applications use data that is committed to them step by step. When applications preformat data, the immediate effect of thin provisioning is lost, and only deduplication may reclaim sharable blocks. Because thin provisioning has no performance penalty, the general recommendation is to provision with the zero fat configuration.

For SAN-attached storage, NetApp recommends that you use file systems supporting space reclamation technologies such as the SCSI UNMAP and SCSI WRITE SAME commands. These pass the information through the storage stack that a particular block is no longer used, allowing unused space to be reclaimed. On Windows® platforms, this can be configured in NetApp SnapDrive®.

For Oracle® database best practices, refer to WP-7084, "Storage Efficiency in an Oracle Environment."

3.2 PROVISIONING FROM TEMPLATES: VOLUME AND DEDUPE-CENTRIC LAYOUTS

This section deals with provisioning storage for similar applications from a golden template. A valid use case is a hosting provider who offers and serves predefined application services in mass quantities. Instead of provisioning each application from scratch, the data of the application instance is provisioned by creating a copy of a preconfigured template, or golden copy, that is customized using a postprocessing procedure.

When applications are provisioned this way, NetApp cloning technologies generate virtual copies of the template data instantly and with efficient use of space. This achieves a high degree of data consolidation and cost savings.

The potential of NetApp cloning technologies also plays a central role in development and test environments, as well as in software maintenance scenarios. Testing and updates can be performed very easily because these cloning capabilities work instantly and with almost no overhead in terms of performance, CPU, and memory.

There are two ways to align application data to a NetApp shared storage infrastructure:

• Volume-centric storage layout
• Dedupe-centric storage layout

Depending on the data lifetime, suitability for deduplication, consistency, and tool constraints, one way of aligning application data is more appropriate than the other.

In both variants, the storage of the application template can be provisioned as either full, low, or zero fat. The cloning procedure inherits the attributes of the parent volume. To create space-efficient clones, the space guarantee must be set to none.


CONSEQUENCES FOR MONITORING

When using one of the following layouts, very high data consolidation can be achieved. Because this effect depends on the usage characteristics of the corresponding applications, monitoring the aggregate is key. In case a low fat volume acts as a template that is cloned while preserving the original space guarantees, monitoring is necessary for the cloned volumes as well.

VOLUME-CENTRIC STORAGE LAYOUT

In a volume-centric storage layout, an application instance is organized into one or a few volumes to benefit from the Data ONTAP volume-centric management and maintenance operations, such as instant cloning and volume-consistent Snapshot copies.

In addition to the convenient ways to manage volumes, volume-centric storage layouts have storage efficiency advantages in two dimensions:

bull High instant storage efficiency savings High instant savings when cloning data of an application instance with FlexClone savings might deteriorate over time

• Long-term storage efficiency savings. Medium long-term savings are achieved when deduplicating application data.

A volume-centric layout makes it easy to provision storage for another instance of an application by cloning a consistent volume representing the template of the intended application and attaching it to an instance where it is processed. This approach works for both NAS and SAN.

Figure 12 shows the data alignment of an application instance and its volume. An application instance organizes its data in one or more dedicated volumes. Note that the entire construct is created within one aggregate.

Because deduplication is performed on the volume level, long-term savings depend on the block-sharing rate within one instance of an application. Volume-centric layouts are preferred in the following cases:

• Simplicity of data management using volumes
• Individual control over the SLA of each application instance
• Application instances with a short duration
• No consideration of deduplication
• Management tools that require volume-centric layouts


Figure 12) Volume-centric storage provisioning. Application instances are aligned horizontally with their volumes.

[Figure 12 depicts a template FlexVol volume and its LUNs/qtrees being cloned through FlexClone block sharing into FlexVol volumes for instances 1 through n; deduplication block sharing operates within each FlexVol volume.]

Impact on commitment and storage utilization. The impact of using FlexClone to clone a volume-centric storage layout to implement storage template-based provisioning is visualized schematically. At clone creation, Data ONTAP creates metadata for the new instance of the data and allocates space for storing changes to the cloned copy or new data on request. Thus, the overcommitment of the aggregate containing the cloned data increases when creating the clone. However, this does not affect the space used in the aggregate. When data in the clone is changed and new data is added by the application, the aggregate use will grow.
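The arithmetic behind this behavior can be sketched in a few lines. The sizes and the helper function below are illustrative assumptions, not values taken from Data ONTAP:

```python
# Illustrative sketch: how cloning affects commitment vs. physical use.
# All sizes in GB; the figures are hypothetical examples.

def overcommitment_rate(committed_gb, aggregate_size_gb):
    """Committed (promised) space relative to physical aggregate size."""
    return committed_gb / aggregate_size_gb

aggregate_size = 1000          # physical capacity of the aggregate
template_committed = 200       # space committed to the template volume
template_used = 150            # blocks physically used by the template

# At clone creation only metadata is written: committed space doubles,
# but the physically used space stays (almost) the same.
committed_after_clone = template_committed * 2

print(overcommitment_rate(template_committed, aggregate_size))    # 0.2
print(overcommitment_rate(committed_after_clone, aggregate_size)) # 0.4

# Only when the clone's data diverges does aggregate use grow.
clone_changed_data = 30
used_after_changes = template_used + clone_changed_data
print(used_after_changes)  # 180
```

This is why monitoring the aggregate, not just the volumes, matters: the clone raises the commitment immediately, while the physical use only catches up as the clone diverges.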

Best Practice

A volume-centric layout implicitly implements a consistency group. It is preferable to place in it all application data that should be recovered at a certain point in time. Cloning can achieve significant savings when a FlexClone volume is created to provision data for a new service instance.

Client-side data realignment, such as disk defragmentation or database table space reorganization, has a counterproductive effect on the FlexClone savings. This realignment also has a temporarily counterproductive effect on the deduplication savings until the deduplication process is executed again. If possible, the following actions on client data should be avoided:

• Reorganizing data, for example, database reorganization of table spaces or defragmentation of virtual disks provisioned through cloning

• Preformatting data


DEDUPE-CENTRIC STORAGE LAYOUT

In a dedupe-centric storage layout, the goal is to achieve high storage efficiency returns from the deduplication feature. In contrast to the volume-centric storage layout, data of different application instances is grouped to achieve storage efficiency returns across a set of application instances. Figure 13 shows a sample dedupe-centric storage layout. Data of application instances is organized horizontally; individual data of each application is grouped vertically in a volume to implement deduplication.

This layout makes sense in virtualization scenarios where the images of the guest machines can be grouped easily. Grouping partitions containing boot images and commonly used programs is very effective because they share much of the same data. To implement template-based provisioning with such a layout, cloning of template data must be performed with the file/LUN FlexClone operation. File/LUN FlexClone allows storage objects to be cloned within a volume, providing finer granularity.
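To see why grouping similar objects in one volume pays off, the effect of volume-level block sharing can be simulated. This is a deliberately simplified model (real deduplication fingerprints 4KB WAFL blocks), and the VM image contents are invented:

```python
# Simplified model of volume-level deduplication: within one volume,
# identical blocks are stored only once. A "block" here is any hashable
# token; the VM images below are invented examples.

def dedupe_savings(volumes):
    """Return (logical_blocks, physical_blocks) for a list of volumes,
    where block sharing happens only within each volume."""
    logical = sum(len(v) for v in volumes)
    physical = sum(len(set(v)) for v in volumes)
    return logical, physical

os_blocks = ["boot", "kernel", "libs", "office-suite"]
vm1 = os_blocks + ["data-vm1"]
vm2 = os_blocks + ["data-vm2"]
vm3 = os_blocks + ["data-vm3"]

# Volume-centric: each VM in its own volume -> no cross-VM sharing.
print(dedupe_savings([vm1, vm2, vm3]))    # (15, 15)

# Dedupe-centric: all VMs grouped in one volume -> OS blocks stored once.
print(dedupe_savings([vm1 + vm2 + vm3]))  # (15, 7)
```

The grouped layout stores the common operating system blocks once, which is exactly the consolidation effect described above.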

This storage layout provides the following storage efficiency advantages from a short- and long-term perspective:

• Very high long-term storage efficiency savings. Long-term storage efficiency savings are achieved due to the deduplication-centric storage layout and deduplication returns.

• Short-term storage efficiency savings. Instant storage efficiency savings are provided when cloning an application instance, for example template application data, through a file/LUN FlexClone operation.

In contrast to the volume-centric storage layout, application instances are bundled together in a matrix style because of their participation in a volume. This implies that the applications share major operational tasks and are managed as a bundle. From an SLA perspective, a diversification of service levels within the application instances cannot be implemented as easily as with a volume-centric layout. Achieving application-consistent Snapshot copies requires the iterative application of file/LUN FlexClone functionality to all storage objects of the instance. This is slightly more difficult than cloning with a volume FlexClone operation.

TR-3505, NetApp Deduplication for FAS and V-Series Deployment and Implementation Guide, provides a deeper understanding of NetApp deduplication and its deployment.

Figure 13 illustrates dedupe-centric storage provisioning. Volumes are shared among several application instances to achieve cross-dedupe returns. Note that this construct is created within an aggregate; volumes can be assigned to different aggregates.


Figure 13) Dedupe-centric storage provisioning. Application instances are aligned horizontally; volumes are aligned vertically.

[Figure 13 depicts the template and instances 1 through n aligned horizontally; each FlexVol volume vertically groups one LUN/qtree per instance, with deduplication block sharing within each FlexVol volume.]

Impact on commitment and aggregate usage. When creating the FlexVol volumes for this layout, their individual sizes contribute to the commitment rate. The aggregate use grows with the provisioning and object use within the FlexVol volumes. Provisioning a new instance in this layout through a file/LUN FlexClone operation has no effect on the overcommitment rate; it has an effect on the deduplication value of the volume itself. Thus, NetApp recommends using the zero fat configuration for the volume and having autogrow enabled.

Best Practice

This layout is very attractive for applications using multiple but similar storage objects among service instances (for example, virtual disks in virtual machine hypervisors). They usually use similar operating systems and applications in dedicated virtual disks. Thus, grouping these storage objects leads to a very high degree of consolidation due to deduplication.

Quickly changing data, such as pages and swap files, should not be considered for inclusion in deduplicated volumes on primary storage. Deduplication savings are limited due to their high change rate and do not justify running the deduplication process. NetApp recommends that this type of data is not placed in the same volume as data that dedupes well.

We further recommend not performing client data realignments, such as Windows disk defragmentation or database table space reorganizations. Because of the way that NetApp storage controllers work, fragmented client data is served with no performance penalties.


3.3 SETTLED/NOMAD PROVISIONING FOR NETAPP DATA MOTION

Settled/nomad provisioning is a provisioning pattern that helps increase the utilization of NetApp shared storage. When the online migration capabilities of the storage are exploited, response times to mitigate data growth scenarios become independent of application-specific planned downtime windows. Storage is classified into potential migration candidates that can be migrated away from a tight aggregate on one storage controller to another while assuring accessibility. Thus, it is an elegant technology to relax the use of an aggregate outside the planned downtime windows of the served applications. NetApp MultiStore technology implements this feature using the vFiler abstraction, which NetApp recommends you consider in the provisioning process. Secure multi-tenancy environments implemented using MultiStore technology harmonize well with this provisioning approach.

The settled/nomad provisioning pattern is a perfect metaphor for reacting to data growth in an aggregate. Figure 14 illustrates the concept of settled/nomad provisioning in the aggregates of the storage controllers and the migration of a nomad out of its aggregate. The settled part describes data that does not move during its lifetime. It might use vFiler units to simplify operation and hardware maintenance of the storage controller, but there is no direct need. The nomad parts are considered moving parts and thus must make use of vFiler units. The ratio between the sizes of the settled and nomad parts depends on the growth rate and lifetime of the data in the settled part. Assuming that the aggregate size is constant over this period, the aggregate is filled with settled and nomad data. Over the data lifetime, more and more nomads are migrated away. At the end of the lifetime, only the settled data is left.

It is irrelevant whether the data growth happens in the settled or the nomad part: when a nomad is migrated away, the resource situation on the aggregate is relaxed. It is preferable to provision several nomads of different sizes. This allows you to:

• React to different growth scenarios of the data
• Quickly migrate smaller nomads when time or the inter-storage-controller network is considered to be a limited resource
• Operate the aggregate in its operational sweet spot corridor over a long time frame; by slicing the migratable entities in the right way, you can be sure that the aggregate operates in a predefined use interval
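The relationship between growth rate, data lifetime, and the nomad share can be estimated with a back-of-the-envelope calculation. The formula and all figures below are our own illustration, not a NetApp sizing rule:

```python
# Back-of-the-envelope sizing of the nomad share of an aggregate.
# Assumption (ours): settled data grows roughly linearly, and nomads are
# migrated away to absorb that growth in a fixed-size aggregate.

def required_nomad_share(settled_growth_gb_per_month, lifetime_months,
                         aggregate_size_gb):
    """Fraction of the aggregate to provision as nomads so that migrating
    them away can absorb the settled data's growth over its lifetime."""
    total_growth = settled_growth_gb_per_month * lifetime_months
    return min(1.0, total_growth / aggregate_size_gb)

# Example: 10,000 GB aggregate; settled data grows 200 GB/month for 24 months.
share = required_nomad_share(200, 24, 10000)
print(share)  # 0.48 -> roughly half the aggregate should be migratable

# Slicing that share into several nomads of different sizes keeps the
# smaller ones quick to migrate when time or the network is the bottleneck.
nomad_sizes_gb = [2000, 1600, 800, 400]
print(sum(nomad_sizes_gb))  # 4800
```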

Figure 14) Settled/nomad provisioning into an aggregate. In case of aggregate tightness, a nomad is migrated to a separate aggregate.

[Figure 14 depicts an aggregate containing a settled part and several nomads.]

To summarize, the settled/nomad provisioning pattern is an elegant method of adjusting the block use of an aggregate. The use of an aggregate can be controlled and kept in a desired corridor.


SLA-BASED ASSESSMENT FOR SETTLED/NOMAD

The goals of an SLA-based assessment are to optimize SLA fulfillment and to avoid or minimize penalty costs. The accessibility of the applications, described by their individual service levels, is used for an assessment into settled and nomad instances.

We use the previously introduced SLA metric of service disruption and map it to the stickiness of the settled/nomad instances. The vFiler entities allow online migration of NFS- and iSCSI-attached nomad instances without any changes on the client side. Fibre Channel-attached storage cannot be migrated online at the time of writing. Refer to TR-3881 for an understanding of DataMotion in an Oracle database and Microsoft Exchange environment.

Alignment by technical impact. For data belonging to applications with SLAs that fit perfectly into what is provided, a direct assignment can be made. For example, application instances with the lowest acceptable service disruption should be the last candidates to be migrated (settled); applications with the highest acceptable service disruption should be considered as nomads. However, there might be data of application instances that will likely be migrated during the application lifetime. You must take into account the business impact of migrating these instances.

Figure 15) Alignment by technical impact (sorted by negative impact in descending order).

[Figure 15 depicts instances Inst1 through InstN ordered from high negative impact (settled, e.g., all FC, outside SLA) to low negative impact (nomad, inside SLA).]

Alignment by business impact. An assessment of penalty costs is made for the data of the remaining applications. For vFiler migration, a very short negative impact on the performance of the service level must be taken into account during the migration. Thus, application data with the highest negative impact is considered to be the stickiest.
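The two-step assessment can be mimicked with a small sorting routine. The instance names and penalty figures are invented for illustration:

```python
# Sketch of the SLA-based assessment: rank application instances by the
# negative impact (penalty) of migrating them, then split the list into
# settled (stickiest) and nomad candidates. All names and figures are
# invented examples.

def assess(instances, nomad_count):
    """instances: list of (name, migration_penalty) tuples.
    Returns (settled, nomads): the nomad_count cheapest-to-migrate
    instances become nomads; the rest stay settled."""
    ranked = sorted(instances, key=lambda inst: inst[1], reverse=True)
    split = len(ranked) - nomad_count
    return ranked[:split], ranked[split:]

apps = [
    ("erp-prod", 100),   # high penalty for disruption -> stickiest
    ("mail", 40),
    ("test-db", 5),      # low penalty -> good nomad candidate
    ("archive", 1),
]

settled, nomads = assess(apps, nomad_count=2)
print([name for name, _ in settled])  # ['erp-prod', 'mail']
print([name for name, _ in nomads])   # ['test-db', 'archive']
```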

Figure 16) Alignment by business impact (sorted by negative impact in descending order).

[Figure 16 depicts instances ordered by penalty costs from high (settled) through semi-settled to low (nomad).]

PERFORMANCE AND THROUGHPUT IMPACT OF MIGRATION

Migration of a nomad might be triggered by heavy storage consumption in an aggregate. It might also be triggered by performance limitations of the corresponding storage controller. Because the migration process consumes additional resources on the network and on the participating storage controllers, this consumption must be taken into account to avoid further intensifying the situation. Refer to TR-3881 for a quantitative evaluation of DataMotion.

NetApp recommends adjusting the use of storage controllers in a high-availability configuration in such a way that the remaining controller can handle the load in the case of a failover. Doing so should leave enough resources to perform migrations.

Migrating vFiler entities builds mainly on SnapMirror® and MultiStore technology; thus, it requires these licenses on all participating storage controllers. TR-3814, NetApp Data Motion, provides a thorough presentation of migration using the NetApp Data Motion solution. It focuses on implementing and triggering the migration of vFiler entities using NetApp Provisioning Manager. Furthermore, vFiler units can be managed manually to allow for handy offline migration with a very short interruption of storage accessibility.

ENABLE SETTLED/NOMAD FOR ALREADY-PROVISIONED STORAGE

While NetApp recommends that you consider the settled/nomad setting initially, taking the sizing and lifetime of storage into account, it is possible to implement it in a planned downtime window. If NFS-attached storage should be migrated, existing volumes can be adopted by a vFiler entity. Because the vFiler entity has its own IP address, the clients attaching the storage need to remount it.

SETTLED/NOMAD-LIKE SETTING WITH SHORT-/LONG-TERM DATA PAIRING

In the previous section, the settled/nomad pattern was described as a way to mitigate organic data growth. The same effect can be achieved when storage is identified as belonging to instances that are going to be deprovisioned at their end of life. Taking into account the expected lifetime of provisioned storage allows you to plan deprovisioning situations in advance. This relaxes the dependence on aggregates outside planned downtime windows, without the technical requirements of a settled/nomad setting.

ONLINE MIGRATION IN VIRTUALIZED ENVIRONMENTS

Online migration features in a virtualization hypervisor provide a further alternative for implementing a responsive scheme to react to data growth scenarios. For example, VMware® Storage VMotion™ is capable of transferring a virtual machine, including its storage, when it is attached using a datastore. Storage of virtual machines served by a NetApp datastore can be migrated to another NetApp-served datastore by migrating each virtual machine. In such cases, a nomad can simply be implemented by a NAS- or SAN-attached datastore.

In contrast to a data transfer based on SnapMirror directly between NetApp storage controllers, the migration traffic flows through the hypervisor. This might have consequences for the execution of the virtual machines. Also, the NetApp storage efficiency savings cannot be exploited during the transfer. Deduplication savings are regained by executing the deduplication process on the destination storage controller.


4 OPERATION

This section focuses on the operation and management of overcommitted storage. The goal of management is to fulfill the SLAs of the stored data while achieving a high level of efficiency. It addresses questions of how to detect situations that need manual assistance, how to raise the awareness of the operational staff, and how to resolve situations that arise.

We first consider situations that put SLA fulfillment at risk. Then we focus on actions that can be taken to avoid further aggravation by making the situation evident and presenting mitigation alternatives.

Consider the operational process as a loop that monitors and evaluates the current situation and triggers the transition of a storage resource among the following phases:

• Provision storage.
• Leave room for organic growth. It might be desirable to still allow for extending the storage of previously provisioned applications.
• Reduce storage use with mitigation alternatives, such as deletion, data motion, and so on.

These transitions must occur within a specified time frame to optimize operational flexibility and to prevent endangering the SLAs. The point is to detect situations that will violate the SLAs in the future.

SITUATIONS PUTTING SLA FULFILLMENT AT RISK

Over time, more and more data is stored and processed by the provided applications. NetApp storage efficiency technologies compensate for this growth. To prevent running out of physical resources, usage must be managed within safe boundaries. This makes sure the operations team has enough time to react with the appropriate mitigation strategy.

The following list summarizes situations that are critical for service delivery:

• Running out of time. Some mitigation alternatives must be triggered in advance, and a passage of time might be needed for their effect to become evident. This time determines the number of mitigation alternatives that can be considered at a certain point.

• Running out of mitigation alternatives. Several mitigation alternatives exist to control the usage. However, some alternatives are one-time activities, and some must be performed within a certain time frame. Depending on the situation, not all alternatives might be available for use.

• Running too tight on storage. Over time, applications use more and more of the blocks from the storage that was committed to them. This forces Data ONTAP to allocate from the pool of free blocks. Assuming data growth, the size of the free block pool directly translates into the available time to react.

• Running out of storage completely. This must be prevented because it has a high impact on the availability of the service. Furthermore, data integrity can be at risk. Consider the following scenarios:
  − An application wants to write to committed storage but fails (NAS/SAN). For applications, this looks like a storage failure and implies service disruption. Data integrity can be at risk.
  − An application wants to allocate new storage but fails (NAS). The application is confronted with a "No space left on device" exception. Verify the application behavior on this exception; most applications can deal with this situation, and data integrity is not at risk.

Two cases need to be differentiated when mitigation is necessary to solve a situation of tight storage. Storage for an object, such as a LUN or a share, can be tight because of:
  − Insufficient space within the volume in which the storage object is contained
  − Insufficient free space within the aggregate in which the storage object and its volume are contained


The following sections focus on how to detect that a change is necessary and that a storage resource should be transitioned to another phase. We describe the different phases, how monitoring can support their detection, and how this information can be made known to operational groups.

4.1 PHASES AND TRANSITIONS

This section outlines the phases of a storage resource. Starting with an empty aggregate, storage is provisioned up to certain thresholds. After that, storage is left for organic growth. After certain thresholds are exceeded, further inspection or activities must be performed to mitigate storage tightness.

• Provisioning storage. When certain thresholds are within a defined range, storage is provisioned to the aggregates. Monitoring should support making the decision to transition to the next phase.

• Leave storage for organic growth. When certain thresholds are exceeded, provisioned storage is left for organic growth. Depending on the environment, storage of existing applications might still be extended, and a second threshold might signal that extensions are no longer possible. Monitoring should support making the decision to transition to the next or the prior phase.

• Mitigate storage use. When certain thresholds are exceeded, this phase must make sure that committed storage can be delivered to store application data. The effect of a mitigation activity should be to put the storage resource back into the preferred operational corridor. Monitoring should support making the decision to transition back to the organic growth phase.
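The transition logic above can be expressed as a simple threshold-driven decision function. The threshold values are placeholders to be replaced by your own Operations Manager settings:

```python
# Threshold-driven phase decision for an aggregate, following the three
# phases described above. The threshold values are placeholders, not
# NetApp defaults.

PROVISION_LIMIT = 0.50   # below this block use: keep provisioning
GROWTH_LIMIT = 0.80      # between the limits: leave room for organic growth

def phase(block_use):
    """Map aggregate block use (0.0-1.0) to an operational phase."""
    if block_use < PROVISION_LIMIT:
        return "provision"
    if block_use < GROWTH_LIMIT:
        return "organic-growth"
    return "mitigate"

for use in (0.30, 0.65, 0.90):
    print(use, phase(use))
# 0.3 provision / 0.65 organic-growth / 0.9 mitigate
```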

4.2 MONITORING

NetApp Operations Manager delivers comprehensive monitoring and management for NetApp shared storage. It provides alerts, reports, performance monitoring, and configuration tools to keep the NetApp storage infrastructure in line with business requirements for maximum availability and efficiency. NetApp Operations Manager provides a single human interface and an application programming interface (API) for integration with third-party management and orchestration software.

Operations Manager monitors the NetApp shared storage infrastructure and is able to raise awareness of certain situations. Events can be set to trigger an action when operational parameters are within a certain range and indicate a relevant situation. When the event triggers, an alarm notification can be sent by e-mail, pager, Simple Network Management Protocol (SNMP), or customized scripts. To raise awareness of a certain situation, the event must be characterized using the metrics provided by Operations Manager. To communicate the event, an alarm must be set.

THRESHOLDS

Operations Manager monitors relevant parameters that indicate the presence of specified situations. Thresholds can be set to trigger actions, for example, to notify the operational team that an alarm situation exists. The thresholds can be set to notify in advance. Operations Manager also performs trending on operational parameters to express the urgency of a certain situation. This supports decision making on how to react to a certain situation.

Within your Operations Manager instance, the thresholds can be verified and set by navigating to the Default Thresholds page, following Setup → Options → Default Thresholds, or using the link http://opsmgrserver:port/dfm/edit/options. Figure 17 shows a sample configuration page.


Figure 17) Operations Manager screen to configure thresholds on operational metrics

For aggregates, Operations Manager provides the set of thresholds described in the following list. They represent absolute limits. Operations Manager alarms can be used to notify operational staff and raise awareness of a specific situation.

Monitoring the aggregates is very important. They are the physical containers of preallocated and growable storage objects that host application data. If an aggregate of a storage controller runs at uncontrolled usage, it could have direct consequences for applications for which it is providing data.

The concrete settings for these thresholds depend on the time necessary to relax aggregate block usage. When no mitigation alternatives should or can be taken over the lifetime of the data, mitigation actions must be performed in scheduled downtime windows. Thus, threshold settings and actions tend to be more conservative to avoid SLA-endangering situations.

• Aggregate full threshold. This threshold on the aggregate block use metric allows triggering an alarm that notifies a person in charge.

• Aggregate nearly full threshold. This threshold is the counterpart of the aggregate full threshold but provides an earlier notification.

• Aggregate overcommitted threshold. This threshold on the committed storage metric allows triggering an alarm that notifies a person in charge. This metric refers to the amount of storage that is assigned to applications. It represents the level of consolidation as well as the width and increase of the block use corridor.

• Aggregate nearly overcommitted threshold. This threshold is the counterpart of the aggregate overcommitted threshold but provides an earlier notification.

Operations Manager also provides thresholds that can be used to alert operational staff when volumes are in a certain state:

• Volume full threshold. This event notifies a person in charge that the preset threshold on the volume use metric has been reached.

• Volume almost full threshold. This event is the counterpart of the volume full threshold but provides an earlier notification.

• Volume autosized. This event notifies a person in charge when a volume has been extended using the autogrow functionality.


TRENDING

Operations Manager 4.0 supports a variety of trending features for certain storage objects. This is an important feature for all storage objects with a fixed size. It allows you to estimate the time frame within which a certain situation needs to be mitigated. The trend is calculated as a linear regression over up to 90 days in the past. For aggregates, Operations Manager calculates a trend on the daily growth rate. In your Operations Manager instance, use the link http://opsmgrserver:port/dfm/report/view/aggregates-growth-rates for trending of aggregate growth rates and the estimated remaining time until the storage object is full. Each aggregate can be drilled down, and you can select trending based on an interval of one day, one week, one month, three months, or one year. To see the effect of recent data activities, set the interval of the trend calculation to enclose this activity. Investigate whether growth rates calculated over different intervals deviate significantly.

Figure 18) Trending of data growth and days-to-full prediction in Operations Manager

Note: The calculation basis for time to full is the usable aggregate capacity. This value is not calculated based on the aggregate full threshold setting.
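The trend calculation can be reproduced in a few lines: fit a least-squares line to daily usage samples and extrapolate to the usable capacity. This is a sketch of the idea, not the actual Operations Manager implementation:

```python
# Sketch of days-to-full trending: least-squares fit of daily usage
# samples, extrapolated to the usable aggregate capacity (not to a
# threshold). Mirrors the idea, not the Operations Manager internals.

def days_to_full(daily_used_gb, usable_capacity_gb):
    n = len(daily_used_gb)
    xs = range(n)
    mean_x = sum(xs) / n
    mean_y = sum(daily_used_gb) / n
    denom = sum((x - mean_x) ** 2 for x in xs)
    slope = sum((x - mean_x) * (y - mean_y)
                for x, y in zip(xs, daily_used_gb)) / denom
    if slope <= 0:
        return None  # no growth: never full at the current trend
    intercept = mean_y - slope * mean_x
    # Days after the last sample until the regression line hits capacity.
    return (usable_capacity_gb - intercept) / slope - (n - 1)

# Example: 10 GB/day growth against 1000 GB of usable capacity.
samples = [500, 510, 520, 530, 540]
print(days_to_full(samples, 1000))  # 46.0
```

As noted above, extrapolating to the usable capacity rather than to the aggregate full threshold yields a longer, more optimistic time to full.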

The trending on the volume level is analogous to the trending on the aggregate level. In your Operations Manager instance, access the link http://opsmgrserver:port/dfm/report/view/volumes-growth-rates for trending of volume growth rates. NetApp recommends that you order the view by growth rate descending or by time to full ascending in order to focus on the relevant candidates.

On the volume level, you can set an alarm to fire when the volume growth is outside the usual boundary:

Abnormal volume growth. This event fires when the growth rate of a volume exceeds a preset limit. It is helpful for signaling unusual behavior concerning storage consumption and for pointing the operational staff to the right storage object.


INDIVIDUAL THRESHOLDS PER AGGREGATE OR VOLUME

For each aggregate or volume, the general default settings can be overridden and made more specific. To do so, select your aggregate or volume of choice, for example by using the links already provided in this technical report. When selecting a concrete aggregate, it can be configured using the Edit Settings link and dialog. When selecting a concrete volume, its configuration can be adapted using the Edit Quota Settings link and dialog.

MONITORING STORAGE EFFICIENCY RETURNS

NetApp Operations Manager provides a dashboard to visualize storage efficiency returns in the NetApp shared storage infrastructure. This report lists important parameters, drilled down by utilization capacity, unused reserve capacity, storage efficiency, and efficiency return breakdown. It allows you to judge the effectiveness of the NetApp storage efficiency technologies. Figure 19 provides a sample screenshot of the storage efficiency dashboard in NetApp Operations Manager. Consult the NetApp Operations Manager Efficiency Dashboard Installation and User Guide for further information on the data provided by this dashboard.

Figure 19) Storage efficiency dashboard in Operations Manager


4.3 NOTIFICATION

Operational staff must be notified when situations occur that require a transition of phases, especially situations with negative consequences for SLA fulfillment. Operations Manager provides alarms for notification. Alarms are bound to the metrics and thresholds explained in section 4.2 and notify operational staff, storage administrators, or storage capacity planners. Alarms are the instrument used to keep the management effort of the NetApp storage infrastructure low.

After being notified, the responsible person can evaluate the situation and decide which actions to take. Furthermore, the trends on operational parameters provided by Operations Manager simplify the decision-making process.

Depending on the organizational structure, the responsibilities to operate, plan, and administer the storage infrastructure can be separated into different groups, persons, or roles. Thus, we characterize the mitigation activities by required skill set and time to act. This allows an easy alignment to a given organizational structure.

Operations Manager supports different methods of sending a notification. The notification methods can be used in combination; for example, a notification can be sent by both e-mail and SNMP.

NOTIFY BY E-MAIL

An alarm can be sent to multiple destinations by e-mail. Repeated notifications can be sent when the situation is not resolved. To set an alarm, access the alarm configuration page by following Setup → Alarms from the default Operations Manager dashboard. Clicking Advanced Version accesses an advanced version of this page. The direct link for the advanced version is http://opsmgrserver:port/dfm/edit/alarms-advanced. Figure 20 shows how to configure an alarm. Adjust the threshold as described in section 4.2.

NetApp recommends using distribution lists or aliases with meaningful names rather than the addresses of individual persons. If you follow this recommendation, changing responsibilities and roles does not require you to make corresponding changes to Operations Manager.

NOTIFY BY SNMP

Operations Manager supports the notification of alarms using SNMP, a widely used standard that is supported by most orchestration frameworks and ticketing systems. Using SNMP, Operations Manager can be integrated into existing ticketing systems. Figure 20 shows setting up an alarm that fires based on the aggregate almost full threshold. The SNMP trap host is configured using a host name or IP address and the port on which the SNMP agent is listening. The alarm can be saved and tested.


Figure 20) Configuring an alarm based on the threshold aggregate almost full

Note: The SNMP event must be routed to the responsible groups or persons in the ticketing system. Thus, the mapping between the detected situation and the responsible operational group must be implemented there.

NOTIFY BY SCRIPT

Operations Manager supports notifications in highly customized integration scenarios. A user-defined adapter can be executed, which delivers the information to the infrastructure or system of choice. A script can be used to implement such an adapter and act as the glue between Operations Manager and the customer infrastructure. To set an alarm on the event aggregate almost full that starts a script, instrument Operations Manager on the command line:

dfm alarm create -s script_to_execute -h aggregate-almost-full
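As an illustration of such a glue script, the following sketch maps an event to a generic ticket payload. It is a hypothetical example: the way Operations Manager hands event details to the script (here assumed to be command-line arguments) and the ticket fields shown are assumptions, not a documented interface.

```python
#!/usr/bin/env python
# Hypothetical alarm adapter for "dfm alarm create -s ...".
# Assumption: the event name and its source object arrive as argv[1]/argv[2];
# check your Operations Manager version for the actual script interface.
import json
import sys
import time

def build_ticket(event_name, source, queue="storage-operations"):
    """Map an Operations Manager event to a generic ticket payload."""
    return {
        "summary": "%s on %s" % (event_name, source),
        "queue": queue,  # route to the responsible operational group
        "created": int(time.time()),
    }

if __name__ == "__main__" and len(sys.argv) >= 3:
    # Replace this print with a POST to your ticketing system's API.
    print(json.dumps(build_ticket(sys.argv[1], sys.argv[2])))
```

The mapping from event to queue implements the routing requirement noted above for SNMP: the decision of which operational group is responsible lives in the adapter, not in Operations Manager.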


4.4 MITIGATE STORAGE USE

Uncontrolled storage use can limit operational flexibility and might put SLA fulfillment at risk. This section focuses on mitigation activities that preserve flexibility by controlling use within its defined corridor. A successful mitigation activity returns usage to that corridor.

Storage tightness might occur in aggregates or volumes, depending on their configuration. When all volumes in an aggregate are thin provisioned with the zero fat configuration, they use the aggregate's shared pool of free blocks to deal with data growth. To resolve this situation, a mitigation activity at the aggregate level is necessary.

When storage objects in a fixed-size volume cannot grow to the committed space, a mitigation activity at the volume level is necessary to resolve upcoming volume tightness.

MITIGATION ACTIVITIES FOR AGGREGATES

Aggregates are the coarsest storage objects within a NetApp storage controller. Finer-grained storage objects, such as FlexVol volumes and their content, are usually thin provisioned using the zero fat configuration. They might grow on demand; however, because they live within an aggregate of physically limited size, the growth of the storage object itself is also limited. As described in the following list, providing usable space in the aggregate automatically allows contained storage objects to grow.

1. Increase the aggregate. You can add drives to aggregates during operation, and you can repeat this mitigation activity. The maximum aggregate size depends on the Data ONTAP version, the type of aggregate, and the type of storage controller. 64-bit aggregates, supported with Data ONTAP 8, have very high limits. Additional drives can be used immediately; however, their procurement needs to be taken into account. Rebalancing data between existing and new drives results in uniformly distributed use of the drives.

2. Decrease the aggregate Snapshot copy reserve. This reserve is needed in MetroCluster and SyncMirror® configurations. In other configurations, you can decrease this reserve or set it to zero.

3. Shrink preallocated volumes. Volumes with preallocated space reserve available aggregate free space. When possible, these volumes can be shrunk, returning the freed space to the aggregate so that others can make use of the preallocated space.

4. Enable deduplication and shrink the volume.

5. If available, migrate a nomad online to a different storage controller. Doing this at the NetApp storage controller level requires storage provisioning based on vFiler units and a MultiStore and SnapMirror license. Adequate free space on the aggregates of the target storage controller is required. This mitigation activity is not limited in its applicability.

6. Migrate a volume from one aggregate to another aggregate within the same or another storage controller. SnapMirror replicates the data while it is still served. To switch over to the replicated data, the client needs to detach from the source and reattach to the replica. After completion, the replica is considered the new source. This operation has an impact on client downtime. Typically, inter-data center bandwidth allows you to synchronize the source and the replica within the range of a few minutes.

7. If none of the listed activities can be used, the application must be stopped to achieve a consistent state.

The mitigation activities for aggregate tightness are summarized in Table 8. Note that Provisioning Manager performs mitigation alternatives 3 to 6 for secondary storage online.


Table 8) Mitigation alternatives to control use within aggregates.

No. | Mitigation Activity | Repeatability | SLA Impact | Preparation Time | Time to Show Effect
1 | Increase aggregate capacity by adding disks | Data ONTAP 7.x: low limits; Data ONTAP 8: high limits | None | HW procurement | Immediate (+ rebalancing)
2 | Decrease the aggregate's Snapshot copy reserve, if possible | One time | None | None | Immediate
3 | Shrink other volumes in the aggregate, if they have enough free space | One time | Low | None | Immediate
4 | Run deduplication and shrink volumes | Repeatable | Low | Time to execute dedupe | Immediate
5 | Migrate nomads (online) | Repeatable | Low | None | Minutes (vFiler migration time)
6 | Migrate volumes to a different aggregate (offline) | Repeatable | Med-high | Next planned downtime window | Minutes (volume switch-over time)
7 | Prevent application data loss and stop the application, then migrate (offline) | Repeatable | Low-high | Coordinate with app owner | Minutes (migration time)

MITIGATION ACTIVITIES FOR VOLUME TIGHTNESS

Mitigation activities for volume tightness are relevant for volumes that are at risk because their storage objects cannot grow to the committed size. When it is not possible to enable growth for storage objects contained in volumes, you need to perform an aggregate-level mitigation activity.

Note: Some of these mitigation alternatives depend on, and affect, the used capacity in the aggregate.

Table 9) Mitigation activities for resource tightness within volumes.

No. | Mitigation Activity | Repeatability | SLA Impact | Preparation Time | Time to Show Effect
1 | Reduce the volume's Snapshot copy reserve (if configured and not used) | One time | Low | None | Immediate
2 | Increase the volume if there is free space in the aggregate (see Table 8) | One time | Low | None | Immediate
3 | Delete Snapshot copies that are not needed or were skipped by the autodelete function | Limited | Low | None | Immediate
4 | Activate FAS deduplication for the volume (requires proper space guarantees) | One time | Low; possible performance impact | Wait for schedule | Hours
5 | If the volume contains more than a single LUN, migrate those objects to another volume or aggregate | Repeatable | High | Next planned downtime window | Minutes (volume migration time)
6 | Stop the application and migrate data | Repeatable | High | Coordinate with app owner | Minutes (migration time)


5 REAL-LIFE SETTINGS

This section summarizes two different operational settings. The first does not make use of online data migration or the settled/nomad provisioning pattern; the second implements a settled/nomad provisioning pattern to maintain the flexibility for online data migrations.

The concrete threshold settings and approaches might be very customer and application specific. To exploit NetApp storage efficiency features in your own data center, NetApp recommends that you start conservatively. After you are familiar with the process, work toward the customer-specific optimum.

5.1 SAMPLE SETTING 1: REAL-LIFE SETTING

This section describes a real-life setting that a customer started with. It makes use of a limited set of mitigation alternatives. This is especially beneficial when the installed storage capacity should be constant over a long time frame, or when physical systems are already fully equipped. A settled/nomad setting is not considered. Thus, the thresholds that signal a transition between phases are set lower and more conservatively for this customer. Because online data migration and aggregate extension are not available as mitigation alternatives, sufficient available space is required to safely reach the next planned downtime window, as shown in Figure 21. In practice, refer to the aggregate days-to-full trend value to get an idea of the available days to full based on past data growth.

• All storage is provisioned using the zero fat option with growable FlexVol volumes. Only aggregate monitoring is used.

• Aggregate extension is not a mitigation alternative.

• Online migration is not a mitigation alternative.
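The days-to-full estimate that Operations Manager reports can be approximated from past growth. A minimal sketch of the arithmetic (function and variable names are illustrative, not Operations Manager internals):

```python
def days_to_full(capacity_gb, used_gb, daily_growth_gb):
    """Estimate days until the aggregate reaches 100% capacity used,
    assuming linear growth at the observed daily rate."""
    if daily_growth_gb <= 0:
        return float("inf")  # no observed growth: the aggregate never fills
    return (capacity_gb - used_gb) / daily_growth_gb

# Example: a 10 TB aggregate with 6 TB used, growing 20 GB per day
print(days_to_full(10240, 6144, 20))  # → 204.8
```

A value comfortably above the distance to the next planned downtime window indicates that the aggregate can safely be left to organic growth.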

Figure 21) Storage to enable organic data growth between planned downtime windows

(Figure content: data growth over a time span of months between two planned downtime windows.)

Note: Several months might pass between the planned downtime windows in which major mitigation alternatives can be performed.

The primary concern is preventing the critical situation in which aggregates reach a utilization level that is too high to enable organic growth during the period between agreed planned downtime windows. To prevent this situation, sufficient space must be reserved to enable data growth. Second, the level of data consolidation is monitored to manage accumulated growth rates safely.

Provisioning new data is stopped when one or both thresholds on the first and second metrics are reached. The operational teams are notified using an alarm on the Operations Manager event aggregate nearly full threshold (event configured when the metric exceeds 50%) and the event aggregate nearly overcommitted threshold (event configured when the metric exceeds 110%). These alarms stop the responsible entities from provisioning new storage; the aggregate is left for organic growth.

An assessment of the storage situation might be performed. Depending on experience and knowledge of the application growth rates seen in the past, the thresholds may be adapted. After the upper threshold of the operational sweet spot corridor is crossed, an alarm based on the aggregate full threshold (set initially to 65%) is sent to the storage administrators, who decide whether to migrate data in the next planned downtime window. In the meantime, organic growth can take place in the yellow-marked area shown in Figure 22. The metrics used are:

• First metric: aggregate capacity used
• Second metric: aggregate space committed

Because all storage is provisioned using the zero fat option, no artificially limited storage containers exist. Thus, there is no need to consider a volume-based metric. Figure 22 shows the behavior depending on the metrics aggregate capacity used and aggregate committed space.
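Using the thresholds above, the phase decision can be sketched as a small function over both metrics. This is a simplified illustration of the workflow described in the text, not Operations Manager logic:

```python
def next_action(capacity_used_pct, space_committed_pct):
    """Phase decision for sample setting 1 (thresholds from the text).
    capacity_used_pct: aggregate capacity used, in percent.
    space_committed_pct: aggregate space committed, in percent."""
    if capacity_used_pct > 65 or space_committed_pct > 120:
        return "mitigate (migrate data in next planned downtime window)"
    if capacity_used_pct > 50 or space_committed_pct > 110:
        return "stop provisioning; assess capacity and adapt thresholds"
    return "provision new storage"

print(next_action(40, 90))   # within the sweet spot corridor
print(next_action(55, 90))   # upper half of the corridor
print(next_action(70, 130))  # corridor left: mitigation needed
```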

Figure 22) Transition of changes depending on the metrics aggregate capacity used and aggregate committed space

(Figure content: within the operational sweet spot corridor, 0–50% aggregate capacity used and 0–110% aggregate space committed, new storage is provisioned. Between 50–65% used or 110–120% committed, provisioning of new storage stops, capacity is assessed, and thresholds are adapted. Above 65% used or 120% committed, mitigation takes place.)


5.2 SAMPLE SETTING 2: SETTLED/NOMAD

This section describes a setting that takes the settled/nomad provisioning pattern into account and allows migrating nomad data flexibly and in a timely manner, thanks to vFiler technology. This setting requires storage space at alternative locations to which nomads might be migrated. It is seen more often in larger environments with an emphasis on NFS-attached storage. It allows operating the NetApp storage infrastructure at very high use and in narrower operational sweet spot corridors. Figure 23 visualizes the effect of a mitigation alternative that can be performed online.

Figure 23) Narrower corridors due to the ability to perform mitigation alternatives in hours instead of months

(Figure content: settled data remains in place while nomads are migrated; the effect of a mitigation such as migration shows within hours of detecting the need to act.)

In this sample setting, as in sample setting 1, the critical situation to prevent is aggregates becoming too full. However, given the flexibility gained with online data migration, it is not necessary to take a further metric into account, for example, storage overcommitment.

• All storage is provisioned using the zero fat option with growable FlexVol volumes. Only aggregate monitoring is used.

• Storage is provisioned using the settled/nomad pattern, with the ability to perform online migration.

• Days-to-full aggregate trending was more than 200 days on average. Note that this value depends on the individual situation and is calculated against 100% capacity used.

The sole metric in this setting is aggregate capacity used. Table 10 contains the thresholds that describe the transitions between phases.

Table 10) Phase transitions with the settled/nomad provisioning pattern and online migration as a mitigation alternative.

Detection Threshold | Notify | Mitigation
> 70% | Storage operations | Stop provisioning of storage
> 85% | Storage operations | Stop extending provisioned storage
> 90% | Storage operations | Relax the resource situation and migrate a nomad
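The table's transitions are cumulative: a fuller aggregate triggers every action below its threshold. A small sketch of the single-metric decision:

```python
def triggered_actions(capacity_used_pct):
    """Actions triggered by the aggregate capacity used thresholds
    of Table 10 (cumulative)."""
    actions = []
    if capacity_used_pct > 70:
        actions.append("stop provisioning new storage")
    if capacity_used_pct > 85:
        actions.append("stop extending provisioned storage")
    if capacity_used_pct > 90:
        actions.append("relax the resource situation and migrate a nomad")
    return actions

print(triggered_actions(92))  # all three actions apply
```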


Figure 24) Visualization of phase transitions depending on metric aggregate capacity used

(Figure content: at 0–70% aggregate capacity used, new storage is provisioned; at 70–85%, already provisioned storage may still be extended but no new storage is provisioned; above 90%, utilization is relaxed by migrating a nomad with NetApp Data Motion.)

You can achieve very high data consolidation in this setting by using NetApp storage controllers. The amount of logical data served exceeds the physically usable capacity by factors.


6 STORAGE EFFICIENCY COOKBOOK

To increase consolidation, we propose the following steps to exploit the advantages of NetApp storage efficiency technologies.

Figure 25) Typical picture of aggregate capacity metrics while turning to zero fat configurations and dedupe

(Figure content: committed capacity and capacity used plotted over elapsed time, together with the overall trend and the last 3-month trend; the markers 1, 2, and 3 correspond to the steps below.)

As a general rule, we don't introduce artificially limited container types. They increase monitoring effort and might prevent pooling of unused space. For an existing landscape, proceed as follows:

1. Install and configure Operations Manager, the earlier the better. From day one, Operations Manager collects data; the more information it collects, the better the predictions and trending become. The diagrams provided by Operations Manager give a good idea of growth rates and their steadiness. Make sure all NetApp storage controllers are monitored. Wait for one month. Define which mitigation alternatives your operational team is comfortable with. Check the boxes accompanying the provided list and identify the time your team needs to perform the actions. If you can perform online migrations for nomads, define the time to negotiate and approve the migration. For all other data, define the time to the next planned downtime window.

2. Change all volumes to the zero fat configuration with the autogrow feature set to on. Because there is no artificial space limitation for an autogrow volume, monitoring is restricted to aggregate monitoring. When using deduplication, set the volume to autogrow. During this period, the capacity used diminishes, as shown in Figure 25. Usually each change in the volume configuration can be detected. So far, only metadata has changed, and unused space in the volumes is now available from a common shared pool. The aggregated free space is available for the same applications storing their data. We recommend monitoring for three months to understand the growth rate of your environment.

3. Derive the growth trend of the aggregates. Note that the overall trend might still be negative. Use Operations Manager to help determine the trend. Make sure that it excludes the time frame when the volume configuration was changed to zero fat, and that it includes relevant operations of your applications, such as month-end and year-end closing of business applications or regular software maintenance updates (for example, in virtualized environments).


Work backward to determine the thresholds of the phases:

a. Define the aggregate use at a level where your operational team is comfortable. At first, do not exceed 80%. Add an attention area (yellow) depending on the mitigation alternatives and their time to show effect.

b. Determine the maximum distance between the planned downtimes, or the time to perform the intended mitigation alternatives.

c. Determine the growth rate. Operations Manager provides help in determining the trend of data growth.

d. Determine the minimum space required to comfortably allow organic growth in the period between agreed planned downtimes of the services provided. Operations Manager helps you to understand the growth rate of the past.
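Steps b through d amount to a simple back-calculation. A sketch, with illustrative names and the 80% comfort ceiling from step a as the default:

```python
def red_threshold_pct(capacity_gb, daily_growth_gb, days_between_downtimes,
                      ceiling_pct=80.0):
    """Work backward: reserve enough free space to grow organically until
    the next planned downtime window, and never exceed the level the
    operational team is comfortable with."""
    required_free_gb = daily_growth_gb * days_between_downtimes
    threshold = 100.0 * (1.0 - required_free_gb / capacity_gb)
    return min(threshold, ceiling_pct)

# 10 TB aggregate, 15 GB/day growth, 120 days between downtime windows
print(red_threshold_pct(10240, 15, 120))  # → 80.0 (capped by the ceiling)
```

Faster growth or longer periods between downtime windows push the threshold below the ceiling, which is exactly the conservative behavior described in section 5.1.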

To provision storage, follow these steps:

1. Create big aggregates to enable shared storage in your data center. We recommend sizing them in such a way that the aggregate can be extended once for eventual aggregate mitigation. Very few situations exist where a silo-centric approach with dedicated aggregates for applications makes sense. Free space and performance in an aggregate can be shared, and a few big aggregates reduce the monitoring effort. Also, build aggregates in a limited number of standardized configurations and sizes.

2. Create volumes in the zero fat configuration with the autogrow feature set to on. Because there is no artificial space limitation for an autogrow volume, monitoring is restricted to aggregate monitoring. When using deduplication, set the volume to autogrow. Whenever possible, use Provisioning Manager for convenience and for repeating configurations.

a. Classify your data and provision for flexibility. Give NFS a preference and make use of vFiler entities.

b. Turn on deduplication. Even in situations where deduplication rates are expected to be low, there is sometimes a big surprise. If you prefer to try deduplication on the storage controller, then create a clone of the intended volume and deduplicate it to estimate the effect. Use Performance Advisor to identify a repeating time frame of low activity to schedule the deduplication job. Also use deduplication scheduled by change rate. Mind the maximum volume sizes, which depend on the storage controller.

c. Initially, size volumes to the expected size of the data you are going to store. Thus, the aggregate overcommitment metric in Operations Manager represents the data consolidation more precisely.

d. Trim existing volumes provisioned in full or low fat to the zero fat configuration. Use the following commands on the console of the storage controller to configure zero fat without Snapshot autodelete for NAS environments:

vol options <volume> guarantee none
vol options <volume> try_first volume_grow
vol autosize <volume> -m <maximum size> -i <increment size> on
snap autodelete <volume> off

Use the following command sequence to configure zero fat with Snapshot autodelete for NAS environments:

vol options <volume> guarantee none
vol options <volume> try_first volume_grow
vol autosize <volume> -m <maximum size> -i <increment size> on
snap autodelete <volume> trigger volume
snap autodelete <volume> delete_order oldest_first
snap autodelete <volume> on


Use the following commands to configure zero fat without Snapshot autodelete for SAN environments:

vol options <volume> guarantee none
vol options <volume> try_first volume_grow
vol autosize <volume> -m <maximum size> -i <increment size> on
snap reserve -V <volume> 0
snap autodelete <volume> off
lun set reservation <lun> disable

Use the following command sequence to configure zero fat for SAN environments with autodelete set to on:

vol options <volume> guarantee none
vol options <volume> try_first volume_grow
vol autosize <volume> -m <maximum size> -i <increment size> on
snap reserve -V <volume> 0
snap autodelete <volume> trigger volume
snap autodelete <volume> delete_order oldest_first
snap autodelete <volume> on
lun set reservation <lun> disable

e. Identify storage of inactive data. Storage keeping inactive data is most often perfectly suited to act as a nomad candidate that could be migrated.

f. Identify storage that is close to deprovisioning. Deprovisioning of storage relaxes use and can act as a mitigation alternative.

g. Turn already provisioned volumes into the zero fat configuration.

3. Let Operations Manager monitor the landscape. Use the aggregate daily growth rates and the days-to-full trending reported by Operations Manager to adapt the thresholds. Remember that days-to-full trending reports against 100% capacity used of the aggregate.
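The aggregate overcommitment metric mentioned in step 2c is simply the committed (logical) capacity relative to the physically usable capacity. A sketch of the arithmetic:

```python
def overcommitment_pct(committed_gb, usable_gb):
    """Aggregate overcommitment in percent: how much logical capacity has
    been promised relative to the physically usable capacity."""
    return 100.0 * committed_gb / usable_gb

# 15 TB committed against a 10 TB aggregate
print(overcommitment_pct(15360, 10240))  # → 150.0
```

Sizing volumes close to the expected data size keeps this metric honest: oversized containers inflate the committed capacity without reflecting real consolidation.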


7 REFERENCES

• TR-3505, "NetApp Deduplication for FAS and V-Series Deployment and Implementation Guide": www.netapp.com/us/library/technical-reports/tr-3505.html

• TR-3563, "NetApp Thin Provisioning: Improving Storage Utilization and Reducing TCO": www.netapp.com/us/library/technical-reports/tr-3563.html

• TR-3710, "Operations Manager, Provisioning Manager, and Protection Manager Best Practices Guide": www.netapp.com/us/library/technical-reports/tr-3710.html

• TR-3786, "A Thorough Introduction to 64-Bit Aggregates": www.netapp.com/us/library/technical-reports/tr-3786.html

• TR-3814, "NetApp Data Motion": www.netapp.com/us/library/technical-reports/tr-3814.html

• TR-3827, "If You're Doing This, Then Your Storage Could Be Underutilized": www.netapp.com/us/library/technical-reports/tr-3827.html

• TR-3881, "DataMotion for Volumes for Enterprise Applications": www.netapp.com/us/library/technical-reports/tr-3881.html

• NetApp Operations Manager Efficiency Dashboard Installation and User Guide: now.netapp.com/NOW/download/tools/omsed_plugin/InstallUserGuide.pdf


8 ACKNOWLEDGMENTS

This report was developed in concert with the Field Centers for Innovation and covers field best practices and product group expertise. It would not have been possible without the input of many experts. Significant contributions were made by Matthew Agoni, Carlos Alvarez, Jeff Berks, Manfred Buchmann, Hans Deuerlein, Erik Dybwad, Niels Reker, Oliver Dziuba, Larry Freeman, Gary Garcia, Pretoom Goswami, Naveen Harsani, George John, Nigel Maddock, Andreas Martinovsky, Holger Niermann, Cesar Orosco, Christian Ott, Shiva Raja, Michael Reusch, Maurice Skubski, John Tyrrell, Oliver Walsdorf, and Allen Wang.

NetApp provides no representations or warranties regarding the accuracy, reliability, or serviceability of any information or recommendations provided in this publication, or with respect to any results that may be obtained by the use of the information or observance of any recommendations provided herein. The information in this document is distributed AS IS, and the use of this information or the implementation of any recommendations or techniques herein is a customer's responsibility and depends on the customer's ability to evaluate and integrate them into the customer's operational environment. This document and the information contained herein may be used solely in connection with the NetApp products discussed in this document.

© Copyright 2010 NetApp, Inc. All rights reserved. No portions of this document may be reproduced without prior written consent of NetApp, Inc. Specifications are subject to change without notice. NetApp, the NetApp logo, Go further, faster, Data ONTAP, FlexClone, FlexVol, MultiStore, RAID-DP, SnapDrive, SnapMirror, Snapshot, SyncMirror, and vFiler are trademarks or registered trademarks of NetApp, Inc. in the United States and/or other countries. Windows is a registered trademark of Microsoft Corporation. Oracle is a registered trademark of Oracle Corporation. VMware is a registered trademark and VMotion is a trademark of VMware, Inc. All other brands or products are trademarks or registered trademarks of their respective holders and should be treated as such. RA-0007-1010


3 PROVISIONING

Some features, such as data deduplication, can be turned on or off at any time. However, to achieve the maximum consolidation and storage efficiency, more strategies must be considered during the data-provisioning phase.

In addition, provisioning should take the flexibility of storage (for example, the ability to migrate data) into account. This allows administrators to easily move data off aggregates approaching capacity, without downtime. This is also an important aspect when planning to deliver services 24x7.

Thus, the design goals for storage layouts are storage efficiency and operational flexibility. In the following sections, we discuss three orthogonal dimensions of provisioning storage. Two of them focus on achieving data consolidation; the third focuses on achieving operational flexibility. All dimensions can be combined independently. Note that the achievable level of consolidation depends on the application and its data.

TR-3827, "If You're Doing This, Then Your Storage Could Be Underutilized," provides further understanding of storage efficiency and operational flexibility.

3.1 PROVISIONING FROM SCRATCH: FULL FAT TO ZERO FAT PROVISIONING

This section deals with the way data is provisioned and the consequences for storage efficiency. We recommend applying the so-called zero fat configurations.

We consider the storage setup for a single application instance. The presented configurations can be applied while provisioning storage from scratch and to already provisioned storage. When the technical dimensions of storage provisioning are categorized into primary data space and its Snapshot™ copy space, there are four theoretical combinations for both network-attached storage (NAS) and storage area network (SAN) environments. In practical applications, only two variants are relevant to NAS, and three variants are relevant to SAN storage:

• Full fat
• Low fat
• Zero fat

According to NetApp best practices, we do not mix block and file data within a single volume, which allows us to consider NAS and SAN environments separately.

NAS

For NAS, two options are recommended: full fat and zero fat.

• Full fat: The primary data and Snapshot copy space are preallocated.

• Zero fat: Primary data and its Snapshot copy space are allocated on demand. This variant achieves the best ratio of storage efficiency when provisioning applications from scratch.


Figure 7) Provisioning model for NAS storage from scratch. Technically, only two out of four combinations are possible.

Snapshot Copy Space Allocation | Primary Data (Files & Directories) Fat | Primary Data (Files & Directories) Thin
Fat | Full fat option | No option
Thin | No option | Zero fat option

Note: Full fat is characterized slightly differently in NAS and SAN due to their technical properties.

FULL FAT PROVISIONING

Full fat provisioning for NAS is the traditional (default) way to implement NFS/CIFS shares. Volumes in a full fat configuration are characterized as follows:

• Volumes are created with a space guarantee.

• The size of the volume follows the formula X + Δ, where X is the size of the primary data (the sum of all user data, files and directories, within the volume) and Δ is the amount of space needed to hold Snapshot data.

• Because space used for Snapshot copies might grow unexpectedly, the autosize function can be used to make space available when a certain volume threshold is reached. This would also happen when the space reserved for user data gets low.

• Space reservation for Snapshot copies is used to hide the capacity used for Snapshot copies from the consumers (NAS clients).

• For volumes with deduplication enabled, volume autogrow is a mandatory option.

• Normally, using autodelete is not recommended in NAS environments. Keeping a certain number of Snapshot copies for file versioning/restores might be part of the SLAs defined for file services.

Note: Deleting Snapshot copies may be a reasonable approach when no other option for freeing up space is available, but this is a specific and individual decision.

Table 2) Full fat provisioning.

Option | Recommended Value | Notes

Volume Options
guarantee | volume |
fractional_reserve | 100 | Leave at the default; mostly relevant for SAN environments. The default value up to Data ONTAP 7.3.3 is 100; for later releases, 0 is the default.
autosize | on | Turn autosize on. There is no artificially limited volume that needs to be monitored. Autosize makes sense to allow growth of user data beyond the guaranteed space limit.
autosize options | -m X -i Y | The business model drives the maximum value for the autosize configuration because it offers additional disk space for the consumer under its specific conditions. A reasonable resizing increment depends on various factors, such as the data growth rate in the particular volume, the volume size itself, and so on.

Volume Snapshot Options
reserve | yes | The value depends on the number of Snapshot copies and the change rate within the volume.
schedule | switched on | Automatic Snapshot technology schedules.
autodelete | off | Deleting Snapshot copies is not recommended in most NAS environments.

ZERO FAT PROVISIONING

The zero fat method is the most efficient way to provision NAS volumes:

• Volumes are created without a space guarantee.

• The size of the volume still follows the formula X + Δ, where X is the size of the primary data (the sum of all user data, files and directories, within the volume) and Δ is the amount of space needed to hold Snapshot data. Sizing the volume defines a container with a virtual size for the consumers; NAS users are familiar with fixed-size file shares.

• Space used for Snapshot copies can grow unexpectedly. You can use the autosize function to make space available when a certain volume threshold is reached. You can also use the autosize function when the space reserved for user data gets low.

• Space reserved for Snapshot copies is used to hide from the consumers (NAS clients) the capacity taken up by Snapshot copies.

• For volumes with deduplication enabled, volume autogrow is a mandatory option.

• Using autodelete is normally not recommended in NAS environments. Keeping a certain amount of space for Snapshot copies for file versioning/restores is part of the SLAs defined for file services.

Note: Deleting Snapshot copies may be a reasonable approach when no other option for freeing up space is available, but this is a specific and individual decision.

Table 3) Zero fat provisioning.

Volume options:
• guarantee = none
• fractional_reserve = 100. Leave at the default; this option is mostly relevant for SAN environments. The default value up to Data ONTAP 7.3.3 is 100; for later releases, 0 is the default.
• autosize = on. Turn autosize on. There is no artificially limited volume that needs to be monitored. Autosize makes sense to allow growth of user data beyond the guaranteed space limit.
• autosize options = -m X -i Y. The business model drives the maximum value for the autosize configuration because it offers additional disk space to the consumer under its specific conditions. A reasonable resizing increment depends on various factors, such as the data growth rate in the particular volume, the volume size itself, and so on.
• try_first = (not set). Autodelete is not recommended in most environments.

Volume Snapshot options:
• reserve = yes/no. The value depends on the number of Snapshot copies and the change rate within the volume. Displaying only the committed usable space using SLAs is the preferred way to provision NAS storage. However, there might be situations in which the Snapshot reserve area is omitted (no).
• schedule = switched on. Automatic Snapshot technology schedules.
• autodelete = off. Deleting Snapshot copies is not recommended in most NAS environments.
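The note that the reserve value "depends on the number of Snapshot copies and the change rate within the volume" can be turned into a rough first estimate. The following sketch is purely illustrative (the function name and the heuristic are assumptions, not a NetApp formula): it sizes the reserve as the retained daily change relative to the volume size.

```python
def estimate_snapshot_reserve_pct(volume_gb, daily_change_gb, retention_days):
    """Rough Snapshot reserve estimate: retained changed blocks as a
    percentage of the volume size (illustrative heuristic only)."""
    if volume_gb <= 0:
        raise ValueError("volume size must be positive")
    reserved_gb = daily_change_gb * retention_days
    return min(100.0, round(100.0 * reserved_gb / volume_gb, 1))

# A 1 TB volume changing 10 GB/day with 14 daily Snapshot copies retained:
print(estimate_snapshot_reserve_pct(1024, 10, 14))  # -> 13.7
```

In practice the observed change rate, not an assumed one, should drive the value, and block sharing between Snapshot copies makes the real consumption lower than this upper bound.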

SAN

For SAN, we consider three options:

• Full fat: Both primary data and its Snapshot copy space are preallocated.
• Low fat: The primary data is preallocated; the Snapshot copy space is allocated on demand.
• Zero fat: Primary data and its Snapshot copy space are allocated on demand. This variant achieves the best storage efficiency ratio when provisioning applications from scratch.

Figure 8) Provisioning model for SAN storage from scratch. The two dimensions are the space allocation for primary data (LUN) and for Snapshot copy space, each either fat (preallocated) or thin (on demand): fat primary data with fat Snapshot copy space is full fat; fat primary data with thin Snapshot copy space is low fat; thin primary data with thin Snapshot copy space is zero fat. Thin primary data with fat Snapshot copy space is not an option.
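The decision matrix of Figure 8 can be written as a small lookup. This sketch only restates the figure; the function name is illustrative.

```python
def provisioning_model(primary_alloc, snapshot_alloc):
    """Map (primary data, Snapshot copy space) allocation styles to the
    provisioning model of Figure 8. Each argument is 'fat' (preallocated)
    or 'thin' (allocated on demand)."""
    models = {
        ("fat", "fat"): "full fat",
        ("fat", "thin"): "low fat",
        ("thin", "thin"): "zero fat",
        ("thin", "fat"): None,  # not a supported combination
    }
    return models[(primary_alloc, snapshot_alloc)]

print(provisioning_model("fat", "thin"))  # -> low fat
```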

FULL FAT PROVISIONING

This method can be treated as the historical way of provisioning block storage with Data ONTAP:

• Volumes are created with space guarantee.
• A fractional (overwrite) reserve is used to guarantee that the primary data can be overwritten completely with Snapshot copies in place. If this space is not available, Snapshot copy creation will fail.
• The size of the volume follows the formula 2X + Δ, where X is the size of the primary data (the sum of all LUN capacities within the volume) and Δ is the amount of space needed to hold Snapshot copy data.


The enhancements to the volume autosize capabilities (such as volume size-dependent thresholds) and the robustness of the Snapshot autodelete implementation have made full fat provisioning more or less obsolete. As of today, the default settings for creating volumes/LUNs in Data ONTAP still correspond to these settings. See the section "Provisioning from Scratch: Full Fat to Zero Fat Provisioning" for a discussion of using tools such as Provisioning Manager.

Table 4) Full fat provisioning.

Volume options:
• guarantee = volume
• fractional_reserve = 100. Even though technically possible, a fractional reserve below 100 incorporates a potential risk of running out of Snapshot copy overwrite space. This situation should be avoided.
• autosize = off. Autosize could be used as an option to create free space needed for Snapshot copy creation.

Volume Snapshot options:
• reserve = 0
• schedule = switched off
• autodelete = off

LUN options:
• reservation = enable

LOW FAT PROVISIONING

With low fat provisioning, we use a more space-efficient way to provision volumes:

• Volumes are created with space guarantee.
• LUNs are created with space guarantee as well. This setup does not benefit from unused blocks within a LUN. (During the lifetime of a LUN, the amount of free, unused blocks typically decreases. Without space reclamation techniques, allocated blocks on the storage system stay allocated.)
• The size of the volume follows the formula X + Δ, where X is the size of the primary data (the sum of all LUN capacities within the volume) and Δ is the amount of space needed to hold Snapshot copy data.
• Because space used for Snapshot copies might grow unexpectedly, the autosize and autodelete policies are used to make space available when a preset volume threshold is reached.

Table 5) Low fat provisioning.

Volume options:
• guarantee = volume
• fractional_reserve = 0. Snapshot space is controlled by the autodelete and autosize options.
• autosize = on. Turn autosize on.
• autosize options = -m X -i Y. The business model drives the maximum value for the autosize configuration because it offers additional disk space to the consumer under its specific conditions. A reasonable resizing increment depends on various factors, such as the data growth rate in the particular volume, the volume size itself, and so on.
• try_first = volume_grow. Increasing the size of the volume does not destroy any data or information, so there is no reason not to increase the size of the volume; it can be reverted afterward if the volume free space increases again. There might be configurations where automatic volume growth is not desired.

Volume Snapshot options:
• reserve = 0. For NAS volumes, setting a Snapshot copy reserve area and configuring Snapshot copy schedules is a common setup. For SAN volumes, this needs to be switched off according to NetApp best practices (see the Fibre Channel and iSCSI Configuration Guide).
• schedule = switched off
• autodelete = on. There might be Snapshot copies that are needed to fulfill certain SLAs, such as backup SLAs; setting this policy needs to be negotiated with the business requirements. In the worst case, deleting Snapshot copies is not an option.
• autodelete options = volume oldest_first. There is a precedence order for which Snapshot copies are candidates for deletion; oldest_first is the current default.

LUN options:
• reservation = enable. Reserves space for the LUN during creation.

ZERO FAT PROVISIONING

Full and low fat provisioning use fully allocated volumes and LUNs. By default, space allocation happens within the boundaries of the LUN and the volume. Zero fat follows a 100% "allocate on demand" concept:

• Volumes are created without space guarantee.
• LUNs are created without space guarantee.
• The size of the volume follows the formula X - N + Δ, where X is the size of the primary data (the sum of all LUN capacities within the volume), Δ is the amount of space needed to hold Snapshot copy data, and N is the amount of unused blocks within a given LUN.

Table 6) Zero fat provisioning.

Volume options:
• guarantee = none. No space reservation for the volume at all.
• fractional_reserve = 0. With Data ONTAP 7.3.3, fractional_reserve can be modified even for volumes without a space guarantee of type volume; prior to Data ONTAP 7.3.3, the value was fixed at 100.
• autosize = on. Turn autosize on.
• autosize options = -m X -i Y. The business model drives the maximum value for the autosize configuration because it offers additional disk space to the consumer under specific conditions. A reasonable resizing increment depends on various factors, such as the data growth rate in the particular volume, the volume size itself, and so on.
• try_first = volume_grow

Volume Snapshot options:
• reserve = 0. For NAS volumes, setting a Snapshot copy reserve area and configuring Snapshot copy schedules is a common setup. For SAN volumes, this needs to be switched off according to NetApp best practices (see the Fibre Channel and iSCSI Configuration Guide).
• schedule = switched off
• autodelete = off. Deleting Snapshot copies might be an option when the volume can no longer be resized because the maximum configured size has been reached, or when the aggregate's free space becomes low.

LUN options:
• reservation = disable. No preallocation of blocks for the LUN.

SUMMARY OF PROVISIONING METHODS

There are good reasons for using any of the provisioning methods described; however, full fat for SAN environments should be avoided wherever possible because of its poor storage efficiency ratio. Even with a 100% block usage ratio on primary data, zero fat provisioning has many advantages and is the preferred method:

• The aggregate's free space is a global pool that can serve space to volumes. This gives more flexibility than volumes with their own dedicated free space.
• For SAN volumes, the block consumption can be easily monitored.
• Deduplication savings go directly into the global pool of free space, which is the aggregate or the resource pool to which it belongs.
• Monitoring is needed only on the aggregate level. Volumes grow on demand.

Table 7) Comparison of provisioning methods.

• Space consumption: full fat 2X + Δ; low fat X + Δ; zero fat X - N + Δ (where N is the traditional thin provisioning impact: the amount of blocks logically allocated but not used).
• Space efficient: full fat no; low fat partially, for Snapshot copies; zero fat yes.
• Monitoring: full fat optional; low fat required on volume and aggregate level; zero fat required on aggregate level.
• Notification/mitigation process required: full fat no; low fat optional in most cases; zero fat yes.
• Pool benefiting from dedupe savings: full fat volume fractional reserve area; low fat volume free space area; zero fat aggregate free space area.
• Risk of an out-of-space condition on primary data: full fat no; low fat no, as long as autodelete is able to delete any Snapshot copies; zero fat yes, when monitoring and notification processes are missing.
• Typical use cases: full fat small installations and environments with no or few storage management skills (no monitoring infrastructure); low fat large database environments; zero fat shared storage infrastructure, test/dev environments, and storage pools for virtualized servers.
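The space consumption formulas above can be compared numerically. The following sketch (an illustrative helper, not from the source) computes the volume size each model requires for the same set of LUNs:

```python
def required_volume_size(model, lun_sizes_gb, snapshot_delta_gb, unused_gb=0):
    """Volume size per the comparison table: full fat 2X + delta,
    low fat X + delta, zero fat X - N + delta, where X is the sum of all
    LUN capacities, delta the Snapshot copy space, and N the logically
    allocated but unused blocks."""
    x = sum(lun_sizes_gb)
    if model == "full fat":
        return 2 * x + snapshot_delta_gb
    if model == "low fat":
        return x + snapshot_delta_gb
    if model == "zero fat":
        return x - unused_gb + snapshot_delta_gb
    raise ValueError("unknown model: " + model)

luns = [200, 300]  # two LUNs, 500 GB total
for m in ("full fat", "low fat", "zero fat"):
    print(m, required_volume_size(m, luns, snapshot_delta_gb=100, unused_gb=150))
# full fat 1100, low fat 600, zero fat 450
```

Even for this small example, zero fat needs less than half the space of full fat; the gap widens with the amount of unused blocks N.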

FULL/LOW/ZERO FAT PROVISIONING WITH PROVISIONING MANAGER

NetApp Provisioning Manager focuses on fast and efficient provisioning of storage resources in the NetApp storage infrastructure. It speeds up provisioning workflows and boosts capacity usage by using policy-based automation for the entire NetApp NAS and SAN infrastructure. These processes are:

• Faster than manually provisioning storage
• Easier to maintain than scripts
• Instrumental in minimizing the risk of data loss resulting from misconfigured storage

Provisioning Manager applies user-defined policies to consistently select the appropriate resources for each provisioning activity. This frees administrators from the headache of searching for available space to provision and allows more time for strategic issues. The use of a centralized management console allows administrators to monitor the status of their provisioned storage resources. Provisioning Manager can help improve business agility and capacity use, shrink provisioning time, and improve administrator productivity. Provisioning Manager's thin provisioning and deduplication capabilities provide a high level of storage efficiency from your NetApp storage investment.

A GUI allows you to implement the full/low/zero fat provisioning models within Provisioning Manager. See Figure 9 and Figure 10 for configuring storage efficiency in a provisioning policy for NAS and SAN. Whenever storage is provisioned using such a provisioning policy, the settings apply automatically. For more information, refer to TR-3710, Operations Manager, Provisioning Manager, and Protection Manager Best Practices Guide.

Provisioning Manager encapsulates technical details when provisioning storage and supports an easy integration with existing management tools and orchestration frameworks. Policies and their use in so-called datasets and storage services allow you to exploit NetApp storage efficiency technologies without exposing a high level of technical detail to a higher level of management software.

Note: Provisioning Manager up to version 4.0 does not allow you to specify autosize and autodelete individually; the policy template determines whether these features are used and which options are selected. In order to implement the provisioning methods outlined, a customized "provisioning script" needs to be provided to set the autosize and autodelete parameters according to the recommendations for the full/low/zero fat methods. Postprovisioning scripts are standard with Provisioning Manager. Use caution when Provisioning Manager runs conformance checks; these revert individual settings.


Figure 9) Configuring a full/zero fat provisioning policy using Provisioning Manager for NAS. Select checkboxes as outlined; Provisioning Manager deviates from zero/full fat by first growing volumes with autosize and then allowing Snapshot autodelete.


Figure 10) Configuring a full/low/zero fat provisioning policy using Provisioning Manager for SAN storage. Select checkboxes as outlined; Provisioning Manager deviates by not turning on autosize for zero fat.

FULL/LOW/ZERO FAT PROVISIONING FOR STORAGE SERVICES

Storage services are an easy abstraction for provisioning storage in a utility-like fashion. A storage service describes all characteristic attributes of the storage needed in a certain scenario. A storage service catalog lists the available templates and allows you to provision storage with these attributes on demand. Technically, storage services or datasets consist of one or more resource pools, a protection policy, and a provisioning policy. Full/low/zero fat provisioning for storage services is configured in the configuration wizard of the provisioning policy. Figure 11 shows the provisioning policies closest to full/low/zero fat configurations.

Figure 11) Full/low/zero fat provisioning policies for datasets and storage services.

Because this wizard is able to configure the deduplication feature, two policies are configured for the zero fat configuration: one with deduplication and one without.


HOW SHOULD A VOLUME BE SIZED?

Because physical allocation of data within a zero fat-provisioned volume is done on demand, theoretically the volume size can be set to a very high value that can easily keep all application data and Snapshot copies. Because the unallocated space in the volume is not exclusively reserved for the volume itself, all other applications can benefit from the shared pool of unallocated storage. However, NetApp recommends that you size the volume to the expected size of its containing objects and use the autogrow option to let it grow on demand. The important advantage is that the commitment rate acts as a metric for data consolidation.

Note: The commitment rate reflects the amount of logical data consolidation. This metric is suitable for deciding when data should be left for organic growth.

Additionally, the volume size limits when using deduplication should be taken into account, because the maximum sizes depend on the storage controllers.
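As a rough illustration of the commitment rate as a consolidation metric (the formula below is an illustrative sketch, not an official NetApp definition): the rate compares the logical space promised to volumes with the physical aggregate capacity, so a value above 1.0 indicates overcommitment.

```python
def commitment_rate(volume_sizes_gb, aggregate_size_gb):
    """Commitment (overcommitment) rate: sum of provisioned volume sizes
    relative to the aggregate capacity. Values > 1.0 mean the
    thin-provisioned promises exceed the physical space."""
    return sum(volume_sizes_gb) / aggregate_size_gb

# Three 4 TB zero fat volumes on a 10 TB aggregate:
rate = commitment_rate([4096, 4096, 4096], 10240)
print(round(rate, 2))  # -> 1.2
```

Sizing volumes to the expected size of their contents (rather than arbitrarily large) keeps this metric meaningful for deciding where further data consolidation is safe.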

APPLICATION RECOMMENDATIONS

Thin provisioning is most effective when applications use data that is committed to them step by step. When applications preformat data, the immediate effect of thin provisioning is lost, and only deduplication may reclaim sharable blocks. Because thin provisioning has no performance penalty, the general recommendation is to provision with the zero fat configuration.

For SAN-attached storage, NetApp recommends that you use file systems supporting space reclamation technologies such as the SCSI UNMAP and SCSI WRITE SAME commands. These pass the information through the storage stack that a particular block is no longer used and allow unused space to be reclaimed. On Windows® platforms, this can be configured in NetApp SnapDrive®.

For Oracle® database best practices, refer to WP-7084, Storage Efficiency in an Oracle Environment.

3.2 PROVISIONING FROM TEMPLATES: VOLUME- AND DEDUPE-CENTRIC LAYOUTS

This section deals with provisioning storage for similar applications from a golden template. A valid use case is a hosting provider who offers and serves predefined application services in mass quantities. Instead of provisioning each application from scratch, the data of the application instance is provisioned by creating a copy of a preconfigured template, or golden copy, that is customized using a postprocessing procedure.

When applications are provisioned this way, NetApp cloning technologies generate virtual copies of the template data instantly and with efficient use of space. This achieves a high degree of data consolidation and cost savings.

The potential of NetApp cloning technologies also plays a central role in development and test environments as well as in software maintenance scenarios. Testing and updates can be performed very easily because these cloning capabilities work instantly and with almost no overhead in terms of performance, CPU, and memory.

There are two ways to align application data to a NetApp shared storage infrastructure:

• Volume-centric storage layout
• Dedupe-centric storage layout

Depending on the data lifetime, suitability for deduplication, consistency, and tool constraints, one way of aligning application data is more appropriate than the other.

In both variants, the storage of the application template can be provisioned as either full, low, or zero fat. The cloning procedure inherits the attributes of the parent volume. To create space-efficient clones, the space guarantee must be set to none.


CONSEQUENCES FOR MONITORING

When using one of the following layouts, very high data consolidation can be achieved. Because this effect depends on the usage characteristics of the corresponding applications, monitoring the aggregate is key. In case a low fat volume acts as a template that is cloned preserving the original space guarantees, monitoring is necessary for the cloned volumes as well.

VOLUME-CENTRIC STORAGE LAYOUT

In a volume-centric storage layout, an application instance is organized into one or a few volumes to benefit from the Data ONTAP volume-centric management and maintenance operations, such as instant cloning and volume-consistent Snapshot copies.

In addition to the convenient ways to manage volumes, volume-centric storage layouts have storage efficiency advantages in two dimensions:

• High instant storage efficiency savings: high instant savings when cloning data of an application instance with FlexClone; savings might deteriorate over time.
• Long-term storage efficiency savings: medium long-term savings when deduplicating application data.

A volume-centric layout makes it easy to provision storage for another instance of an application by cloning a consistent volume representing the template of the intended application and attaching it to an instance where it is processed. This approach works for both NAS and SAN.

Figure 12 shows the data alignment of an application instance and its volume. An application instance organizes its data in one or more dedicated volumes. Note that the entire construct is created within one aggregate.

Because deduplication is performed on the volume level, long-term savings depend on the block-sharing rate within one instance of an application. Volume-centric layouts are preferred in the following cases:

• Simplicity of data management using volumes
• Individual control over the SLA of each application instance
• Application instances with a short duration
• No consideration of deduplication
• Management tools that require volume-centric layouts


Figure 12) Volume-centric storage provisioning. Application instances are aligned horizontally with their volumes: the template and each instance (instance 1 through instance n) keep their LUNs/qtrees in their own FlexVol volume, deduplication block sharing operates within each FlexVol volume, and FlexClone block sharing links the cloned instances to the template.

Impact on commitment and storage utilization: The impact of using FlexClone to clone a volume-centric storage layout to implement template-based provisioning can be visualized schematically. At clone creation, Data ONTAP creates metadata for the new instance of the data; it allocates space for storing changes to the cloned copy or new data on request. Thus, the overcommitment of the aggregate containing the cloned data increases when creating the clone. However, this does not affect the space used in the aggregate. When data in the clone is changed and new data is added by the application, the aggregate use grows.
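The behavior just described (cloning raises commitment immediately but consumes physical space only as the clone diverges) can be captured in a toy model. All class and method names below are illustrative assumptions, not a Data ONTAP API.

```python
class Aggregate:
    """Toy model of aggregate use vs. commitment under FlexClone-style
    cloning: a clone adds logical commitment at creation time but
    consumes physical space only as changed or new blocks are written."""
    def __init__(self, size_gb):
        self.size_gb = size_gb
        self.used_gb = 0.0        # physically allocated blocks
        self.committed_gb = 0.0   # logical space promised to volumes

    def provision_volume(self, size_gb, initial_data_gb):
        self.committed_gb += size_gb
        self.used_gb += initial_data_gb

    def clone_volume(self, size_gb):
        # Clone creation: metadata only, no block copies, no use growth.
        self.committed_gb += size_gb

    def write_new_data(self, gb):
        self.used_gb += gb

agg = Aggregate(10000)
agg.provision_volume(2000, initial_data_gb=1500)  # template volume
agg.clone_volume(2000)                            # instant clone
print(agg.used_gb, agg.committed_gb)              # -> 1500.0 4000.0
agg.write_new_data(100)                           # clone diverges
print(agg.used_gb)                                # -> 1600.0
```

The jump in committed_gb without any change in used_gb at clone creation is exactly why monitoring shifts to the aggregate level in these layouts.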

Best Practice

A volume-centric layout implicitly implements a consistency group. It is preferable to align in it all application data that should be recovered at a certain point in time. Cloning can achieve significant savings when a FlexClone volume is created to provision data for a new service instance.

Client-side data realignment, such as disk defragmentation or database tablespace reorganization, has a counterproductive effect on the FlexClone savings. This realignment also has a temporarily counterproductive effect on the deduplication savings, until the deduplication process is executed again. If possible, the following actions on client data should be avoided:

• Reorganizing data, for example, database reorganization of tablespaces or defragmentation of virtual disks provisioned through cloning
• Preformatting data


DEDUPE-CENTRIC STORAGE LAYOUT

In a dedupe-centric storage layout, the goal is to achieve high storage efficiency returns from the deduplication feature. In contrast to the volume-centric storage layout, data of different application instances is grouped to achieve storage efficiency returns across a set of application instances. Figure 13 shows a sample dedupe-centric storage layout. Data of application instances is organized horizontally; individual data of each application is grouped vertically in a volume to implement deduplication.

This layout makes sense in virtualization scenarios where the images of the guest machines can be grouped easily. Grouping partitions containing boot images and commonly used programs is very effective because they share much of the same data. To implement template-based provisioning with such a layout, cloning template data must be performed with the file/LUN FlexClone operation. File/LUN FlexClone allows storage objects to be cloned within a volume, providing finer granularity.

This storage layout provides the following storage efficiency advantages from a short- and long-term perspective:

• Very high long-term storage efficiency savings: long-term savings are achieved due to the deduplication-centric storage layout and deduplication returns.
• Short-term storage efficiency savings: instant savings are provided when cloning an application instance through a file/LUN FlexClone operation, for example, template application data.

In contrast to the volume-centric storage layout, application instances are bundled together in a matrix style because of their participation in a volume. This implies that the applications share major operational tasks and are managed as a bundle. From an SLA perspective, a diversification of service levels within the application instances cannot be implemented as easily as with a volume-centric layout. Achieving application-consistent Snapshot copies requires the iterative application of file/LUN FlexClone functionality to all storage objects of the instance. This is slightly more difficult than cloning with a volume FlexClone operation.

TR-3505, NetApp Deduplication for FAS and V-Series Deployment and Implementation Guide, provides a deeper understanding of NetApp deduplication and its deployment.

Figure 13 illustrates dedupe-centric storage provisioning. Volumes are shared among several application instances to achieve cross-instance dedupe returns. Note that the construct shown is created within one aggregate; volumes can, however, be assigned to different aggregates.


Figure 13) Dedupe-centric storage provisioning. Application instances (template, instance 1, instance 2) are aligned horizontally; volumes are aligned vertically. Each FlexVol volume holds one LUN/qtree per application instance, and deduplication block sharing operates within each FlexVol volume.

Impact on commitment and aggregate usage: When creating the FlexVol volumes for this layout, their individual sizes contribute to the commitment rate. The aggregate use grows with the provisioning and object use within the FlexVol volumes. Provisioning a new instance in this layout through a file/LUN FlexClone operation has no effect on the overcommitment rate; it affects the deduplication savings of the volumes themselves. Thus, NetApp recommends using the zero fat configuration for the volume so that autogrow is enabled.

Best Practice

This layout is very attractive for applications using multiple but similar storage objects among service instances (for example, virtual disks in virtual machine hypervisors). They usually use similar operating systems and applications in dedicated virtual disks. Thus, grouping these storage objects leads to a very high degree of consolidation due to deduplication.

Quickly changing data, such as pages and swap files, should not be considered for inclusion in deduplicated volumes on primary storage. Deduplication savings are limited due to their high change rate and do not justify running the deduplication process. NetApp recommends that this type of data is not placed in the same volume as data that dedupes well.

We further recommend not performing client data realignments, such as Windows disk defragmentation or database tablespace reorganizations. Because of the way that NetApp storage controllers work, fragmented client data is served without performance penalty.


3.3 SETTLED/NOMAD PROVISIONING FOR NETAPP DATA MOTION

Settled/nomad provisioning is a provisioning pattern that helps increase the utilization of NetApp shared storage. When the online migratability features of storage are exploited, response times to mitigate data growth scenarios are independent of application-specific planned downtime windows. Storage is classified into potential migration candidates, which can be migrated away from a tight aggregate at one storage controller to another while assuring accessibility. Thus, it is an elegant technology to relax the use of an aggregate outside the planned downtime windows of the served applications. NetApp MultiStore technology implements this feature using the vFiler abstraction, which NetApp recommends you consider in the provisioning process. Secure multi-tenancy environments implemented using MultiStore technology harmonize well with this provisioning approach.

The settled/nomad provisioning pattern is a perfect metaphor for reacting to data growth in an aggregate. Figure 14 illustrates the concept of settled/nomad provisioning in the aggregates of the storage controllers and the migration of a nomad out of its aggregate. The settled part describes data that does not move during its lifetime. It might use vFiler units to simplify operation and hardware maintenance of the storage controller, but there is no direct need. The nomad parts are considered moving parts and thus must make use of vFiler units. The ratio between the size of the settled and nomad parts depends on the growth rate and lifetime of the data in the settled part. Assuming that the aggregate size is constant over this period, the aggregate is filled with settled and nomad data. Over the data lifetime, more and more nomads are migrated away; at the end of the lifetime, only the settled data is left.

It is irrelevant whether the data growth happens in the settled or nomad part: when a nomad is migrated away, the resource situation on the aggregate is relaxed. It is preferable to provision several nomads of different sizes. This allows you to:

• React to different growth scenarios of the data
• Quickly migrate smaller nomads when time or the inter-storage-controller network is considered a limited resource
• Operate the aggregate in its operational sweet spot corridor over a long time frame; by slicing the migratable entities in the right way, you can be sure that the aggregate operates in a predefined use interval
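The sweet spot corridor idea can be sketched as a simple policy: whenever aggregate use exceeds the upper bound of the corridor, migrate nomads (smallest first, since they are cheapest to move) until use falls back below it. This is an illustrative greedy sketch, not a NetApp algorithm; the threshold and function name are assumptions.

```python
def nomads_to_migrate(settled_gb, nomad_sizes_gb, aggregate_gb, upper_bound=0.85):
    """Greedy pick of nomads to migrate so that aggregate use falls back
    below the sweet-spot upper bound. Returns the migrated nomad sizes,
    smallest first (illustrative policy only)."""
    remaining = sorted(nomad_sizes_gb)
    migrated = []
    def use():
        return (settled_gb + sum(remaining)) / aggregate_gb
    while use() > upper_bound and remaining:
        migrated.append(remaining.pop(0))  # smallest nomad: cheapest to move
    return migrated

# 12 TB aggregate, 5 TB settled, nomads of 1, 2, and 3 TB (about 92% use):
print(nomads_to_migrate(5000, [3000, 1000, 2000], 12000))  # -> [1000]
```

Provisioning several nomads of different sizes, as recommended above, is what lets such a policy return the aggregate to its corridor with a single small migration instead of one large one.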

Figure 14) Settled/nomad provisioning into an aggregate. The aggregate contains a settled part and several nomads; in case of aggregate tightness, a nomad is migrated to a separate aggregate.

To summarize, the settled/nomad provisioning pattern is an elegant method to adjust the block use of an aggregate. The use of an aggregate can be controlled and kept in a desired corridor.


SLA-BASED ASSESSMENT FOR SETTLED/NOMAD

The goals of an SLA-based assessment are to optimize SLA fulfillment and to avoid or minimize penalty costs. The accessibility of the applications, described by their individual service levels, is used for an assessment into settled and nomad instances.

We use the introduced SLA metric of service disruption and map it to the stickiness of the settled/nomad instances. The vFiler entities allow online migration of NFS- and iSCSI-attached nomad instances without any changes on the client side. Fibre Channel-attached storage cannot be migrated online at the time of writing. Refer to TR-3881 for an understanding of Data Motion in an Oracle database and Microsoft Exchange environment.

Alignment by technical impact: For data belonging to applications with SLAs that fit perfectly into what is provided, a direct assignment can be made. For example, application instances with the lowest acceptable service disruption should be the last candidates to be migrated (settled); applications with the highest acceptable service disruption should be considered nomads. However, there might be data of application instances that will likely be migrated during the application lifetime. You must take into account the business impact of migrating these instances.

Figure 15) Alignment by technical impact (instances Inst1 through InstN sorted by negative impact in descending order). Instances with a high negative impact or outside the SLA (for example, all FC-attached instances) are assigned as settled; instances with medium or low negative impact inside the SLA are assigned as nomads.

Alignment by business impact: An assessment of penalty costs is made for the data of the remaining applications. For vFiler migration, a very short negative impact on the performance of the service level must be taken into account during the migration. Thus, application data with the highest negative impact is considered to be the stickiest.

Figure 16) Alignment by business impact (instances sorted by negative impact in descending order). Instances with the highest penalty costs are assigned as settled, those with medium costs as semi-settled, and those with low costs as nomads.
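The two-step assessment (technical constraints first, then business impact) can be sketched as a simple classification. All field names and the threshold below are hypothetical illustrations of the described process, not part of any NetApp tool.

```python
def classify_instances(instances, penalty_threshold):
    """Assign each application instance to 'settled' or 'nomad'.
    Step 1 (technical impact): instances whose storage cannot be
    migrated online (e.g., FC-attached) are settled.
    Step 2 (business impact): of the rest, those with penalty costs
    above the threshold are settled; the others become nomads."""
    assignment = {}
    for name, protocol, penalty in instances:
        if protocol == "fc":               # no online migration available
            assignment[name] = "settled"
        elif penalty > penalty_threshold:  # migration impact too costly
            assignment[name] = "settled"
        else:
            assignment[name] = "nomad"
    return assignment

apps = [("erp", "fc", 900), ("mail", "nfs", 700), ("wiki", "iscsi", 50)]
print(classify_instances(apps, penalty_threshold=500))
# -> {'erp': 'settled', 'mail': 'settled', 'wiki': 'nomad'}
```

A real assessment would add the semi-settled tier of Figure 16 and express penalties in actual SLA terms; the sketch only shows the ordering logic.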

PERFORMANCE AND THROUGHPUT IMPACT OF MIGRATION

Migration of a nomad might be triggered by heavy storage consumption in an aggregate. It might also be triggered by performance limitations of the corresponding storage controller. Because the migration consumes additional resources on the network and the participating storage controllers, this consumption must be taken into account to avoid further intensifying the situation. Refer to TR-3881 for a quantitative evaluation of Data Motion.

NetApp recommends that the use of storage controllers be adjusted in a high-availability configuration in such a way that the remaining controller can master the load in the case of a failover. Doing so should leave enough resources to perform migrations.

Migrating vFiler entities relies mainly on SnapMirror® and MultiStore technology and thus requires these licenses on all participating storage controllers. TR-3814, NetApp Data Motion, provides a thorough presentation of migration using the NetApp Data Motion solution. It focuses on implementing and triggering the migration of vFiler entities using NetApp Provisioning Manager. Furthermore, vFiler units can be managed manually to allow for handy offline migration with a very short interruption of storage accessibility.

ENABLE SETTLEDNOMAD FOR ALREADY-PROVISIONED STORAGE

While NetApp recommends that you initially consider the settlednomad setting and take sizing and lifetime of storage into account it is possible to implement this in a planned downtime window If NFS-attached storage should be migrated existing volumes can be adopted by a vFiler entity Because the vFiler entity has its own IP address the clients attaching the storage need to be remounted

SETTLED/NOMAD-LIKE SETTING WITH SHORT-/LONG-TERM DATA PAIRING

In the previous section, the settled/nomad pattern was described as a way to mitigate organic data growth. The same effect can be achieved when storage is identified as belonging to instances that are going to be deprovisioned at their end of life. Taking the expected lifetime of provisioned storage into account allows you to plan deprovisioning in advance. This relaxes the dependence on aggregates outside planned downtime windows, and without the technical requirements of a settled/nomad setting.

ONLINE MIGRATION IN VIRTUALIZED ENVIRONMENTS

Online migration features in a virtualization hypervisor provide a further alternative for implementing a responsive scheme that reacts to data growth. For example, VMware® Storage VMotion™ is capable of transferring a virtual machine, including its storage, when it is attached using a datastore. Storage of virtual machines served by a NetApp datastore can be migrated to another NetApp-served datastore by migrating each virtual machine. In such cases, a nomad can simply be implemented by a NAS- or SAN-attached datastore.

In contrast to a data transfer based on SnapMirror directly between NetApp storage controllers, the migration traffic flows through the hypervisor. This might have consequences for the execution of the virtual machines. Also, the NetApp storage efficiency savings cannot be exploited during the transfer. Deduplication savings are regained by executing the deduplication process on the destination storage controller.


4 OPERATION

This section focuses on the operation and management of overcommitted storage. The goal of management is to fulfill the SLAs of the stored data while achieving a high level of efficiency. It addresses questions of how to detect situations that need manual assistance, how to raise the awareness of the operational staff, and how to resolve situations that arise.

We first consider situations that put SLA fulfillment at risk. Then we focus on actions that can be taken to avoid further aggravation by making the situation evident and presenting mitigation alternatives.

Consider the operational process as a loop that monitors and evaluates the current situation and triggers the transition of a storage resource among the following phases:

• Provision storage.
• Leave room for organic growth. It might be desirable to still allow for extending storage of previously provisioned applications.
• Reduce storage use with mitigation alternatives such as deletion, data motion, and so on.

These transitions must occur within a specified time frame to optimize operational flexibility and to prevent endangering the SLAs. The point is to detect situations that will violate the SLAs in the future.

SITUATIONS PUTTING SLA FULFILLMENT AT RISK

Over time, more and more data is stored and processed by the provided applications. NetApp storage efficiency technologies compensate for this growth. To prevent running out of physical resources, usage must be managed within safe boundaries. This makes sure the operations team has enough time to react with the appropriate mitigation strategy.

The following list summarizes situations that are critical for service delivery:

• Running out of time. Some mitigation alternatives must be triggered in advance, and a passage of time might be needed for their effect to become evident. This time determines the number of mitigation alternatives that can still be considered at a certain point.

• Running out of mitigation alternatives. Several mitigation alternatives exist to control usage. However, some alternatives are one-time activities, and some must be performed within a certain time frame. Depending on the situation, not all alternatives might be available for use.

• Running too tight on storage. Over time, applications use more and more of the blocks that were committed to them. This forces Data ONTAP to allocate from a pool of free blocks. Assuming data growth, the size of the free block pool directly translates into the available time to react.

• Running out of storage completely. This must be prevented because it has a high impact on the availability of the service. Furthermore, data integrity can be at risk. Consider the following scenarios:
– An application wants to write to committed storage but fails (NAS/SAN). To the application, this looks like a storage failure and implies service disruption. Data integrity can be at risk.
– An application wants to allocate new storage but fails (NAS). The application is confronted with a "No space left on device" exception. Verify the application behavior on this exception. Most applications can deal with this situation, and data integrity is not at risk.

Two cases need to be differentiated when mitigation is necessary to resolve a tight storage situation. Storage for an object such as a LUN or a share can be tight because of:
– insufficient space within the volume in which the storage object is contained
– insufficient free space within the aggregate in which the storage object and its volume are contained
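The "running too tight on storage" case above can be quantified. A minimal sketch (illustrative numbers, not figures from this report) that converts the free block pool of an aggregate and an observed growth rate into the remaining time to react:

```python
def days_to_react(free_space_gb: float, daily_growth_gb: float) -> float:
    """Translate the free block pool into available reaction time.

    Assuming roughly linear data growth, free space divided by the
    daily growth rate gives the days left before the aggregate is full.
    """
    if daily_growth_gb <= 0:
        return float("inf")  # no growth: no time pressure
    return free_space_gb / daily_growth_gb

# Example: 2000 GB free, growing 40 GB/day -> 50 days to react
print(days_to_react(2000, 40))
```

The resulting number of days bounds which mitigation alternatives are still viable, as the list above explains.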


The following sections focus on how to detect that a change is necessary and that a storage resource should be transitioned to another phase. We describe the different phases, how monitoring can support their detection, and how this information can be made known to operational groups.

4.1 PHASES AND TRANSITIONS

This section outlines the phases of a storage resource. Starting with an empty aggregate, storage is provisioned up to certain thresholds. After that, storage is left for organic growth. After certain thresholds are exceeded, further inspection or activities must be performed to mitigate storage tightness.

• Provision storage. When certain thresholds are within a defined range, storage is provisioned to the aggregates. Monitoring should support making the decision to transition to the next phase.

• Leave storage for organic growth. When certain thresholds are exceeded, provisioned storage is left for organic growth. Depending on the environment, storage of existing applications might still be extended, and a second threshold might signal that extensions are no longer possible. Monitoring should support making the decision to transition to the next or the prior phase.

• Mitigate storage use. When certain thresholds are exceeded, this phase must make sure that committed storage can be delivered to store application data. The effect of a mitigation activity should be to put the storage resource back into the preferred operational corridor. Monitoring should support making the decision to transition back to the organic growth phase.
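The three phases can be expressed as a simple threshold-driven classification. A sketch with placeholder threshold values (each environment derives its own, as the real-life settings in section 5 show):

```python
def phase(aggregate_used_pct: float,
          provision_limit: float = 50.0,
          mitigate_limit: float = 65.0) -> str:
    """Map aggregate block usage to the current operational phase.

    The two limits are hypothetical defaults; derive real values from
    your growth rates and planned downtime windows.
    """
    if aggregate_used_pct < provision_limit:
        return "provision storage"
    if aggregate_used_pct < mitigate_limit:
        return "leave for organic growth"
    return "mitigate storage use"

print(phase(40))   # provision storage
print(phase(55))   # leave for organic growth
print(phase(70))   # mitigate storage use
```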

4.2 MONITORING

NetApp Operations Manager delivers comprehensive monitoring and management for NetApp shared storage. It provides alerts, reports, performance monitoring, and configuration tools to keep the NetApp storage infrastructure in line with business requirements for maximum availability and efficiency. NetApp Operations Manager provides a single human interface and an application programming interface (API) for integration with third-party management and orchestration software.

Operations Manager monitors the NetApp shared storage infrastructure and is able to raise awareness of certain situations. Events can be set to trigger an action when operational parameters are within a certain range and indicate a relevant situation. When the event triggers, an alarm notification can be sent by e-mail, pager, Simple Network Management Protocol (SNMP), or customized scripts. To raise awareness of a certain situation, the event must be characterized using the metrics provided by Operations Manager. To communicate the event, an alarm must be set.

THRESHOLDS

Operations Manager monitors relevant parameters that indicate the presence of specified situations. Thresholds can be set to trigger actions, for example, to notify the operational team that an alarm situation exists. The thresholds can be set to notify in advance. Operations Manager also performs trending on operational parameters to express the urgency of a certain situation. This supports the decision making on how to react.

Within your Operations Manager instance, the thresholds can be verified and set by navigating to the Default Thresholds page, following Setup → Options → Default Thresholds, or using the link http://opsmgrserver:port/dfm/edit/options. Figure 17 shows a sample configuration page.


Figure 17) Operations Manager screen to configure thresholds on operational metrics

For aggregates, Operations Manager provides the set of thresholds described in the following list. They represent absolute limits. Operations Manager alarms can be used to notify operational staff and raise awareness of a specific situation.

Monitoring the aggregates is very important. They are the physical containers of preallocated and growable storage objects that host application data. If an aggregate of a storage controller runs at uncontrolled usage, there could be direct consequences for the applications for which it provides data.

The concrete settings for these thresholds depend on the time necessary to relax aggregate block usage. When no mitigation alternatives should or can be taken during the lifetime of the data, mitigation actions must be performed in scheduled downtime windows. In that case, threshold settings and actions tend to be more conservative to avoid SLA-endangering situations.

• Aggregate full threshold. This threshold on the metric aggregate block use allows triggering an alarm that notifies a person in charge.

• Aggregate nearly full threshold. This threshold is the counterpart of the aggregate full threshold but provides an earlier notification.

• Aggregate overcommitted threshold. This threshold on the metric committed storage allows triggering an alarm that notifies a person in charge. This metric refers to the amount of storage that is assigned to applications. It represents the level of consolidation and also the width and increase of the block use corridor.

• Aggregate nearly overcommitted threshold. This threshold is the counterpart of the aggregate overcommitted threshold but provides an earlier notification.

Operations Manager also provides thresholds that can be used to alert operational staff when volumes are in a certain state:

• Volume full threshold. This event notifies a person in charge that the preset threshold on the metric volume use has been reached.

• Volume almost full threshold. This event is the counterpart of the volume full threshold but provides an earlier notification.

• Volume autosized. This event notifies a person in charge when a volume was extended using the autogrow functionality.


TRENDING

Operations Manager 4.0 supports a variety of trending features for certain storage objects. This is an important feature for all storage objects with a fixed size. It allows you to estimate the time frame within which a certain situation needs to be mitigated. The trend is calculated as a linear regression over up to 90 days of past data. For aggregates, Operations Manager calculates a trend on the daily growth rate. In your Operations Manager instance, use the link http://opsmgrserver:port/dfm/report/view/aggregates-growth-rates for trending of aggregate growth rates and the estimated remaining time until the storage object is full. Each aggregate can be drilled down, and you can select trending based on an interval of one day, one week, one month, three months, or one year. To see the effect of recent data activities, set the interval of the trend calculation to enclose this activity. Investigate whether growth rates calculated over different intervals deviate significantly.

Figure 18) Trending of data growth and days-to-full prediction in Operations Manager

Note: The calculation basis of time to full is the usable aggregate capacity. This value is not calculated based on the aggregate full threshold setting.
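The trend calculation can be reproduced outside Operations Manager for a quick sanity check. A sketch that fits a least-squares line to daily capacity samples and extrapolates to the usable aggregate capacity, mirroring the linear-regression trending described above (the numbers in the example are illustrative):

```python
def days_to_full(samples_gb, usable_capacity_gb):
    """Estimate days until full from daily capacity-used samples.

    Fits a least-squares line (sample index = day number) and
    extrapolates to the usable capacity, as the note above specifies.
    """
    n = len(samples_gb)
    xs = range(n)
    mean_x = sum(xs) / n
    mean_y = sum(samples_gb) / n
    var_x = sum((x - mean_x) ** 2 for x in xs)
    slope = sum((x - mean_x) * (y - mean_y)
                for x, y in zip(xs, samples_gb)) / var_x  # GB per day
    if slope <= 0:
        return float("inf")  # flat or shrinking: never full
    return (usable_capacity_gb - samples_gb[-1]) / slope

# 10 days of samples growing 5 GB/day toward a 1000 GB aggregate
history = [500 + 5 * d for d in range(10)]
print(round(days_to_full(history, 1000)))  # 91 days left
```

Comparing such an estimate over different sample windows shows whether growth rates calculated over different intervals deviate significantly.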

Trending on the volume level is analogous to trending on the aggregate level. In your Operations Manager instance, access the link http://opsmgrserver:port/dfm/report/view/volumes-growth-rates for trending of volume growth rates. NetApp recommends that you order the view by growth rate, descending, or by time to full, ascending, in order to focus on the relevant candidates.

On the volume level, you can set an alarm to fire when the volume growth is outside the usual boundary:

• Abnormal volume growth. This event notifies when the growth rate of a volume exceeds a preset limit. It is helpful for signaling unusual behavior concerning storage consumption and pointing the operational staff to the right storage object.


INDIVIDUAL THRESHOLDS PER AGGREGATE OR VOLUME

For each aggregate or volume, the general default settings can be overridden and made more specific. To do so, select your aggregate or volume of choice, for example, using the links already provided in this technical report. When selecting a concrete aggregate, it can be configured using the Edit Settings link and dialog. When selecting a concrete volume, its configuration can be adapted using the Edit Quota Settings link and dialog.

MONITORING STORAGE EFFICIENCY RETURNS

NetApp Operations Manager provides a dashboard to visualize storage efficiency returns in the NetApp shared storage infrastructure. This report lists important parameters, drilled down by utilization capacity, unused reserve capacity, storage efficiency, and efficiency return breakdown. It allows you to judge the effectiveness of the NetApp storage efficiency technologies. Figure 19 provides a sample screenshot of the storage efficiency dashboard in NetApp Operations Manager. Consult the NetApp Operations Manager Efficiency Dashboard Installation and User Guide for further information.

Figure 19) Storage efficiency dashboard in Operations Manager


4.3 NOTIFICATION

Operational staff must be notified when situations occur that require a transition of phases, especially situations with negative consequences for SLA fulfillment. Operations Manager provides alarms for notification. Alarms are bound to the metrics and thresholds explained in section 4.2 and notify operational staff, storage administrators, or storage capacity planners. Alarms are the instrument used to keep the management effort of the NetApp storage infrastructure low.

After being notified, the responsible person can evaluate the situation and decide which actions to take. Furthermore, the trends on operational parameters provided by Operations Manager simplify the decision-making process.

Depending on the organizational structure, the responsibilities to operate, plan, and administer the storage infrastructure can be separated into different groups, persons, or roles. Thus, we characterize the mitigation activities by required skill set and time to act. This allows easy alignment to a given organizational structure.

Operations Manager supports different methods of sending a notification. The notification methods can be used in combination; for example, a notification can be sent by both e-mail and SNMP.

NOTIFY BY E-MAIL

An alarm can be sent to multiple destinations by e-mail. Repeated notifications can be sent while the situation is not resolved. To set an alarm, access the alarm configuration page by following Setup → Alarms from the default Operations Manager dashboard. Clicking Advanced Version accesses an advanced version of this page; the direct link for the advanced version is http://opsmgrserver:port/dfm/edit/alarms-advanced. Figure 20 shows how to configure an alarm. Adjust the threshold as described in section 4.2.

NetApp recommends using distribution lists or aliases with meaningful names rather than the addresses of individual persons. If you follow this recommendation, changing responsibilities and roles does not require corresponding changes in Operations Manager.

NOTIFY BY SNMP

Operations Manager supports the notification of alarms using SNMP, a widely used standard that is supported by most orchestration frameworks and ticketing systems. Using SNMP, Operations Manager can be integrated into existing ticketing systems. Figure 20 shows the setup of an alarm firing based on the aggregate almost full threshold. The SNMP trap host is configured using a hostname or IP address and the port on which the SNMP agent is listening. The alarm can then be saved and tested.


Figure 20) Configuring an alarm based on the threshold aggregate almost full

Note: The SNMP event must be routed to the responsible groups or persons in the ticketing system. Thus, the mapping between the detected situation and the responsible operational group must be implemented there.

NOTIFY BY SCRIPT

Operations Manager supports notifications in highly customized integration scenarios. A user-defined adapter can be executed that delivers the information to the infrastructure or system of choice. A script can be used to implement such an adapter and act as the glue between Operations Manager and the customer infrastructure. To set an alarm on the event aggregate almost full that starts a script, instrument Operations Manager on the command line:

dfm alarm create -s script_to_execute -h aggregate-almost-full
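Such an adapter script could look like the following sketch. The event name and source shown here, and the payload field names, are hypothetical; consult the Operations Manager documentation for the exact information a triggered script receives:

```python
import json

def build_ticket(event_name: str, source: str, severity: str = "warning") -> str:
    """Format an Operations Manager event as a JSON ticket payload.

    The payload field names are placeholders; adapt them to the
    ticketing system in use.
    """
    return json.dumps({
        "summary": f"{event_name} on {source}",
        "severity": severity,
        "component": "netapp-storage",
    })

# Example: the alarm script might receive the event and its source,
# then hand the payload to the ticketing system
print(build_ticket("aggregate-almost-full", "aggr_data_01"))
```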


4.4 MITIGATE STORAGE USE

Uncontrolled storage use can limit operational flexibility and might put SLA fulfillment at risk. This section focuses on mitigation activities that preserve flexibility by controlling use within its defined corridor. The effect of a mitigation activity should be to return usage to that corridor.

Storage tightness might occur in aggregates or volumes, depending on their configuration. When all volumes in an aggregate are thin provisioned with the zero fat configuration, they use the shared pool of free blocks of the aggregate to deal with data growth. To resolve this situation, a mitigation activity on the aggregate level is necessary.

When storage objects in a fixed-size volume cannot grow to the committed space, a mitigation activity on the volume level is necessary to resolve upcoming volume tightness.

MITIGATION ACTIVITIES FOR AGGREGATES

Aggregates are the coarsest storage objects within a NetApp storage controller. Finer-grained storage objects such as FlexVol volumes and their content are usually thin provisioned using the zero fat configuration. They might grow on demand; however, because they live within an aggregate of physically limited size, the growth of the storage object itself is also limited. As described in the following list, providing usable space in the aggregate automatically allows contained storage objects to grow.

1. Increase the aggregate. You can add drives to aggregates during operation, and you can repeat this mitigation activity. The maximum aggregate size depends on the Data ONTAP version, the type of aggregate, and the type of storage controller. 64-bit aggregates, supported with Data ONTAP 8, have very high limits. Additional drives can be used immediately; however, their procurement time needs to be taken into account. Rebalancing data between existing and new drives results in a uniformly distributed use of the drives.

2. Decrease the aggregate Snapshot copy reserve. This reserve is needed in MetroCluster and SyncMirror® configurations. In other configurations, you can decrease this reserve or set it to zero.

3. Shrink preallocated volumes. Volumes with preallocated space reserve available aggregate free space. When possible, these volumes can be shrunk, returning the freed space to the aggregate so that other objects can make use of it.

4. Enable deduplication and shrink the volume.
5. If available, migrate a nomad online to a different storage controller. Doing this on the NetApp storage controller level requires storage provisioning based on vFiler units and MultiStore and SnapMirror licenses. Adequate free space on the aggregates of the target storage controller is required. This mitigation activity is not limited in its applicability.

6. Migrate a volume from one aggregate to another within the same or another storage controller. SnapMirror replicates the data while it is still being served. To switch over to the replicated data, the client needs to detach from the source and reattach to the replica. After completion, the replica is considered the new source. This operation has an impact on client downtime. Typically, inter-data center bandwidth allows you to synchronize the source and the replica within the range of a few minutes.

7. If none of the listed activities can be used, stop the application to achieve a consistent state, then migrate its data.

The mitigation activities for aggregate tightness are summarized in Table 8. Note that Provisioning Manager can perform mitigation alternatives 3 to 6 online for secondary storage.


Table 8) Mitigation alternatives to control use within aggregates.

No. | Mitigation Activity | Repeatability | SLA Impact | Preparation Time | Time to Show Effect
1 | Increase aggregate capacity by adding disks | Data ONTAP 7.x: low limits; Data ONTAP 8: high limits | None | HW procurement | Immediate (+ rebalancing)
2 | Decrease the aggregate's Snapshot copy reserve, if possible | One time | None | None | Immediate
3 | Shrink other volumes in the aggregate if they have enough free space | One time | Low | None | Immediate
4 | Run deduplication and shrink volumes | Repeatable | Low | Time to execute dedupe | Immediate
5 | Migrate nomads (online) | Repeatable | Low | None | Minutes (vFiler migration time)
6 | Migrate volumes to a different aggregate (offline) | Repeatable | Med–high | Next planned downtime window | Minutes (volume switch-over time)
7 | Prevent application data loss and stop the application, then migrate (offline) | Repeatable | Low–high | Coordinate with app owner | Minutes (migration time)

MITIGATION ACTIVITIES FOR VOLUME TIGHTNESS

Mitigation activities for volume tightness are relevant for volumes that are at risk because their storage objects cannot grow to the committed size. When it is not possible to enable growth for storage objects contained in volumes, you need to perform an aggregate-level mitigation activity instead.

Note: Some of these mitigation alternatives depend on and affect the used capacity in the aggregate.

Table 9) Mitigation activities for resource tightness within volumes.

No. | Mitigation Activity | Repeatability | SLA Impact | Preparation Time | Time to Show Effect
1 | Reduce the volume's Snapshot copy reserve (if configured and not used) | One time | Low | None | Immediate
2 | Increase the volume if there is free space in the aggregate (see Table 8) | One time | Low | None | Immediate
3 | Delete Snapshot copies that are not needed or were skipped by the autodelete function | Limited | Low | None | Immediate
4 | Activate FAS deduplication for the volume (requires proper space guarantees) | One time | Low; possible performance impact | Wait for schedule | Hours
5 | If the volume contains more than a single LUN, migrate those objects to another volume or aggregate | Repeatable | High | Next planned downtime window | Minutes (volume migration time)
6 | Stop the application and migrate data | Repeatable | High | Coordinate with app owner | Minutes (migration time)


5 REAL-LIFE SETTINGS

This section summarizes two different operational settings. The first does not make use of online data migration or the settled/nomad provisioning pattern; the second implements a settled/nomad provisioning pattern to maintain the flexibility for online data migrations.

The concrete threshold settings and approaches might be very customer and application specific. To exploit NetApp storage efficiency features in your own data center, NetApp recommends that you start conservatively. After you are familiar with the process, work toward the customer-specific optimum.

5.1 SAMPLE SETTING 1: REAL-LIFE SETTING

This section describes a real-life setting that a customer started with. It makes use of a limited set of mitigation alternatives. This is especially beneficial when the installed storage capacity should remain constant over a long time frame or when the physical systems are already fully equipped. A settled/nomad setting is not considered. Thus, the thresholds that signal a transition of phases are set lower and more conservatively for this customer. Because online data migration and aggregate extension are not available as mitigation alternatives, sufficient available space is required to safely reach the next planned downtime window, as shown in Figure 21. In practice, refer to the aggregate days-to-full trend value to get an idea of the available days to full based on past data growth.

• All storage is provisioned using the zero fat option with growable FlexVol volumes. Only aggregate monitoring is used.
• Aggregate extension is not a mitigation alternative.
• Online migration is not a mitigation alternative.

Figure 21) Storage to enable organic data growth between planned downtime windows. (The figure plots data growth over months, between successive planned downtime windows.)

Note: Several months might pass between the planned downtime windows in which major mitigation alternatives can be performed.

The primary concern is preventing the critical situation in which aggregates reach a utilization level that is too high to enable organic growth during the period between agreed planned downtime windows. To prevent this situation, sufficient space must be reserved to enable data growth. Second, the level of data consolidation is monitored to manage accumulated growth rates safely.

Provisioning new data is stopped when one or both thresholds on the first and second metrics are reached. The operational teams are notified using alarms on the Operations Manager events aggregate nearly full threshold (configured to trigger when the metric exceeds 50%) and aggregate nearly overcommitted threshold (configured to trigger when the metric exceeds 110%). These alarms stop the responsible entities from provisioning new storage; the aggregate is left for organic growth.

An assessment of the storage situation might then be performed. Depending on experience and knowledge of the application growth rates seen in the past, the thresholds may be adapted. After the upper threshold of the operational sweet spot corridor is crossed, an alarm based on the aggregate full threshold (set initially to 65%) is sent to the storage administrators so that they can decide on migrating data in the next planned downtime window. In the meantime, organic growth can take place in the yellow-marked area shown in Figure 22. The metrics used are:

• First metric: aggregate capacity used
• Second metric: aggregate space committed

Because all storage is provisioned using the zero fat option, no artificially limited storage container exists. Thus, there is no need to consider a volume-based metric. Figure 22 shows the behavior depending on the metrics aggregate capacity used and aggregate space committed.
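The thresholds of this setting translate into a small two-metric decision rule. A sketch of the logic (the 50%/65% and 110%/120% values are this sample setting's, not universal defaults; the stricter metric wins):

```python
def action(capacity_used_pct: float, space_committed_pct: float) -> str:
    """Decide the operational action for sample setting 1
    from the metrics aggregate capacity used and space committed."""
    if capacity_used_pct > 65 or space_committed_pct > 120:
        return "mitigate in next planned downtime window"
    if capacity_used_pct > 50 or space_committed_pct > 110:
        return "stop provisioning; assess capacity and adapt thresholds"
    return "provision new storage"

print(action(45, 100))  # provision new storage
print(action(55, 100))  # stop provisioning; assess capacity and adapt thresholds
print(action(70, 100))  # mitigate in next planned downtime window
```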

Figure 22) Transition of phases depending on the metrics aggregate capacity used and aggregate space committed. (The figure shows the operational sweet spot corridor: new storage is provisioned while aggregate capacity used is 0–50% and aggregate space committed is 0–110%; beyond those values, capacity is assessed and thresholds are adapted; above 65% capacity used or 120% committed, mitigation is triggered.)


5.2 SAMPLE SETTING 2: SETTLED/NOMAD

This section describes a setting that takes the settled/nomad provisioning pattern into account and allows migrating nomad data flexibly and in a timely manner thanks to vFiler technology. This setting requires storage space at alternative locations to which nomads might be migrated. It is seen more often in larger environments with an emphasis on NFS-attached storage. It allows operating the NetApp storage infrastructure at very high use and within narrower operational sweet spot corridors. Figure 23 visualizes the effect of a mitigation alternative that can be performed online.

Figure 23) Narrower corridors due to the ability to perform mitigation alternatives in hours instead of months. (The figure shows settled data with several nomads; after the need to act is detected, the effect of mitigation, e.g., migration of a nomad, appears within hours.)

In this sample setting, as in sample setting 1, the critical situation to prevent is aggregates becoming too full. However, the flexibility gained with online data migration removes the need to take a further metric, such as storage overcommitment, into account.

• All storage is provisioned using the zero fat option with growable FlexVol volumes. Only aggregate monitoring is used.
• Storage is provisioned using the settled/nomad pattern, with the ability to perform online migration.
• The days-to-full aggregate trend was more than 200 days on average. Note that this value depends on the individual situation and is calculated against 100% usage.

The sole metric in this setting is aggregate capacity used. Table 10 contains the thresholds describing the transition of phases.

Table 10) Phase transitions with the settled/nomad provisioning pattern and the online migration mitigation alternative.

Detection Threshold | Notify | Mitigation
> 70% | Storage operations | Stop provisioning new storage
> 85% | Storage operations | Stop extending provisioned storage
> 90% | Storage operations | Relax the resource situation and migrate a nomad


Figure 24) Visualization of phase transitions depending on the metric aggregate capacity used. (While capacity used is 0–70%, new storage is provisioned; from 70–85%, only already-provisioned storage may be extended; above 90%, utilization is relaxed by migrating a nomad with NetApp Data Motion.)

You can achieve very high data consolidation in this setting using NetApp storage controllers. The amount of logical data served exceeds the physically usable capacity by a large factor.


6 STORAGE EFFICIENCY COOKBOOK

To increase consolidation, we propose the following steps to exploit the advantages of NetApp storage efficiency technologies.

Figure 25) Typical picture of aggregate capacity metrics while turning to zero fat configurations and dedupe. (Over elapsed time, the figure plots committed capacity and capacity used, with an overall trend and a last-3-month trend; phase 1 covers roughly the first month, phases 2 and 3 the following three months.)

As a general rule, we don't introduce artificially limited container types. They increase monitoring effort and might prevent pooling of unused space. For an existing landscape, proceed as follows:

1. Install and configure Operations Manager, the earlier the better. From day one, Operations Manager collects data; the more information it collects, the better the predictions and trending. The diagrams provided by Operations Manager give a good idea of growth rates and their steadiness. Make sure all NetApp storage controllers are monitored. Wait for one month. Define which mitigation alternatives your operational team is comfortable with. Check the boxes accompanying the provided list and identify the time your team needs to perform the actions. If you can perform online migrations for nomads, define the time to negotiate and approve the migration. For all other data, define the time to the next planned downtime window.

2. Change all volumes to the zero fat configuration with the autogrow feature set to on. Because there is no artificial space limitation for an autogrow volume, monitoring is restricted to aggregate monitoring. When using deduplication, set the volume to autogrow. During this period, the capacity used diminishes, as shown in Figure 25. Usually, each change in the volume configuration can be detected. So far, only metadata has changed, and unused space in the volumes is now available in a common shared pool. The aggregated free space is available to the same applications storing the data. We recommend monitoring for three months to understand the growth rate of your environment.

3. Derive the growth trend of the aggregates. Note that the overall trend might still be negative. Use Operations Manager to help determine the trend. Make sure that it excludes the time frame when changing the volume configuration to zero fat, and that it includes relevant operations of your applications, such as month- and year-end closing of business applications or regular software maintenance updates (for example, in virtualized environments).


Work backward to determine the thresholds of the phases:

a. Define the aggregate use at a level where your operational team is comfortable. At first, do not exceed 80%. Add an attention area (yellow) depending on the mitigation alternatives and their time to show effect.

b. Determine the maximum distance between the planned downtimes, or the time to perform the intended mitigation alternatives.

c. Determine the growth rate. Operations Manager provides help in determining the trend of data growth.

d. Determine the minimum space required to comfortably allow organic growth in the period between agreed planned downtimes of the services provided. Operations Manager helps you to understand the growth rate of the past.
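The threshold and trend steps above can be sketched as a small calculation. This is an illustrative sketch, not an Operations Manager API: it assumes you have exported daily aggregate capacity samples (day number, used GB) and fits a simple linear trend to estimate the daily growth rate and the days until the aggregate reaches 100% capacity used.

```python
# Illustrative sketch: estimate the daily growth rate and days-to-full from
# capacity samples, similar in spirit to the trending Operations Manager
# reports. (Hypothetical input data; not an Operations Manager API.)

def linear_trend(samples):
    """Least-squares slope (GB/day) and intercept for (day, used_gb) samples."""
    n = len(samples)
    mean_x = sum(d for d, _ in samples) / n
    mean_y = sum(u for _, u in samples) / n
    var_x = sum((d - mean_x) ** 2 for d, _ in samples)
    cov_xy = sum((d - mean_x) * (u - mean_y) for d, u in samples)
    slope = cov_xy / var_x
    return slope, mean_y - slope * mean_x

def days_to_full(samples, aggregate_capacity_gb):
    """Days until the trend line reaches 100% of the aggregate capacity."""
    slope, intercept = linear_trend(samples)
    if slope <= 0:
        return None  # negative or flat trend: no projected full date
    last_day = max(d for d, _ in samples)
    projected_used = slope * last_day + intercept
    return (aggregate_capacity_gb - projected_used) / slope

# 90 daily (day, used GB) samples after the zero fat change, ~5 GB/day growth:
samples = [(d, 4000 + 5 * d) for d in range(90)]
slope, _ = linear_trend(samples)
print(round(slope, 1))            # growth rate in GB/day
print(days_to_full(samples, 10000))  # days until 100% used on a 10 TB aggregate
```

Remember from step 3 that a sample window spanning the configuration change itself would distort the slope, so the window should start after the change.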

To provision storage, follow these steps:

1. Create big aggregates to enable shared storage in your data center. We recommend sizing in such a way that the aggregate can be extended once for eventual aggregate mitigation. Very few situations exist where a silo-centric approach with dedicated aggregates for applications makes sense. Free space and performance in an aggregate can be shared. A few big aggregates reduce the monitoring effort. Also, build aggregates in a limited number of standardized configurations and sizes.

2. Create volumes in zero fat configuration with the autogrow feature set to on. Because there is no artificial space limitation for the autogrow volume, monitoring is restricted to aggregate monitoring. When using deduplication, set the volume to autogrow. Whenever possible, use Provisioning Manager for convenience and for repeating configurations.

a. Classify your data and provision for flexibility. Give NFS a preference and make use of vFiler entities.

b. Turn on deduplication. Even in situations where deduplication rates are expected to be low, there is sometimes a big surprise. If you prefer to try deduplication on the storage controller, then create a clone of the intended volume and deduplicate it to estimate the effect. Use Performance Advisor to identify a repeating time frame of low activity to schedule the deduplication job. Also use deduplication scheduled by change rate. Mind the maximum sizes, which depend on the storage controller.

c. Initially size volumes to the expected size of the data you are going to store. Thus the aggregate overcommitment metric in Operations Manager represents the data consolidation more precisely.

d. Trim existing volumes provisioned in full/low fat to zero fat configuration. Use the following commands on the console of the storage controller to configure zero fat without Snapshot autodelete for NAS environments:

vol options <volume> guarantee none
vol options <volume> try_first volume_grow
vol autosize <volume> -m <maximum size> -i <increment size> on
snap autodelete <volume> off

Use the following command sequence to configure zero fat with Snapshot autodelete for NAS environments:

vol options <volume> guarantee none
vol options <volume> try_first volume_grow
vol autosize <volume> -m <maximum size> -i <increment size> on
snap autodelete <volume> trigger volume
snap autodelete <volume> delete_order oldest_first
snap autodelete <volume> on


Use the following commands to configure zero fat without Snapshot autodelete for SAN environments:

vol options <volume> guarantee none
vol options <volume> try_first volume_grow
vol autosize <volume> -m <maximum size> -i <increment size> on
snap reserve -V <volume> 0
snap autodelete <volume> off
lun set reservation <lun> disable

Use the following command sequence to configure zero fat for SAN environments with autodelete set to on:

vol options <volume> guarantee none
vol options <volume> try_first volume_grow
vol autosize <volume> -m <maximum size> -i <increment size> on
snap reserve -V <volume> 0
snap autodelete <volume> trigger volume
snap autodelete <volume> delete_order oldest_first
snap autodelete <volume> on
lun set reservation <lun> disable

e. Identify storage of inactive data. Storage keeping inactive data is most often perfectly suited to act as a nomad candidate that could be migrated.

f. Identify storage that is close to deprovisioning. Deprovisioning storage relaxes use and can act as a mitigation alternative.

g. Turn already provisioned volumes into zero fat configuration.

3. Let Operations Manager monitor the landscape. Use the reported aggregate daily growth rates and the days-to-full trending reported by Operations Manager to adapt the thresholds. Remember that days-to-full trending reports against 100% capacity used of the aggregate.
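When trimming many existing volumes (step 2d), the command sequences above can be assembled programmatically. The helper below is an illustrative sketch (the function name and structure are ours, not a NetApp tool); it only builds the Data ONTAP console command strings listed above, which an administrator would then run on the storage controller.

```python
# Illustrative sketch: assemble the zero fat configuration command sequences
# shown in step 2d for a given volume (and LUN, for SAN environments).
# This only builds the command strings; it does not talk to a controller.

def zero_fat_commands(volume, max_size, increment, san_lun=None, autodelete=False):
    cmds = [
        f"vol options {volume} guarantee none",
        f"vol options {volume} try_first volume_grow",
        f"vol autosize {volume} -m {max_size} -i {increment} on",
    ]
    if san_lun is not None:
        # SAN variants additionally clear the Snapshot reserve.
        cmds.append(f"snap reserve -V {volume} 0")
    if autodelete:
        cmds += [
            f"snap autodelete {volume} trigger volume",
            f"snap autodelete {volume} delete_order oldest_first",
            f"snap autodelete {volume} on",
        ]
    else:
        cmds.append(f"snap autodelete {volume} off")
    if san_lun is not None:
        # SAN variants also disable the LUN space reservation.
        cmds.append(f"lun set reservation {san_lun} disable")
    return cmds

# Example: SAN volume with autodelete set to on (hypothetical names).
for cmd in zero_fat_commands("vol_app1", "500g", "10g",
                             san_lun="/vol/vol_app1/lun0", autodelete=True):
    print(cmd)
```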


7 REFERENCES

• TR-3505, "NetApp Deduplication for FAS and V-Series Deployment and Implementation Guide": www.netapp.com/us/library/technical-reports/tr-3505.html
• TR-3563, "NetApp Thin Provisioning: Improving Storage Utilization and Reducing TCO": www.netapp.com/us/library/technical-reports/tr-3563.html
• TR-3710, "Operations Manager, Provisioning Manager, and Protection Manager Best Practices Guide": www.netapp.com/us/library/technical-reports/tr-3710.html
• TR-3786, "A Thorough Introduction to 64-Bit Aggregates": www.netapp.com/us/library/technical-reports/tr-3786.html
• TR-3814, "NetApp Data Motion": www.netapp.com/us/library/technical-reports/tr-3814.html
• TR-3827, "If You're Doing This, Then Your Storage Could Be Underutilized": www.netapp.com/us/library/technical-reports/tr-3827.html
• TR-3881, "DataMotion for Volumes for Enterprise Applications": www.netapp.com/us/library/technical-reports/tr-3881.html
• NetApp Operations Manager Efficiency Dashboard Installation and User Guide: now.netapp.com/NOW/download/tools/omsed_plugin/InstallUserGuide.pdf


8 ACKNOWLEDGMENTS

This report was developed in concert with the Field Centers for Innovation and covers field best practices and product group expertise. It would not have been possible without the input of many experts. Significant contributions were made by Matthew Agoni, Carlos Alvarez, Jeff Berks, Manfred Buchmann, Hans Deuerlein, Erik Dybwad, Niels Reker, Oliver Dziuba, Larry Freeman, Gary Garcia, Pretoom Goswami, Naveen Harsani, George John, Nigel Maddock, Andreas Martinovsky, Holger Niermann, Cesar Orosco, Christian Ott, Shiva Raja, Michael Reusch, Maurice Skubski, John Tyrrell, Oliver Walsdorf, and Allen Wang.

NetApp provides no representations or warranties regarding the accuracy, reliability, or serviceability of any information or recommendations provided in this publication, or with respect to any results that may be obtained by the use of the information or observance of any recommendations provided herein. The information in this document is distributed AS IS, and the use of this information or the implementation of any recommendations or techniques herein is a customer's responsibility and depends on the customer's ability to evaluate and integrate them into the customer's operational environment. This document and the information contained herein may be used solely in connection with the NetApp products discussed in this document.

© Copyright 2010 NetApp, Inc. All rights reserved. No portions of this document may be reproduced without prior written consent of NetApp, Inc. Specifications are subject to change without notice. NetApp, the NetApp logo, Go further, faster, Data ONTAP, FlexClone, FlexVol, MultiStore, RAID-DP, SnapDrive, SnapMirror, Snapshot, SyncMirror, and vFiler are trademarks or registered trademarks of NetApp, Inc. in the United States and/or other countries. Windows is a registered trademark of Microsoft Corporation. Oracle is a registered trademark of Oracle Corporation. VMware is a registered trademark and VMotion is a trademark of VMware, Inc. All other brands or products are trademarks or registered trademarks of their respective holders and should be treated as such. RA-0007-1010


Figure 7) Provisioning model for NAS storage from scratch. Technically, only two out of four combinations are possible. (Matrix of primary data (files and directories) space allocation versus Snapshot copy space allocation: fat primary data with fat Snapshot copy space is the full fat option; thin primary data with thin Snapshot copy space is the zero fat option; the two mixed combinations are no option for NAS.)

Note: Full fat is characterized slightly differently in NAS and SAN due to their technical properties.

FULL FAT PROVISIONING

Full fat provisioning for NAS is the traditional (default) way to implement NFS/CIFS shares. Volumes in a full fat configuration are characterized as follows:

• Volumes are created with space guarantee.
• The size of the volume follows the formula X + Δ, where X is the size of the primary data (the sum of all user data, files and directories, within the volume) and Δ is the amount of space needed to hold Snapshot data.

• Because space used for Snapshot copies might grow unexpectedly, the autosize function can be used to make space available when a certain volume threshold is reached. This would also happen when the space reserved for user data gets low.

• Space reservation for Snapshot copies is used to hide the capacity used for Snapshot copies from the consumers (NAS clients).
• For volumes with deduplication enabled, volume autogrow is a mandatory option.
• Normally, using autodelete is not recommended in NAS environments. Keeping a certain number of Snapshot copies for file versioning/restores might be part of the SLAs defined for file services.

Note: Deleting Snapshot copies may be a reasonable approach when no other option for freeing up space is available, but this will be a specific and individual decision.

Table 2) Full fat provisioning.

Volume Options:
• guarantee = volume
• fractional_reserve = 100. Leave at the default; this option is mostly relevant for SAN environments. The default value up to Data ONTAP 7.3.3 is 100; for later releases, 0 is the default.
• autosize = on. Turn autosize on; there is no artificially limited volume that needs to be monitored. Autosize makes sense to allow growth of user data beyond the guaranteed space limit.
• autosize options = -m X -i Y. The business model drives the maximum value for the autosize configuration because it offers additional disk space for the consumer under its specific conditions. A reasonable resizing increment depends on various factors, such as the data growth rate in the particular volume, the volume size itself, and so on.

Volume Snapshot Options:
• reserve = yes. The value depends on the number of Snapshot copies and the change rate within the volume.
• schedule = switched on. Automatic Snapshot technology schedules.
• autodelete = off. Deleting Snapshot copies is not recommended in most NAS environments.

ZERO FAT PROVISIONING

The zero fat method is the most efficient way to provision NAS volumes:

• Volumes are created without space guarantee.
• The size of the volume still follows the formula X + Δ, where X is the size of the primary data (the sum of all user data, files and directories, within the volume) and Δ is the amount of space needed to hold Snapshot data. Sizing the volume defines a container with a virtual size for the consumers; NAS users are familiar with fixed-size file shares.

• Space used for Snapshot copies can grow unexpectedly. You can use the autosize function to make space available when a certain volume threshold is reached, and also when the space reserved for user data gets low.
• Space reserved for Snapshot copies is used to hide from the consumers (NAS clients) the capacity taken up by Snapshot copies.
• For volumes with deduplication enabled, volume autogrow is a mandatory option.
• Using autodelete is normally not recommended in NAS environments. Keeping a certain amount of space for Snapshot copies for file versioning/restores is part of the SLAs defined for file services.

Note: Deleting Snapshot copies may be a reasonable approach when no other option for freeing up space is available, but this will be a specific and individual decision.

Table 3) Zero fat provisioning.

Volume Options:
• guarantee = none
• fractional_reserve = 100. Leave at the default; this option is mostly relevant for SAN environments. The default value up to Data ONTAP 7.3.3 is 100; for later releases, 0 is the default.
• autosize = on. Turn autosize on; there is no artificially limited volume that needs to be monitored. Autosize makes sense to allow growth of user data beyond the guaranteed space limit.
• autosize options = -m X -i Y. The business model drives the maximum value for the autosize configuration because it offers additional disk space for the consumer under its specific conditions. A reasonable resizing increment depends on various factors, such as the data growth rate in the particular volume, the volume size itself, and so on.
• try_first = not set. Autodelete is not recommended in most environments.

Volume Snapshot Options:
• reserve = yes/no. The value depends on the number of Snapshot copies and the change rate within the volume. Displaying only the committed usable space using SLAs is the preferred way to provision NAS storage; however, there might be situations in which the Snapshot reserve area is omitted (no).
• schedule = switched on. Automatic Snapshot technology schedules.
• autodelete = off. Deleting Snapshot copies is not recommended in most NAS environments.

SAN

For SAN, we consider three options:

• Full fat: Both primary data and its Snapshot copy space are preallocated.
• Low fat: The primary data is preallocated; the Snapshot copy space is allocated on demand.
• Zero fat: Primary data and its Snapshot copy space are allocated on demand. This variant achieves the best ratio of storage efficiency when provisioning applications from scratch.

Figure 8) Provisioning model for SAN storage from scratch. (Matrix of primary data (LUN) space allocation versus Snapshot copy space allocation: fat primary data with fat Snapshot copy space is the full fat option; fat primary data with thin Snapshot copy space is the low fat option; thin primary data with thin Snapshot copy space is the zero fat option; thin primary data with fat Snapshot copy space is no option.)

FULL FAT PROVISIONING

This method can be treated as the historical way of provisioning block storage with Data ONTAP:

• Volumes are created with space guarantee.
• A fractional (overwrite) reserve is used to guarantee that the primary data can be overwritten completely with Snapshot copies in place. If this space is not available, Snapshot copy creation will fail.
• The size of the volume follows the formula 2X + Δ, where X is the size of the primary data (the sum of all LUN capacities within the volume) and Δ is the amount of space needed to hold Snapshot copy data.


The enhancements to the volume autosize capabilities (such as volume size-dependent thresholds) and the robustness of the Snapshot autodelete implementation have made full fat provisioning more or less obsolete. As of today, the default settings for creating volumes/LUNs in Data ONTAP still apply to these settings. See "Provisioning from Scratch: Full Fat to Zero Fat Provisioning" for a discussion of using tools such as Provisioning Manager.

Table 4) Full fat provisioning.

Volume Options:
• guarantee = volume
• fractional_reserve = 100. Even though technically possible, a fractional reserve below 100 incorporates a potential risk of running out of Snapshot copy overwrite space. This situation should be avoided.
• autosize = off. Autosize could be used as an option to create free space needed for Snapshot copy creation.

Volume Snapshot Options:
• reserve = 0
• schedule = switched off
• autodelete = off

LUN Options:
• reservation = enable

LOW FAT PROVISIONING

With low fat provisioning, we use a more space-efficient way to provision volumes:

• Volumes are created with space guarantee.
• LUNs are created with space guarantee as well. This setup does not benefit from unused blocks within a LUN. (During the lifetime of a LUN, the amount of free, unused blocks typically decreases; without space reclamation techniques, allocated blocks on the storage system stay allocated.)
• The size of the volume follows the formula X + Δ, where X is the size of the primary data (the sum of all LUN capacities within the volume) and Δ is the amount of space needed to hold Snapshot copy data.
• Because space used for Snapshot copies might grow unexpectedly, the autosize and autodelete policies are used to make space available when a preset volume threshold is reached.

Table 5) Low fat provisioning.

Volume Options:
• guarantee = volume
• fractional_reserve = 0. Snapshot space is controlled by the autodelete and autosize options.
• autosize = on. Turn autosize on.
• autosize options = -m X -i Y. The business model drives the maximum value for the autosize configuration because it offers additional disk space for the consumer under its specific conditions. A reasonable resizing increment depends on various factors, such as the data growth rate in the particular volume, the volume size itself, and so on.
• try_first = volume_grow. Increasing the size of the volume does not destroy any data or information, so there is no reason not to increase it; the change can be reverted afterward if the volume free space increases again. There might be configurations where automatic volume growth is not desired.

Volume Snapshot Options:
• reserve = 0. For NAS volumes, setting a Snapshot copy reserve area and configuring Snapshot copy schedules is a common setup. For SAN volumes, this needs to be switched off according to NetApp best practices (see the Fibre Channel and iSCSI Configuration Guide).
• schedule = switched off
• autodelete = on. There might be Snapshot copies that are needed to fulfill certain SLAs, such as backup SLAs. Setting this policy needs to be negotiated with the business requirements; in the worst-case scenario, deleting Snapshot copies is not an option.
• autodelete options = volume, oldest_first. There is a precedence for which Snapshot copies are candidates for deletion; oldest_first is the current default.

LUN Options:
• reservation = enable. Reserves space for the LUN during creation.

ZERO FAT PROVISIONING

Full and low fat provisioning use fully allocated volumes and LUNs; by default, space allocation happens within the boundaries of the LUN and the volume. Zero fat follows a 100% "allocate on demand" concept:

• Volumes are created without space guarantee.
• LUNs are created without space guarantee.
• The size of the volume follows the formula X – N + Δ, where X is the size of the primary data (the sum of all LUN capacities within the volume), Δ is the amount of space needed to hold Snapshot copy data, and N is the amount of unused blocks within a given LUN.

Table 6) Zero fat provisioning.

Volume Options:
• guarantee = none. No space reservation for the volume at all.
• fractional_reserve = 0. With Data ONTAP 7.3.3, fractional_reserve can be modified even for volumes without a space guarantee of type volume. Prior to Data ONTAP 7.3.3, the value was fixed at 100.
• autosize = on. Turn autosize on.
• autosize options = -m X -i Y. The business model drives the maximum value for the autosize configuration because it offers additional disk space for the consumer under specific conditions. A reasonable resizing increment depends on various factors, such as the data growth rate in the particular volume, the volume size itself, and so on.
• try_first = volume_grow

Volume Snapshot Options:
• reserve = 0. For NAS volumes, setting a Snapshot copy reserve area and configuring Snapshot copy schedules is a common setup. For SAN volumes, this needs to be switched off according to NetApp best practices (see the Fibre Channel and iSCSI Configuration Guide).
• schedule = switched off
• autodelete = off. Deleting Snapshot copies might be an option when the volume can no longer be resized because the maximum configured size has been reached, or when the aggregate's free space becomes low.

LUN Options:
• reservation = disable. No preallocation of blocks for the LUN.

SUMMARY OF PROVISIONING METHODS

There are good reasons for using any of the provisioning methods described; however, full fat for SAN environments should be avoided wherever possible because of the storage efficiency ratio. Even with a 100% block usage ratio on primary data, zero fat provisioning has many advantages and is the preferred method:

• The aggregate's free space is a global pool that can serve space to volumes. This gives more flexibility than volumes with their own dedicated free space.
• For SAN volumes, the block consumption can be easily monitored.
• Deduplication savings go directly into the global pool of free space, which is the aggregate or the resource pool to which it belongs.
• Monitoring is needed only on the aggregate level. Volumes will grow on demand.

Table 7) Comparison of provisioning methods.

• Space consumption: Full Fat: 2X + Δ; Low Fat: X + Δ; Zero Fat: X – N + Δ (where N is the traditional thin provisioning impact, that is, the amount of blocks logically allocated but not used).
• Space efficient: Full Fat: no; Low Fat: partially, for Snapshot copies; Zero Fat: yes.
• Monitoring: Full Fat: optional; Low Fat: required on volume and aggregate level; Zero Fat: required on aggregate level.
• Notification/mitigation process required: Full Fat: no; Low Fat: optional in most cases; Zero Fat: yes.
• Pool benefiting from dedupe savings: Full Fat: volume fractional reserve area; Low Fat: volume free space area; Zero Fat: aggregate free space area.
• Risk of an out-of-space condition on primary data: Full Fat: no; Low Fat: no, as long as autodelete is able to delete any Snapshot copies; Zero Fat: yes, when monitoring and notification processes are missing.
• Typical use cases: Full Fat: small installations; no or few storage management skills (no monitoring infrastructure). Low Fat: large database environments. Zero Fat: shared storage infrastructure; test/dev environments; storage pools for virtualized servers.
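The space-consumption row of Table 7 can be compared numerically. The calculation below is a simple illustration using the document's symbols (X for primary data, Δ for Snapshot copy space, N for logically allocated but unused blocks); the example figures are assumptions, not measurements.

```python
# Illustrative comparison of the Table 7 space-consumption formulas.
# x = primary data (sum of LUN capacities), delta = Snapshot copy space,
# n_unused = blocks logically allocated but not used (thin provisioning effect).

def space_consumption_gb(x, delta, n_unused):
    return {
        "full_fat": 2 * x + delta,         # primary data + 100% overwrite reserve
        "low_fat": x + delta,              # primary data preallocated, snapshots on demand
        "zero_fat": x - n_unused + delta,  # everything allocated on demand
    }

# Example: 1000 GB of LUNs, 200 GB of Snapshot data, 40% of LUN blocks unused.
usage = space_consumption_gb(x=1000, delta=200, n_unused=400)
for method, gb in usage.items():
    print(f"{method}: {gb} GB")
```

With these assumed figures, zero fat consumes less than half of what full fat does, which is the efficiency ratio the summary above argues for.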

FULL/LOW/ZERO FAT PROVISIONING WITH PROVISIONING MANAGER

NetApp Provisioning Manager focuses on fast and efficient provisioning of storage resources in the NetApp storage infrastructure. It speeds up provisioning workflows and boosts capacity usage by using policy-based automation for the entire NetApp NAS and SAN infrastructure. These processes are:

• Faster than manually provisioning storage
• Easier to maintain than scripts
• Instrumental in minimizing the risk of data loss resulting from misconfigured storage

Provisioning Manager applies user-defined policies to consistently select the appropriate resources for each provisioning activity. This frees administrators from the headache of searching for available space to provision and allows more time for strategic issues. The use of a centralized management console allows administrators to monitor the status of their provisioned storage resources. Provisioning Manager can help improve business agility and capacity use, shrink provisioning time, and improve administrator productivity. Provisioning Manager's thin provisioning and deduplication capabilities provide a high level of storage efficiency from your NetApp storage investment.

A GUI allows you to implement the full/low/zero fat provisioning models within Provisioning Manager. See Figure 9 and Figure 10 for configuring storage efficiency in a provisioning policy for NAS and SAN. Whenever storage is provisioned using this provisioning policy, the settings apply automatically. For more information, refer to TR-3710, "Operations Manager, Provisioning Manager, and Protection Manager Best Practices Guide."

Provisioning Manager encapsulates technical details when provisioning storage and supports easy integration with existing management tools and orchestration frameworks. Policies and their use in so-called datasets and storage services allow you to exploit NetApp storage efficiency technologies without exposing a high level of technical detail to a higher level of management software.

Note: Provisioning Manager up to version 4.0 does not allow you to specify autosize and autodelete individually; the policy template determines whether these features are used and which options are selected. In order to implement the provisioning methods outlined, a customized provisioning script needs to be provided to set the autosize and autodelete parameters according to the recommendations for the full/low/zero fat methods. Post-provisioning scripts are standard with Provisioning Manager. Use caution when Provisioning Manager runs conformance checks; these revert individual settings.


Figure 9) Configuring a full/zero fat provisioning policy using Provisioning Manager for NAS. Select checkboxes as outlined. Provisioning Manager deviates from zero/full fat by first growing volumes with autosize and then allowing Snapshot autodelete.


Figure 10) Configuring a full/low/zero fat provisioning policy using Provisioning Manager for SAN storage. Select checkboxes as outlined. Provisioning Manager deviates by not turning on autosize for zero fat.

FULL/LOW/ZERO FAT PROVISIONING FOR STORAGE SERVICES

Storage services are an easy abstraction to provision storage in a utility-like fashion. A storage service describes all characteristic attributes for storage needed in a certain scenario. A storage service catalog lists the available templates and allows you to provision storage with these attributes on demand. Technically, storage services or datasets consist of one or more resource pools, a protection policy, and a provisioning policy. Full/low/zero fat provisioning for storage services is configured in the configuration wizard of the provisioning policy. Figure 11 shows the provisioning policies closest to full/low/zero fat configurations.

Figure 11) Full/low/zero fat provisioning policies for datasets and storage services.

Because this wizard is able to configure the deduplication feature, two policies are configured for the zero fat configuration: one with deduplication and one without.


HOW SHOULD A VOLUME BE SIZED?

Because physical allocation of data within a zero fat provisioned volume is done on demand, theoretically the volume size can be set to a very high value that can easily hold all application data and Snapshot copies. Because the unallocated space in the volume is not exclusively reserved for the volume itself, all other applications can benefit from the shared pool of unallocated storage. However, NetApp recommends that you size the volume to the expected size of its containing objects and use the autogrow option to let it grow on demand. The important advantage is that the commitment rate then acts as a metric for data consolidation.

Note: The commitment rate reflects the amount of logical data consolidation. This metric is suitable for deciding when data should be left for organic growth.

Additionally, the volume size limits when using deduplication should be taken into account, because the maximum sizes depend on the storage controllers.
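The commitment rate mentioned above can be illustrated with a short calculation. This sketch assumes a simple definition, the sum of configured volume sizes divided by the aggregate capacity as a percentage; the exact metric reported by Operations Manager may be computed differently.

```python
# Illustrative sketch: commitment (overcommitment) rate of an aggregate.
# Assumed definition: committed volume sizes versus aggregate capacity, in percent.

def commitment_rate_pct(volume_sizes_gb, aggregate_capacity_gb):
    return 100.0 * sum(volume_sizes_gb) / aggregate_capacity_gb

# Three zero fat volumes sized to their expected data, on a 10 TB aggregate:
rate = commitment_rate_pct([4000, 5000, 6000], 10000)
print(rate)  # 150.0 -> the aggregate is overcommitted by 50%
```

A rate above 100% shows that thin provisioning is consolidating more logical data than physically fits, which is exactly why the volumes should be sized to their expected content rather than to arbitrarily large values.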

APPLICATION RECOMMENDATIONS

Thin provisioning is most effective when applications use data that is committed to them step by step. When applications preformat data, the immediate effect of thin provisioning is lost, and only deduplication may reclaim sharable blocks. Because thin provisioning has no performance penalty, the general recommendation is to provision with the zero fat configuration.

For SAN-attached storage, NetApp recommends that you use file systems supporting space reclamation technologies such as the SCSI UNMAP and SCSI WRITE SAME commands. These pass the information through the storage stack that a particular block is no longer used, allowing unused space to be reclaimed. On Windows® platforms, this can be configured in NetApp SnapDrive®.

For Oracle® database best practices, refer to WP-7084, "Storage Efficiency in an Oracle Environment."

3.2 PROVISIONING FROM TEMPLATES: VOLUME AND DEDUPE-CENTRIC LAYOUTS

This section deals with provisioning storage for similar applications from a golden template. A valid use case is a hosting provider who offers and serves predefined application services in mass quantities. Instead of provisioning each application from scratch, the data of the application instance is provisioned by creating a copy of a preconfigured template, or golden copy, that is customized using a postprocessing procedure.

When applications are provisioned this way, NetApp cloning technologies generate virtual copies of the template data instantly and with efficient use of space. This achieves a high degree of data consolidation and cost savings.

The potential of NetApp cloning technologies also plays a central role in development and test environments, as well as in software maintenance scenarios. Testing and updates can be performed very easily because these cloning capabilities work instantly and with almost no overhead in terms of performance, CPU, and memory.

There are two ways to align application data to a NetApp shared storage infrastructure:

• Volume-centric storage layout
• Dedupe-centric storage layout

Depending on the data lifetime, suitability for deduplication, consistency, and tool constraints, one way of aligning application data is more appropriate than the other.

In both variants, the storage of the application template can be provisioned as either full, low, or zero fat. The cloning procedure inherits the attributes of the parent volume. To create space-efficient clones, the space guarantee must be set to none.


CONSEQUENCES FOR MONITORING

When using one of the following layouts, very high data consolidation can be achieved. Because this effect depends on the usage characteristics of the corresponding applications, monitoring the aggregate is key. In case a low fat volume acts as a template that is cloned preserving the original space guarantees, monitoring is necessary for the cloned volumes as well.

VOLUME-CENTRIC STORAGE LAYOUT

In a volume-centric storage layout, an application instance is organized into one or a few volumes to benefit from Data ONTAP volume-centric management and maintenance operations, such as instant cloning and volume-consistent Snapshot copies.

In addition to the convenient ways to manage volumes, volume-centric storage layouts have storage efficiency advantages in two dimensions:

• High instant storage efficiency savings. Cloning the data of an application instance with FlexClone yields high instant savings; these savings might deteriorate over time.

• Long-term storage efficiency savings. Deduplicating application data yields medium long-term savings.

A volume-centric layout makes it easy to provision storage for another instance of an application by cloning a consistent volume representing the template of the intended application and attaching it to an instance where it is processed. This approach works for both NAS and SAN.

Figure 12 shows the data alignment of an application instance and its volume. An application instance organizes its data in one or more dedicated volumes. Note that the entire construct is created within one aggregate.

Because deduplication is performed on the volume level, long-term savings depend on the block-sharing rate within one instance of an application. Volume-centric layouts are preferred in the following cases:

• Simplicity of data management using volumes
• Individual control over the SLA of each application instance
• Application instances with a short duration
• No consideration of deduplication
• Management tools that require volume-centric layouts


Figure 12) Volume-centric storage provisioning. Application instances are aligned horizontally with their volumes.


Impact on commitment and storage utilization. Consider the impact of using FlexClone to clone a volume-centric storage layout to implement template-based provisioning. At clone creation, Data ONTAP creates only metadata for the new instance of the data; it allocates space for storing changes to the cloned copy, or for new data, on request. Thus, the overcommitment of the aggregate containing the cloned data increases when the clone is created. However, this does not affect the space used in the aggregate. When data in the clone is changed and new data is added by the application, the aggregate use grows.
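The effect on the two rates can be sketched with simple arithmetic. The following Python sketch uses hypothetical sizes; it only illustrates that creating a clone raises the commitment rate of the aggregate while leaving its physical use untouched.

```python
# Hypothetical sizes illustrating clone creation in a volume-centric layout:
# commitment rises by the clone's promised size, physical use does not.

def commitment_rate(committed_gb, aggregate_gb):
    """Storage promised to applications relative to physical aggregate size."""
    return committed_gb / aggregate_gb

def used_rate(used_gb, aggregate_gb):
    """Physically used blocks relative to physical aggregate size."""
    return used_gb / aggregate_gb

aggregate_gb = 1000
committed_gb = 800   # space already promised, including the template volume
used_gb = 300        # blocks actually written

# Cloning a 200 GB template volume promises another 200 GB ...
committed_gb += 200
# ... but the clone shares the parent's blocks, so used space is unchanged.

print(commitment_rate(committed_gb, aggregate_gb))  # 1.0 -> overcommitment grows
print(used_rate(used_gb, aggregate_gb))             # 0.3 -> physical use unchanged
```

Only when the application changes cloned blocks or writes new data does the used rate start to follow the commitment rate.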

Best Practice

A volume-centric layout implicitly implements a consistency group. It is preferable to place in it all application data that should be recovered at a certain point in time. Cloning can achieve significant savings when a FlexClone volume is created to provision data for a new service instance.

Client-side data realignment, such as disk defragmentation or database table space reorganization, has a counterproductive effect on FlexClone savings. This realignment also has a temporarily counterproductive effect on deduplication savings until the deduplication process is executed again. If possible, the following actions on client data should be avoided:

• Reorganizing data, for example, database reorganization of table spaces or defragmentation of virtual disks provisioned through cloning

• Preformatting data


DEDUPE-CENTRIC STORAGE LAYOUT

In a dedupe-centric storage layout, the goal is to achieve high storage efficiency returns from the deduplication feature. In contrast to the volume-centric storage layout, data of different application instances is grouped to achieve storage efficiency returns across a set of application instances. Figure 13 shows a sample dedupe-centric storage layout. Data of application instances is organized horizontally; the individual data of each application is grouped vertically in a volume to implement deduplication.

This layout makes sense in virtualization scenarios where the images of the guest machines can be grouped easily. Grouping partitions containing boot images and commonly used programs is very effective because they share much of the same data. To implement template-based provisioning with such a layout, cloning of the template data must be performed with the file/LUN FlexClone operation. File/LUN FlexClone allows storage objects to be cloned within a volume, providing finer granularity.

This storage layout provides the following storage efficiency advantages from a short- and long-term perspective:

• Very high long-term storage efficiency savings. Long-term savings are achieved due to the deduplication-centric storage layout and the resulting deduplication returns.

• Short-term storage efficiency savings. Instant savings are provided when an application instance is cloned through a file/LUN FlexClone operation, for example, from template application data.

In contrast to the volume-centric storage layout, application instances are bundled together in a matrix style because of their participation in a volume. This implies that the applications share major operational tasks and are managed as a bundle. From an SLA perspective, a diversification of service levels within the application instances cannot be implemented as easily as with a volume-centric layout. Achieving application-consistent Snapshot copies requires the iterative application of file/LUN FlexClone functionality to all storage objects of the instance. This is slightly more difficult than cloning with a volume FlexClone operation.

TR-3505, NetApp Deduplication for FAS and V-Series Deployment and Implementation Guide, provides a deeper understanding of NetApp deduplication and its deployment.

Figure 13 illustrates dedupe-centric storage provisioning. Volumes are shared among several application instances to achieve cross-instance deduplication returns. Note that this construct is created within an aggregate; volumes can be assigned to different aggregates.


Figure 13) Dedupe-centric storage provisioning. Application instances are aligned horizontally; volumes are aligned vertically.


Impact on commitment and aggregate usage. When the FlexVol volumes for this layout are created, their individual sizes contribute to the commitment rate. The aggregate use grows with the provisioning and the use of objects within the FlexVol volumes. Provisioning a new instance in this layout through a file/LUN FlexClone operation has no effect on the overcommitment rate; it affects the deduplication value of the volume itself. Thus, NetApp recommends using the zero fat configuration for the volume so that autogrow is enabled.

Best Practice

This layout is very attractive for applications using multiple but similar storage objects among service instances (for example, virtual disks in virtual machine hypervisors). These instances usually use similar operating systems and applications in dedicated virtual disks. Thus, grouping these storage objects leads to a very high degree of consolidation due to deduplication.

Quickly changing data, such as page and swap files, should not be considered for inclusion in deduplicated volumes on primary storage. Deduplication savings on such data are limited due to its high change rate and do not justify running the deduplication process. NetApp recommends not placing this type of data in the same volume as data that deduplicates well.

We further recommend not performing client data realignments, such as Windows disk defragmentation or database table space reorganizations. Because of the way NetApp storage controllers work, fragmented client data is served with no performance penalties.


33 SETTLED/NOMAD PROVISIONING FOR NETAPP DATA MOTION

Settled/nomad provisioning is a provisioning pattern that helps increase the utilization of NetApp shared storage. When the online migration features of the storage are exploited, response times to mitigate data growth scenarios are independent of application-specific planned downtime windows. Storage is classified into potential migration candidates that can be migrated away from a tight aggregate on one storage controller to another one while assuring accessibility. Thus, it is an elegant technology to relax the use of an aggregate outside the planned downtime windows of the served applications. NetApp MultiStore technology implements this feature using the vFiler abstraction, which NetApp recommends you consider in the provisioning process. Secure multi-tenancy environments implemented using MultiStore technology harmonize well with this provisioning approach.

The settled/nomad provisioning pattern is a perfect metaphor for reacting to data growth in an aggregate. Figure 14 illustrates the concept of settled/nomad provisioning in the aggregates of the storage controllers and the migration of a nomad out of its aggregate. The settled part describes data that does not move during its lifetime. It might use vFiler units to simplify operation and hardware maintenance of the storage controller, but there is no direct need. The nomad parts are considered moving parts and thus must make use of vFiler units. The ratio between the sizes of the settled and nomad parts depends on the growth rate and lifetime of the data in the settled part. Assuming that the aggregate size is constant over this period, the aggregate is filled with settled and nomad data. Over the data lifetime, more and more nomads are migrated away. At the end of the lifetime, only the settled data is left.

It is irrelevant whether the data growth happens in the settled or the nomad part: when a nomad is migrated away, the resource situation on the aggregate is relaxed. It is preferable to provision several nomads of different sizes. This allows you to:

• React to different growth scenarios of the data
• Quickly migrate smaller nomads when time or the inter-storage-controller network is considered to be a limited resource
• Operate the aggregate in its operational sweet spot corridor over a long time frame; by slicing the migratable entities in the right way, you can be sure that the aggregate operates within a predefined use interval
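The slicing idea can be sketched as a simple selection policy. The sizes, the corridor bound, and the smallest-first greedy rule below are illustrative assumptions, not a NetApp algorithm; in practice the choice would also weigh migration time and network load.

```python
# Hypothetical sketch: pick nomads to migrate so aggregate use drops back into
# its sweet spot corridor. Sizes and the 80% bound are illustrative only.

def nomads_to_migrate(used_gb, aggregate_gb, target_rate, nomad_sizes_gb):
    """Greedy selection: migrate the smallest nomads first until the
    aggregate use rate falls to target_rate or below."""
    selected = []
    for size in sorted(nomad_sizes_gb):
        if used_gb / aggregate_gb <= target_rate:
            break
        selected.append(size)
        used_gb -= size
    return selected

# 900 GB used of 1000 GB; corridor upper bound 80%; nomads of 200/50/100 GB.
print(nomads_to_migrate(900, 1000, 0.80, [200, 50, 100]))  # [50, 100]
```

Provisioning nomads of several sizes is what makes such a policy possible: with a single large nomad, the only option would be one expensive migration.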

Figure 14) Settled/nomad provisioning into an aggregate. In case of aggregate tightness, a nomad is migrated to a separate aggregate.


To summarize, the settled/nomad provisioning pattern is an elegant method to adjust the block use of an aggregate. The use of an aggregate can be controlled and kept in a desired corridor.


SLA-BASED ASSESSMENT FOR SETTLED/NOMAD

The goals of an SLA-based assessment are to optimize SLA fulfillment and to avoid or minimize penalty costs. The accessibility of the applications, described by their individual service levels, is used for an assessment into settled and nomad instances.

We use the previously introduced SLA metric of service disruption and map it to the stickiness of the settled/nomad instances. The vFiler entities allow online migration of NFS- and iSCSI-attached nomad instances without any changes on the client side. Fibre Channel-attached storage cannot be migrated online at the time of writing. Refer to TR-3881 for an understanding of DataMotion in an Oracle database and Microsoft Exchange environment.

Alignment by technical impact. For data belonging to applications with SLAs that fit perfectly into what is provided, a direct assignment can be made. For example, application instances with the lowest acceptable service disruption should be the last candidates to be migrated (settled); applications with the highest acceptable service disruption should be considered as nomads. However, there might be data of application instances that will likely be migrated during the application lifetime. You must take into account the business impact of migrating these instances.

Figure 15) Alignment by technical impact (sorted by negative impact in descending order).


Alignment by business impact. An assessment of penalty costs is made for the data of the remaining applications. For vFiler migration, a very short negative impact on the performance of the service level must be taken into account during the migration. Thus, application data with the highest negative impact is considered to be the stickiest.
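Both alignment steps boil down to sorting instances by negative impact and cutting the ordered list into settled and nomad parts. The following sketch uses made-up instance names and penalty values purely for illustration.

```python
# Illustrative sketch: classify application instances as settled or nomad by
# sorting on their negative migration impact (penalty cost). The sample data
# and the fixed nomad count are assumptions for demonstration only.

def classify_instances(instances, nomad_count):
    """Sort by negative impact descending; the least impacted instances
    (tail of the ordered list) become nomads, the rest stay settled."""
    ordered = sorted(instances, key=lambda i: i["impact"], reverse=True)
    split = len(ordered) - nomad_count
    return {"settled": [i["name"] for i in ordered[:split]],
            "nomad": [i["name"] for i in ordered[split:]]}

apps = [
    {"name": "erp-db", "impact": 100},    # high penalty cost -> stickiest
    {"name": "exchange", "impact": 40},
    {"name": "file-share", "impact": 5},  # low penalty cost -> migration candidate
]
print(classify_instances(apps, nomad_count=1))
```

In a real assessment, the cut would follow the penalty-cost profile of Figures 15 and 16 rather than a fixed count.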

Figure 16) Alignment by business impact (sorted by negative impact in descending order).


PERFORMANCE AND THROUGHPUT IMPACT OF MIGRATION

Migration of a nomad might be triggered by heavy storage consumption in an aggregate. It might also be triggered by performance limitations of the corresponding storage controller. Because the migration consumes additional resources on the network and the participating storage controllers,


this consumption must be taken into account to avoid further intensifying the situation. Refer to TR-3881 for a quantitative evaluation of DataMotion.

NetApp recommends adjusting the use of storage controllers in a high-availability configuration in such a way that the remaining controller can master the load in the case of a failover. Doing so should leave enough resources to perform migrations.

Migrating vFiler entities relies mainly on SnapMirror® and MultiStore technology. Thus, it requires these licenses on all participating storage controllers. TR-3814, NetApp Data Motion, provides a thorough presentation of migration using the NetApp Data Motion solution. It focuses on implementing and triggering the migration of vFiler entities using NetApp Provisioning Manager. Furthermore, vFiler units can be managed manually to allow for handy offline migration with a very short interruption of storage accessibility.

ENABLE SETTLED/NOMAD FOR ALREADY-PROVISIONED STORAGE

While NetApp recommends that you consider the settled/nomad setting initially, taking the sizing and lifetime of storage into account, it is possible to implement it later in a planned downtime window. If NFS-attached storage is to be migrated, existing volumes can be adopted by a vFiler entity. Because the vFiler entity has its own IP address, the clients attaching the storage need to remount it.

SETTLED/NOMAD-LIKE SETTING WITH SHORT-/LONG-TERM DATA PAIRING

In the previous section, the settled/nomad pattern was described as a way to mitigate organic data growth. The same effect can be achieved when storage is identified as belonging to instances that are going to be deprovisioned at their end of life. Taking into account the expected lifetime of provisioned storage allows you to plan deprovisioning situations in advance. This relaxes the dependence on aggregates outside planned downtime windows and without the technical requirements of a settled/nomad setting.

ONLINE MIGRATION IN VIRTUALIZED ENVIRONMENTS

Online migration features in a virtualization hypervisor provide a further alternative for implementing a responsive scheme to react to data growth scenarios. For example, VMware® Storage VMotion™ is capable of transferring a virtual machine, including its storage, when it is attached using a datastore. Storage of virtual machines served by a NetApp datastore can be migrated to another NetApp-served datastore by migrating each virtual machine. In such cases, a nomad can simply be implemented by a NAS- or SAN-attached datastore.

In contrast to a data transfer based on SnapMirror directly between NetApp storage controllers, the migration traffic flows through the hypervisor. This might have consequences for the execution of the virtual machines. Also, the NetApp storage efficiency savings cannot be exploited during the transfer. Deduplication savings are gained back by executing the deduplication process on the destination storage controller.


4 OPERATION

This section focuses on the operation and management of overcommitted storage. The goal of management is to fulfill the SLAs of the stored data while achieving a high level of efficiency. It addresses questions of how to detect situations that need manual assistance, how to raise the awareness of the operational staff, and how to resolve situations that arise.

We first consider situations that put SLA fulfillment at risk. Then we focus on actions that can be taken to avoid further aggravation by making the situation evident and presenting mitigation alternatives.

Consider the operational process as a loop that monitors and evaluates the current situation and triggers the transition of a storage resource among phases:

• Provision storage.
• Leave room for organic growth. It might be desirable to still allow for extending the storage of previously provisioned applications.
• Reduce storage use with mitigation alternatives such as deletion, data motion, and so on.

These transitions must occur within a specified time frame to optimize operational flexibility and to prevent endangering the SLAs. The point is to detect situations that will violate the SLAs in the future.

SITUATIONS PUTTING SLA FULFILLMENT AT RISK

Over time, more and more data is stored and processed by the provided applications. NetApp storage efficiency technologies compensate for this growth. To prevent running out of physical resources, usage must be managed within safe boundaries. This makes sure the operations team has enough time to react with the appropriate mitigation strategy.

The following list summarizes situations that are critical for service delivery:

• Running out of time. Some mitigation alternatives must be triggered in advance, and a passage of time might be needed for their effect to become evident. This time determines the number of mitigation alternatives that can be considered at a certain point in time.

• Running out of mitigation alternatives. Several mitigation alternatives exist to control the usage. However, some alternatives are one-time activities, and some must be performed within a certain time frame. Depending on the situation, not all alternatives might be available for use.

• Running too tight on storage. Over time, applications use more and more of the blocks that were committed to them. This forces Data ONTAP to allocate from a pool of free blocks. Assuming data growth, the size of the free block pool directly translates into the available time to react.

• Running out of storage completely. This must be prevented because it has a high impact on the availability of the service. Furthermore, data integrity can be at risk. Consider the following scenarios:
− An application wants to write to committed storage but fails (NAS/SAN). For the application, this looks like a storage failure and implies service disruption. Data integrity can be at risk.
− An application wants to allocate new storage but fails (NAS). The application is confronted with a "No space left on device" exception. Verify the application's behavior on this exception; most applications can deal with this situation, and data integrity is not at risk.

Two cases need to be differentiated when mitigation is necessary to resolve a situation of tight storage. Storage for an object such as a LUN or a share can be tight because of:
− Insufficient space within the volume in which the storage object is contained
− Insufficient free space within the aggregate in which the storage object and its volume are contained


The following sections focus on how to detect that a change is necessary and that a storage resource should be transitioned to another phase. We describe the different phases, how monitoring can support their detection, and how this information can be made known to operational groups.

41 PHASES AND TRANSITIONS

This section outlines the phases of a storage resource. Starting with an empty aggregate, storage is provisioned up to certain thresholds. After that, storage is left for organic growth. After certain thresholds are exceeded, further inspection or activities must be performed to mitigate storage tightness.

• Provisioning storage. When certain thresholds are within a defined range, storage is provisioned to the aggregates. Monitoring should support making a decision to transition to the next phase.

• Leave storage for organic growth. When certain thresholds are exceeded, provisioned storage is left for organic growth. Depending on the environment, storage of existing applications might still be extended, and a second threshold might signal that extensions are no longer possible. Monitoring should support making a decision to transition to the next or the prior phase.

• Mitigate storage use. When certain thresholds are exceeded, this phase must make sure that committed storage can be delivered to store application data. The effect of a mitigation activity should be to put the storage resource back into the preferred operational corridor. Monitoring should support making a decision to transition back to the organic growth phase.

42 MONITORING

NetApp Operations Manager delivers comprehensive monitoring and management for NetApp shared storage. It provides alerts, reports, performance monitoring, and configuration tools to keep the NetApp storage infrastructure in line with business requirements for maximum availability and efficiency. NetApp Operations Manager provides a single human interface and an application programming interface (API) for integration with third-party management and orchestration software.

Operations Manager monitors the NetApp shared storage infrastructure and is able to raise awareness of certain situations. Events can be set to trigger an action when operational parameters are within a certain range and indicate a relevant situation. When the event triggers, an alarm notification can be sent by e-mail, pager, Simple Network Management Protocol (SNMP), or customized scripts. To raise awareness of a certain situation, the event must be characterized using the metrics provided by Operations Manager. To communicate the event, an alarm must be set.

THRESHOLDS

Operations Manager monitors relevant parameters that indicate the presence of specified situations. Thresholds can be set to trigger actions, for example, to notify the operational team that an alarm situation exists. The thresholds can be set to notify in advance. Operations Manager also performs trending on operational parameters to express the urgency of a certain situation. This supports decision making on how to react to a certain situation.

Within your Operations Manager instance, the thresholds can be verified and set by navigating to the Default Thresholds page, either by following Setup→Options→Default Thresholds or via the link http://opsmgrserver:port/dfm/edit/options. Figure 17 shows a sample configuration page.


Figure 17) Operations Manager screen to configure thresholds on operational metrics.

For aggregates, Operations Manager provides the set of thresholds described in the following list. They represent absolute limits. Operations Manager alarms can be used to notify operational staff and raise awareness of a specific situation.

Monitoring the aggregates is very important. They are the physical containers of preallocated and growable storage objects that host application data. If an aggregate of a storage controller runs at uncontrolled usage, it could have direct consequences for the applications for which it provides data.

The concrete settings for these thresholds depend on the time necessary to relax aggregate block usage. When no mitigation alternatives should or can be taken during the lifetime of the data, mitigation actions must be performed in scheduled downtime windows. In that case, threshold settings and actions tend to be more conservative to avoid SLA-endangering situations.

• Aggregate full threshold. This threshold on the metric aggregate block use allows triggering an alarm that notifies a person in charge.

• Aggregate nearly full threshold. This threshold is the counterpart of the aggregate full threshold but provides an earlier notification.

• Aggregate overcommitted threshold. This threshold on the metric of committed storage allows triggering an alarm that notifies a person in charge. This metric refers to the amount of storage that is assigned to applications. It represents the level of consolidation and also the width and increase of the block use corridor.

• Aggregate nearly overcommitted threshold. This threshold is the counterpart of the aggregate overcommitted threshold but provides an earlier notification.
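The interplay of these four thresholds can be sketched as a simple evaluation over the two aggregate metrics. The percentage limits below are hypothetical examples, not Operations Manager defaults.

```python
# Illustrative sketch of evaluating the four aggregate thresholds.
# The limits are made-up example values, not Operations Manager defaults.

THRESHOLDS = {
    "aggregate full": ("use", 0.90),
    "aggregate nearly full": ("use", 0.80),
    "aggregate overcommitted": ("commitment", 1.50),
    "aggregate nearly overcommitted": ("commitment", 1.30),
}

def fired_events(use_rate, commitment_rate):
    """Return the names of all threshold events the current metrics trigger."""
    metrics = {"use": use_rate, "commitment": commitment_rate}
    return sorted(name for name, (metric, limit) in THRESHOLDS.items()
                  if metrics[metric] >= limit)

print(fired_events(use_rate=0.85, commitment_rate=1.40))
# ['aggregate nearly full', 'aggregate nearly overcommitted']
```

The "nearly" variants fire first, which is exactly the advance notice the operations team needs to react before the hard limits are reached.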

Operations Manager also provides thresholds that can be used to alert operational staff when volumes are in a certain state:

• Volume full threshold. This event notifies a person in charge that the preset threshold on the volume use metric has been reached.

• Volume almost full threshold. This event is the counterpart of the volume full threshold but provides an earlier notification.

• Volume autosized. This event notifies a person in charge that a volume was extended using the autogrow functionality.


TRENDING

Operations Manager 4.0 supports a variety of trending features for certain storage objects. This is an important feature for all storage objects with a fixed size. It allows you to estimate the time frame within which a certain situation needs to be mitigated. The trend is calculated as a linear regression over up to 90 days in the past. For aggregates, Operations Manager calculates a trend on the daily growth rate. In your Operations Manager instance, use the link http://opsmgrserver:port/dfm/report/view/aggregates-growth-rates for trending of aggregate growth rates and the estimated remaining time until the storage object is full. Each aggregate can be drilled down, and you can select trending based on an interval of one day, one week, one month, three months, or one year. To see the effect of recent data activities, set the interval of a trend calculation to enclose this activity. Investigate whether growth rates calculated over different intervals deviate significantly.
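The underlying estimate can be sketched as an ordinary least-squares fit over daily use samples. The sample data below is made up; Operations Manager performs this calculation internally.

```python
# Minimal sketch of the trending idea: fit a linear regression to daily
# aggregate-use samples and estimate the days until usable capacity is
# reached. Sample data is invented for illustration.

def days_to_full(daily_used_gb, usable_capacity_gb):
    """Least-squares slope over the samples; extrapolate to capacity.
    Returns None when there is no growth at the current trend."""
    n = len(daily_used_gb)
    xs = range(n)
    mean_x = sum(xs) / n
    mean_y = sum(daily_used_gb) / n
    slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, daily_used_gb))
             / sum((x - mean_x) ** 2 for x in xs))
    if slope <= 0:
        return None
    intercept = mean_y - slope * mean_x
    # Day index at which the fitted line reaches capacity, relative to today.
    return (usable_capacity_gb - intercept) / slope - (n - 1)

# 10 GB/day growth, 700 GB used today, 1000 GB usable -> about 30 days left.
samples = [640, 650, 660, 670, 680, 690, 700]
print(round(days_to_full(samples, 1000)))  # 30
```

Comparing such fits over different sample windows is what reveals whether recent activity deviates from the long-term growth rate.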

Figure 18) Trending of data growth and days-to-full prediction in Operations Manager.

Note: The calculation basis of time to full is the usable aggregate capacity. This value is not calculated based on the aggregate full threshold setting.

The trending on the volume level is analogous to the trending on the aggregate level. In your Operations Manager instance, access the link http://opsmgrserver:port/dfm/report/view/volumes-growth-rates for trending of volume growth rates. NetApp recommends ordering the view by growth rate descending or time to full ascending in order to focus on the relevant candidates.

On the volume level, you can set an alarm to fire when the volume growth is outside the usual boundary:

• Abnormal volume growth. This event notifies when the growth rate of a volume exceeds a preset limit. It is helpful for signaling unusual behavior concerning storage consumption and points the operational staff to the right storage object.


INDIVIDUAL THRESHOLDS PER AGGREGATE OR VOLUME

For each aggregate or volume, the general default settings can be overwritten and made more specific. To do so, select your aggregate or volume of choice; for example, you can use the links already provided in this technical report. When a concrete aggregate is selected, it can be configured using the Edit Settings link and dialog. When a concrete volume is selected, its configuration can be adapted using the Edit Quota Settings link and dialog.

MONITORING STORAGE EFFICIENCY RETURNS

NetApp Operations Manager provides a dashboard to visualize storage efficiency returns in the NetApp shared storage infrastructure. This report lists important parameters, drilled down by utilization capacity, unused reserve capacity, storage efficiency, and efficiency return breakdown. It allows you to judge the effectiveness of the NetApp storage efficiency technologies. Figure 19 provides a sample screenshot of the storage efficiency dashboard in NetApp Operations Manager. Consult the NetApp Operations Manager Efficiency Dashboard Installation and User Guide for further information on the data provided by this dashboard.

Figure 19) Storage efficiency dashboard in Operations Manager.


43 NOTIFICATION

Operational staff must be notified when situations occur that require a transition between phases, especially situations with negative consequences for SLA fulfillment. Operations Manager provides alarms for notification. Alarms are bound to the metrics and thresholds explained in section 42 and notify operational staff, storage administrators, or storage capacity planners. Alarms are the instrument used to keep the management effort of the NetApp storage infrastructure low.

After being notified, the responsible person can evaluate the situation and decide which actions to take. Furthermore, the trends on operational parameters provided by Operations Manager simplify the decision-making process.

Depending on the organizational structure, the responsibilities to operate, plan, and administer the storage infrastructure can be separated into different groups, persons, or roles. Thus, we characterize the mitigation activities by required skill set and time to act. This allows an easy alignment to a given organizational structure.

Operations Manager supports different methods of sending a notification. The notification methods can be used in combination; for example, a notification can be sent by both e-mail and SNMP.

NOTIFY BY E-MAIL

An alarm can be sent to multiple destinations by e-mail. Repeated notifications can be sent when the situation is not resolved. To set an alarm, access the alarm configuration page by following Setup→Alarms from the default Operations Manager dashboard. Clicking Advanced Version accesses an advanced version of this page; the direct link for the advanced version is http://opsmgrserver:port/dfm/edit/alarms-advanced. Figure 20 shows how to configure an alarm. Adjust the threshold as described in section 42.

NetApp recommends using distribution lists or aliases with meaningful names rather than the addresses of individual persons. If you follow this recommendation, changing responsibilities and roles does not require corresponding changes in Operations Manager.

NOTIFY BY SNMP

Operations Manager supports the notification of alarms using SNMP, a widely used standard that is supported by most orchestration frameworks and ticketing systems. Using SNMP, Operations Manager can be integrated into existing ticketing systems. Figure 20 shows how to set up an alarm that fires based on the aggregate almost full threshold. The SNMP trap host is configured using a hostname or IP address and the port on which the SNMP agent is listening. The alarm can be saved and tested.


Figure 20) Configuring an alarm based on the threshold aggregate almost full.

Note: The SNMP event must be routed to the responsible groups or persons in the ticketing system. Thus, the mapping between the detected situation and the responsible operational group must be implemented there.

NOTIFY BY SCRIPT

Operations Manager supports notifications in highly customized integration scenarios. A user-defined adapter can be executed, which delivers the information to the infrastructure or system of choice. A script can be used to implement such an adapter and act as the glue between Operations Manager and the customer infrastructure. To set an alarm on the event aggregate almost full, which starts a script, instrument Operations Manager on the command line:

dfm alarm create -s script_to_execute -h aggregate-almost-full
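A minimal sketch of such an adapter script follows. The environment variable names and the plain-text ticket format here are illustrative assumptions, not a documented Operations Manager interface; a real adapter would read whatever event details your DFM version passes and call the ticketing system's API.

```python
#!/usr/bin/env python
# Hypothetical notification adapter started by Operations Manager via
# "dfm alarm create -s <script>". The DFM_* variable names below are
# illustrative assumptions, not documented DFM interfaces.
import os
import sys


def format_ticket(event_name, source, severity="warning"):
    """Render a one-line ticket entry for the downstream ticketing system."""
    return f"[{severity.upper()}] {event_name} on {source}"


def main():
    event = os.environ.get("DFM_EVENT_NAME", "unknown-event")     # assumed variable
    source = os.environ.get("DFM_SOURCE_NAME", "unknown-source")  # assumed variable
    # Glue to the customer infrastructure: here we simply emit the line;
    # a real adapter would hand it to the ticketing system instead.
    sys.stdout.write(format_ticket(event, source) + "\n")


if __name__ == "__main__":
    main()
```

The adapter stays deliberately thin: the routing decision (which team gets the ticket) belongs in the ticketing system, as noted for the SNMP path above.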


4.4 MITIGATE STORAGE USE

Uncontrolled storage use can limit operational flexibility and might put SLA fulfillment at risk. This section focuses on mitigation activities that preserve flexibility by controlling use within its defined corridor. The effect of a mitigation activity should return usage to that corridor.

Storage tightness might occur in aggregates or volumes, depending on their configuration. When all volumes in an aggregate are thin provisioned with the zero fat configuration, they use the shared pool of free blocks of the aggregate to deal with data growth. To resolve this situation, a mitigation activity at the aggregate level is necessary.

When storage objects in a fixed-size volume cannot grow to the committed space, a mitigation activity at the volume level is necessary to resolve upcoming volume tightness.

MITIGATION ACTIVITIES FOR AGGREGATES

Aggregates are the coarsest storage objects within a NetApp storage controller. Finer-grained storage objects such as FlexVol volumes and their content are usually thin provisioned using the zero fat configuration. They might grow on demand; however, because they live within an aggregate of physically limited size, the growth of the storage object itself is also limited. As described in the following list, providing usable space in the aggregate automatically allows contained storage objects to grow.

1. Increase the aggregate. You can add drives to aggregates during operation, and you can repeat this mitigation activity. The maximum aggregate size depends on the Data ONTAP version, the type of aggregate, and the type of storage controller. 64-bit aggregates, supported with Data ONTAP 8, have very high limits. Additional drives can be used immediately; however, their procurement needs to be taken into account. Rebalancing data between existing and new drives results in uniformly distributed use of the drives.

2. Decrease the aggregate Snapshot copy reserve. This reserve is needed in MetroCluster and SyncMirror® configurations. In other configurations, you can decrease this reserve or set it to zero.

3. Shrink preallocated volumes. Volumes with preallocated space reserve available aggregate free space. When possible, these volumes can be shrunk, returning the freed space to the aggregate so that others can make use of the preallocated space.

4. Enable deduplication and shrink the volume.

5. If available, migrate a nomad online to a different storage controller. Doing this at the NetApp storage controller level requires storage provisioning based on vFiler units and MultiStore and SnapMirror licenses. Adequate free space on the aggregates of the target storage controller is required. This mitigation activity is not limited in its applicability.

6. A volume can be migrated from one aggregate to another within the same or another storage controller. SnapMirror replicates the data while it is still being served. To switch over to the replicated data, the client needs to detach from the source and reattach to the replica. After completion, the replica is considered the new source. This operation has an impact on client downtime. Typically, inter-data center bandwidth allows you to synchronize the source and the replica within the range of a few minutes.

7. If none of the listed activities can be used, the application must be stopped to achieve a consistent state.

The mitigation activities for aggregate tightness are summarized in Table 8. Note that Provisioning Manager performs mitigation alternatives 3 to 6 for secondary storage online.


Table 8) Mitigation alternatives to control use within aggregates.

No. | Mitigation Activity | Repeatability | SLA Impact | Preparation Time | Time to Show Effect
1 | Increase aggregate capacity by adding disks | Repeatable (Data ONTAP 7.x: low limits; Data ONTAP 8: high limits) | None | HW procurement | Immediate (+ rebalancing)
2 | Decrease the aggregate's Snapshot copy reserve, if possible | One time | None | None | Immediate
3 | Shrink other volumes in the aggregate, if they have enough free space | One time | Low | None | Immediate
4 | Run deduplication and shrink volumes | Repeatable | Low | Time to execute dedupe | Immediate
5 | Migrate nomads (online) | Repeatable | Low | None | Minutes (vFiler migration time)
6 | Migrate volumes to a different aggregate (offline) | Repeatable | Medium-high | Next planned downtime window | Minutes (volume switch-over time)
7 | Prevent application data loss and stop the application, then migrate (offline) | Repeatable | Low-high | Coordinate with app owner | Minutes (migration time)

MITIGATION ACTIVITIES FOR VOLUME TIGHTNESS

Mitigation activities for volume tightness are relevant for volumes that are at risk because their storage objects cannot grow to the committed size. When it is not possible to enable growth for storage objects contained in volumes, you need to perform an aggregate mitigation activity.

Note: Some of these mitigation alternatives depend on, and affect, used capacity in the aggregate.

Table 9) Mitigation activities for resource tightness within volumes.

No. | Mitigation Activity | Repeatability | SLA Impact | Preparation Time | Time to Show Effect
1 | Reduce the volume's Snapshot copy reserve (if configured and not used) | One time | Low | None | Immediate
2 | Increase the volume if there is free space in the aggregate (see Table 8) | One time | Low | None | Immediate
3 | Delete Snapshot copies that are not needed, or those skipped by the autodelete function | Limited | Low | None | Immediate
4 | Activate FAS deduplication for the volume (requires proper space guarantees) | One time | Low / possible performance impact | Wait for schedule | Hours
5 | If the volume contains more than a single LUN, migrate those objects to another volume or aggregate | Repeatable | High | Next planned downtime window | Minutes (volume migration time)
6 | Stop the application and migrate data | Repeatable | High | Coordinate with app owner | Minutes (migration time)


5 REAL-LIFE SETTINGS

This section summarizes two different operational settings. The first does not make use of online data migration or the settled/nomad provisioning pattern; the second implements the settled/nomad provisioning pattern to maintain the flexibility for online data migrations.

The concrete threshold settings and approaches might be very customer and application specific. To exploit NetApp storage efficiency features in your own data center, NetApp recommends that you start conservatively. After you are familiar with the process, work toward the customer-specific optimum.

5.1 SAMPLE SETTING 1: REAL-LIFE SETTING

This section describes a real-life setting that a customer started with. It makes use of a limited set of mitigation alternatives. This is especially beneficial when the installed storage capacity should remain constant over a long time frame or the physical systems are already fully equipped. A settled/nomad setting is not considered. Thus, the thresholds that signal a transition between phases are set lower and more conservatively for this customer. Because online data migration and aggregate extension are not available as mitigation alternatives, sufficient available space is required to safely reach the next planned downtime window, as shown in Figure 21. In practice, refer to the aggregate days to full trend value to get an idea of the available days to full based on past data growth.

• All storage is provisioned using the zero fat option with growable FlexVol volumes. Only aggregate monitoring is used.
• Aggregate extension is not a mitigation alternative.
• Online migration is not a mitigation alternative.

Figure 21) Storage to enable organic data growth between planned downtime windows.


Note: Several months might fall between planned downtime windows in which major mitigation alternatives can be performed.

The primary concern is preventing the critical situation in which aggregates reach a utilization level that is too high to enable organic growth during the period between agreed planned downtime windows. To prevent this situation, sufficient space must be reserved to enable data growth. Second, the level of data consolidation is monitored to manage accumulated growth rates safely.

Provisioning new data is stopped when one or both thresholds on the first and second metrics are reached. The operational teams are notified using an alarm on the Operations Manager event aggregate nearly full (configured to fire when the metric exceeds 50%) and the event aggregate nearly overcommitted (configured to fire when the metric exceeds 110%). These alarms stop the responsible entities from provisioning new storage; the aggregate is left for organic growth.

An assessment of the storage situation might then be performed. Depending on experience and knowledge of the application growth rates seen in the past, the thresholds may be adapted. After the upper threshold of the operational sweet spot corridor is crossed, an alarm based on the aggregate full threshold (set initially to 65%) is sent to the storage administrators to make the decision to migrate data in the next planned downtime window. In the meantime, organic growth can take place in the yellow-marked area shown in Figure 22. The metrics used are:

• First metric: aggregate capacity used
• Second metric: aggregate space committed

Because all storage is provisioned using the zero fat option, no artificially limited storage container exists. Thus, there is no need to consider a volume-based metric. Figure 22 shows the behavior depending on the metrics aggregate capacity used and aggregate committed space.
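The transitions in Figure 22 can be sketched as a small decision function. The phase labels are ours, and the thresholds are the initial values given above, which each customer adapts:

```python
def phase(capacity_used_pct, space_committed_pct):
    """Map the two aggregate metrics of sample setting 1 to an
    operational phase, following the thresholds of Figure 22."""
    if capacity_used_pct > 65 or space_committed_pct > 120:
        return "mitigate"            # red: migrate data in the next downtime window
    if capacity_used_pct > 50 or space_committed_pct > 110:
        return "assess-capacity"     # yellow: stop provisioning, adapt thresholds
    return "provision-new-storage"   # green: normal operation in the corridor


print(phase(40, 90))   # -> provision-new-storage
print(phase(55, 90))   # -> assess-capacity
print(phase(70, 100))  # -> mitigate
```

Either metric alone can push the aggregate out of the sweet spot corridor, which is why both alarms are configured.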

Figure 22) Transition of phases depending on the metrics aggregate capacity used and aggregate committed space.

[Figure: decision flow. Aggregate capacity used 0-50% and aggregate space committed 0-110% (operational sweet spot corridor): provisioning new storage. Capacity used 50-65% or space committed 110-120%: capacity assessment and threshold adaptation. Capacity used > 65% or space committed > 120%: mitigate.]


5.2 SAMPLE SETTING 2: SETTLED/NOMAD

This section describes a setting that takes the settled/nomad provisioning pattern into account and allows migrating nomad data flexibly and in a timely manner, thanks to vFiler technology. This setting requires storage space at alternative locations to which nomads might be migrated. It is seen more often in larger environments with an emphasis on NFS-attached storage. It allows operating the NetApp storage infrastructure at very high use and in narrower operational sweet spot corridors. Figure 23 visualizes the effect of a mitigation alternative that can be performed online.

Figure 23) Narrower corridors due to the ability to perform mitigation alternatives in hours instead of months.


In this sample setting, as in sample setting 1, the critical situation to prevent is one in which aggregates become too full. However, the flexibility gained with online data migration makes it unnecessary to take a further metric, such as storage overcommitment, into account.

• All storage is provisioned using the zero fat option with growable FlexVol volumes. Only aggregate monitoring is used.
• Storage is provisioned using the settled/nomad pattern, with the ability to perform online migration.
• Days to full aggregate trending was more than 200 days on average. Note that this value depends on the individual situation and is calculated against 100%.

The sole metric in this setting is aggregate capacity used. Table 10 contains the thresholds describing the transition of phases.

Table 10) Phase transitions with the settled/nomad provisioning pattern and the online migration mitigation alternative.

Detection Threshold | Notify | Mitigation
> 70% | Storage operations | Stop provisioning of new storage
> 85% | Storage operations | Stop extending provisioned storage
> 90% | Storage operations | Relax the resource situation and migrate a nomad
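Because a single metric drives all transitions, Table 10 reduces to a cumulative check on aggregate capacity used. A sketch (the action labels are ours):

```python
def setting2_actions(capacity_used_pct):
    """Return the actions triggered by aggregate capacity used,
    following the thresholds of Table 10. Actions accumulate: at 92%,
    provisioning and extensions are stopped and a nomad is migrated."""
    actions = []
    if capacity_used_pct > 70:
        actions.append("stop-provisioning-new-storage")
    if capacity_used_pct > 85:
        actions.append("stop-extending-provisioned-storage")
    if capacity_used_pct > 90:
        actions.append("migrate-nomad")
    return actions


print(setting2_actions(65))  # -> []
print(setting2_actions(88))  # -> ['stop-provisioning-new-storage', 'stop-extending-provisioned-storage']
```

Compared with sample setting 1, the corridor boundaries sit much higher because the nomad migration mitigation takes effect within hours.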


Figure 24) Visualization of phase transitions depending on the metric aggregate capacity used.


You can achieve very high data consolidation in this setting by using NetApp storage controllers. The served amount of logical data can exceed the physically usable capacity several times over.


6 STORAGE EFFICIENCY COOKBOOK

To increase consolidation, we propose the following steps to exploit the advantages of NetApp storage efficiency technologies.

Figure 25) Typical picture of aggregate capacity metrics while turning to zero fat configurations and dedupe.

[Figure: committed capacity and capacity used plotted over elapsed time, with the overall trend and the last 3-month trend; marks at 1 month and 3 months separate the numbered steps 1-3 described below.]

As a general rule, we do not introduce artificially limited container types. They increase monitoring effort and might prevent pooling of unused space. For an existing landscape, proceed as follows:

1. Install and configure Operations Manager; the earlier, the better. From day one, Operations Manager collects data, and the more information it collects, the better the predictions and trending. The diagrams provided by Operations Manager give a good idea of growth rates and their steadiness. Make sure all NetApp storage controllers are monitored. Wait for one month. Define which mitigation alternatives your operational team is comfortable with. Check the boxes accompanying the provided list and identify the time your team needs to perform the actions. If you can perform online migrations for nomads, define the time to negotiate and approve the migration. For all other data, define the time to the next planned downtime window.

2. Change all volumes to the zero fat configuration with the autogrow feature set to on. Because there is no artificial space limitation for an autogrow volume, monitoring is restricted to aggregate monitoring. When using deduplication, set the volume to autogrow. During this period, the capacity used diminishes, as shown in Figure 25. Usually, each change in the volume configuration can be detected. So far, only metadata has changed, and unused space in the volumes is now available in a common shared pool. The aggregated free space is available to the same applications that store the data. We recommend monitoring for three months to understand the growth rate of your environment.

3. Derive the growth trend of the aggregates. Note that the overall trend might still be negative. Use Operations Manager to help determine the trend. Make sure that it excludes the time frame when the volume configuration was changed to zero fat and that it includes relevant operations of your applications, such as month-end and year-end closing of business applications or regular software maintenance updates (for example, in virtualized environments).


Work backward to determine the thresholds of the phases:

a. Define the aggregate use at a level your operational team is comfortable with. At first, do not exceed 80%. Add an attention area (yellow) depending on the mitigation alternatives and their time to show effect.

b. Determine the maximum distance between the planned downtimes, or the time to perform the intended mitigation alternatives.

c. Determine the growth rate. Operations Manager provides help in determining the trend of data growth.

d. Determine the minimum space required to comfortably allow organic growth in the period between agreed planned downtimes of the services provided. Operations Manager helps you to understand the growth rate of the past.
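Steps b through d amount to a simple back-of-the-envelope calculation. A sketch assuming linear growth; the function, parameter names, and the safety factor padding are illustrative assumptions, not NetApp guidance:

```python
def required_free_pct(daily_growth_gb, days_between_downtimes,
                      capacity_gb, safety_factor=1.25):
    """Minimum free space (as a percentage of aggregate capacity) needed
    to ride out organic growth until the next planned downtime window.
    The 25% safety padding is an illustrative choice."""
    needed_gb = daily_growth_gb * days_between_downtimes * safety_factor
    return 100.0 * needed_gb / capacity_gb


# Example: 10 GB/day growth and 120 days between downtime windows on a
# 10 TB aggregate call for roughly 14.6% free space, i.e. a mitigation
# threshold near 85% capacity used.
print(round(required_free_pct(10, 120, 10240), 1))  # -> 14.6
```

Working backward from this number and the team's comfort level (step a) yields the green, yellow, and red boundaries of the corridor.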

To provision storage, follow these steps:

1. Create big aggregates to enable shared storage in your data center. We recommend sizing them in such a way that the aggregate can be extended once for eventual aggregate mitigation. Very few situations exist where a silo-centric approach with dedicated aggregates for applications makes sense. Free space and performance in an aggregate can be shared, and a few big aggregates reduce the monitoring effort. Also, build aggregates in a limited number of standardized configurations and sizes.

2. Create volumes in the zero fat configuration with the autogrow feature set to on. Because there is no artificial space limitation for an autogrow volume, monitoring is restricted to aggregate monitoring. When using deduplication, set the volume to autogrow. Whenever possible, use Provisioning Manager for convenience and for repeating configurations.

a. Classify your data and provision for flexibility. Give NFS a preference and make use of vFiler entities.

b. Turn on deduplication. Even in situations where deduplication rates are expected to be low, there is sometimes a big surprise. If you prefer to try deduplication on the storage controller first, create a clone of the intended volume and deduplicate it to estimate the effect. Use Performance Advisor to identify a repeating time frame of low activity to schedule the deduplication job. Also consider scheduling deduplication by change rate. Mind the maximum sizes, which depend on the storage controller.

c. Initially size volumes to the expected size of the data you are going to store. Thus, the aggregate overcommitment metric in Operations Manager represents the data consolidation more precisely.

d. Trim existing volumes provisioned in full or low fat to the zero fat configuration. Use the following commands on the console of the storage controller to configure zero fat without Snapshot autodelete for NAS environments:

vol options <volume> guarantee none
vol options <volume> try_first volume_grow
vol autosize <volume> -m <maximum size> -i <increment size> on
snap autodelete <volume> off

Use the following command sequence to configure zero fat with Snapshot autodelete for NAS environments:

vol options <volume> guarantee none
vol options <volume> try_first volume_grow
vol autosize <volume> -m <maximum size> -i <increment size> on
snap autodelete <volume> trigger volume
snap autodelete <volume> delete_order oldest_first
snap autodelete <volume> on


Use the following commands to configure zero fat without Snapshot autodelete for SAN environments:

vol options <volume> guarantee none
vol options <volume> try_first volume_grow
vol autosize <volume> -m <maximum size> -i <increment size> on
snap reserve -V <volume> 0
snap autodelete <volume> off
lun set reservation <lun> disable

Use the following command sequence to configure zero fat for SAN environments with autodelete set to on:

vol options <volume> guarantee none
vol options <volume> try_first volume_grow
vol autosize <volume> -m <maximum size> -i <increment size> on
snap reserve -V <volume> 0
snap autodelete <volume> trigger volume
snap autodelete <volume> delete_order oldest_first
snap autodelete <volume> on
lun set reservation <lun> disable

e. Identify storage holding inactive data. Such storage is most often perfectly suited to act as a nomad candidate that could be migrated.

f. Identify storage that is close to deprovisioning. Deprovisioning storage relaxes use and can act as a mitigation alternative.

g. Turn already provisioned volumes into the zero fat configuration.

3. Let Operations Manager monitor the landscape. Use the aggregate daily growth rates and days to full trending reported by Operations Manager to adapt the thresholds. Remember that days to full trending reports against 100% capacity used of the aggregate.
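Because days to full trending targets 100% capacity used, rescale it to your own mitigation threshold before comparing it with the lead time of your mitigation alternatives. A sketch assuming linear growth (the function is ours, not an Operations Manager feature):

```python
def days_to_threshold(reported_days_to_full, used_pct, threshold_pct):
    """Rescale the days to full figure (which targets 100% used) to the
    number of days until a custom threshold is crossed, assuming the
    growth rate stays linear."""
    if used_pct >= threshold_pct:
        return 0.0  # threshold already crossed
    return reported_days_to_full * (threshold_pct - used_pct) / (100.0 - used_pct)


# Example: 200 reported days to full at 60% used leaves only 100 days
# until an 80% mitigation threshold is reached.
print(days_to_threshold(200, 60, 80))  # -> 100.0
```

If the rescaled figure drops below the time your team needs to prepare a mitigation, tighten the thresholds or start the mitigation early.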


7 REFERENCES

• TR-3505: "NetApp Deduplication for FAS and V-Series Deployment and Implementation Guide," www.netapp.com/us/library/technical-reports/tr-3505.html
• TR-3563: "NetApp Thin Provisioning: Improving Storage Utilization and Reducing TCO," www.netapp.com/us/library/technical-reports/tr-3563.html
• TR-3710: "Operations Manager, Provisioning Manager, and Protection Manager Best Practices Guide," www.netapp.com/us/library/technical-reports/tr-3710.html
• TR-3786: "A Thorough Introduction to 64-Bit Aggregates," www.netapp.com/us/library/technical-reports/tr-3786.html
• TR-3814: "NetApp Data Motion," www.netapp.com/us/library/technical-reports/tr-3814.html
• TR-3827: "If You're Doing This, Then Your Storage Could Be Underutilized," www.netapp.com/us/library/technical-reports/tr-3827.html
• TR-3881: "DataMotion for Volumes for Enterprise Applications," www.netapp.com/us/library/technical-reports/tr-3881.html
• NetApp Operations Manager Efficiency Dashboard Installation and User Guide, now.netapp.com/NOW/download/tools/omsed_plugin/InstallUserGuide.pdf


8 ACKNOWLEDGMENTS

This report was developed in concert with the Field Centers for Innovation and covers field best practices and product group expertise. It would not have been possible without the input of many experts. Significant contributions were made by Matthew Agoni, Carlos Alvarez, Jeff Berks, Manfred Buchmann, Hans Deuerlein, Erik Dybwad, Niels Reker, Oliver Dziuba, Larry Freeman, Gary Garcia, Pretoom Goswami, Naveen Harsani, George John, Nigel Maddock, Andreas Martinovsky, Holger Niermann, Cesar Orosco, Christian Ott, Shiva Raja, Michael Reusch, Maurice Skubski, John Tyrrell, Oliver Walsdorf, and Allen Wang.

NetApp provides no representations or warranties regarding the accuracy, reliability, or serviceability of any information or recommendations provided in this publication, or with respect to any results that may be obtained by the use of the information or observance of any recommendations provided herein. The information in this document is distributed AS IS, and the use of this information or the implementation of any recommendations or techniques herein is a customer's responsibility and depends on the customer's ability to evaluate and integrate them into the customer's operational environment. This document and the information contained herein may be used solely in connection with the NetApp products discussed in this document.

© Copyright 2010 NetApp, Inc. All rights reserved. No portions of this document may be reproduced without prior written consent of NetApp, Inc. Specifications are subject to change without notice. NetApp, the NetApp logo, Go further, faster, Data ONTAP, FlexClone, FlexVol, MultiStore, RAID-DP, SnapDrive, SnapMirror, Snapshot, SyncMirror, and vFiler are trademarks or registered trademarks of NetApp, Inc. in the United States and/or other countries. Windows is a registered trademark of Microsoft Corporation. Oracle is a registered trademark of Oracle Corporation. VMware is a registered trademark and VMotion is a trademark of VMware, Inc. All other brands or products are trademarks or registered trademarks of their respective holders and should be treated as such. RA-0007-1010

Page 14: Lun Provision


Option | Recommended Value | Notes
autosize options | -m X -i Y | The business model drives the maximum value for the autosize configuration because it offers additional disk space for the consumer under its specific conditions. A reasonable resizing increment depends on various factors, such as the data growth rate in the particular volume, the volume size itself, and so on.

Volume Snapshot Options
reserve | yes | The value depends on the number of Snapshot copies and the change rate within the volume.
schedule | switched on | Automatic Snapshot technology schedules.
autodelete | off | Deleting Snapshot copies is not recommended in most NAS environments.

ZERO FAT PROVISIONING

The zero fat method is the most efficient way to provision NAS volumes:

• Volumes are created without space guarantee.
• The size of the volume still follows the formula X + Δ, where X is the size of the primary data (the sum of all user data, files and directories, within the volume) and Δ is the amount of space needed to hold Snapshot data. Sizing the volume defines a container with a virtual size for the consumers; NAS users are familiar with fixed-size file shares.
• Space used for Snapshot copies can grow unexpectedly. You can use the autosize function to make space available when a certain volume threshold is reached. You can also use the autosize function when the space reserved for user data gets low.
• Space reserved for Snapshot copies is used to hide from the consumers (NAS clients) the capacity taken up by Snapshot copies.
• For volumes with deduplication enabled, volume autogrow is a mandatory option.
• Using autodelete is normally not recommended in NAS environments. Keeping a certain amount of space for Snapshot copies for file versioning and restores is part of the SLAs defined for file services.

Note: Deleting Snapshot copies may be a reasonable approach when no other option for freeing up space is available, but this is a specific and individual decision.

Table 3) Zero fat provisioning.

Option | Recommended Value | Notes

Volume Options
guarantee | none |
fractional_reserve | 100 | Leave at the default; mostly relevant for SAN environments. The default value up to Data ONTAP 7.3.3 is 100; for later releases, 0 is the default.
autosize | on | Turn autosize on. There is no artificially limited volume that needs to be monitored. Autosize makes sense to allow growth of user data beyond the guaranteed space limit.
autosize options | -m X -i Y | The business model drives the maximum value for the autosize configuration because it offers additional disk space for the consumer under its specific conditions. A reasonable resizing increment depends on various factors, such as the data growth rate in the particular volume, the volume size itself, and so on.
try_first | - | Autodelete is not recommended in most environments.

Volume Snapshot Options
reserve | yes/no | The value depends on the number of Snapshot copies and the change rate within the volume. Displaying only the committed usable space using SLAs is the preferred way to provision NAS storage. However, there might be situations in which the Snapshot reserve area is omitted (no).
schedule | switched on | Automatic Snapshot technology schedules.
autodelete | off | Deleting Snapshot copies is not recommended in most NAS environments.

SAN

For SAN, we consider three options:

• Full fat: both primary data and its Snapshot copy space are preallocated.
• Low fat: the primary data is preallocated; the Snapshot copy space is allocated on demand.
• Zero fat: primary data and its Snapshot copy space are allocated on demand. This variant achieves the best ratio of storage efficiency when provisioning applications from scratch.

Figure 8) Provisioning model for SAN storage from scratch.

Snapshot Copy Space Allocation | Primary Data (LUN) Space Allocation: Fat | Primary Data (LUN) Space Allocation: Thin
Fat | Full fat option | No option
Thin | Low fat option | Zero fat option

FULL FAT PROVISIONING

This method can be treated as the historical way of provisioning block storage with Data ONTAP:

• Volumes are created with space guarantee.
• A fractional (overwrite) reserve is used to guarantee that the primary data can be overwritten completely with Snapshot copies in place. If this space is not available, Snapshot copy creation will fail.
• The size of the volume follows the formula 2X + Δ, where X is the size of the primary data (the sum of all LUN capacities within the volume) and Δ is the amount of space needed to hold Snapshot copy data.

16 Storage Efficiency Every Day How to Achieve and Manage Best-in-Class Storage Use

The enhancements to the volume autosize capabilities (such as volume-size-dependent thresholds) and the robustness of the Snapshot autodelete implementation have made full fat provisioning more or less obsolete. As of today, the default settings for creating volumes and LUNs in Data ONTAP still correspond to these settings. See Provisioning from Scratch: Full Fat to Zero Fat Provisioning for a discussion of using tools such as Provisioning Manager.

Table 4) Full fat provisioning.

Option | Recommended Value | Notes

Volume Options
guarantee | volume |
fractional_reserve | 100 | Even though technically possible, a fractional reserve below 100 incorporates a potential risk of running out of Snapshot copy overwrite space. This situation should be avoided.
autosize | off | Autosize could be used as an option to create the free space needed for Snapshot copy creation.

Volume Snapshot Options
reserve | 0 |
schedule | switched off |
autodelete | off |

LUN Options
reservation | enable |

LOW FAT PROVISIONING

With low fat provisioning, we use a more space-efficient way to provision volumes:

• Volumes are created with space guarantee.
• LUNs are created with space guarantee as well. This setup does not benefit from unused blocks within a LUN. (During the lifetime of a LUN, the amount of free, unused blocks typically decreases. Without space reclamation techniques, allocated blocks on the storage system stay allocated.)
• The size of the volume follows the formula X + Δ, where X is the size of the primary data (the sum of all LUN capacities within the volume) and Δ is the amount of space needed to hold Snapshot copy data.
• Because space used for Snapshot copies might grow unexpectedly, the autosize and autodelete policies are used to make space available when a preset volume threshold is reached.

Table 5) Low fat provisioning.

Option | Recommended Value | Notes

Volume Options
guarantee | volume |
fractional_reserve | 0 | Snapshot copy space is controlled by the autodelete and autosize options.
autosize | on | Turn autosize on.
autosize options | -m X -i Y | The business model drives the maximum value for the autosize configuration because it offers additional disk space for the consumer under its specific conditions. A reasonable resizing increment depends on various factors, such as the data growth rate in the particular volume, the volume size itself, and so on.
try_first | volume_grow | Increasing the size of the volume does not destroy any data or information, so there is no reason not to increase it; the change can be reverted afterward if the volume's free space increases again. There might be configurations where automatic volume growth is not desired.

Volume Snapshot Options
reserve | 0 | For NAS volumes, setting a Snapshot copy reserve area and configuring Snapshot copy schedules is a common setup. For SAN volumes, this needs to be switched off according to NetApp best practices (see the Fibre Channel and iSCSI Configuration Guide).
schedule | switched off |
autodelete | on | There might be Snapshot copies that are needed to fulfill certain SLAs, such as backup SLAs. Setting this policy needs to be negotiated against the business requirements; in the worst case, deleting Snapshot copies is not an option.
autodelete options | volume, oldest_first | Determines the order in which Snapshot copies become candidates for deletion; oldest_first is the current default.

LUN Options
reservation | enable | Reserves space for the LUN during creation.

ZERO FAT PROVISIONING

Full and low fat provisioning use fully allocated volumes and LUNs. By default, space allocation happens within the boundaries of the LUN and the volume. Zero fat follows a 100% "allocate on demand" concept:

• Volumes are created without a space guarantee.
• LUNs are created without a space guarantee.
• The size of the volume follows the formula X - N + Δ, where X is the size of the primary data (the sum of all LUN capacities within the volume), Δ is the amount of space needed to hold Snapshot copy data, and N is the amount of unused blocks within a given LUN.
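For illustration, the space consumption formulas of the three methods can be compared directly (a sketch; the example figures are arbitrary):

```python
# Space consumption per provisioning method:
#   full fat: 2X + delta   low fat: X + delta   zero fat: X - N + delta
# X = sum of LUN capacities, delta = Snapshot copy space,
# N = blocks logically allocated inside the LUNs but not used.

def space_consumption_gb(x, delta, n):
    return {
        "full_fat": 2 * x + delta,
        "low_fat": x + delta,
        "zero_fat": x - n + delta,
    }

# Example: 500 GB of LUN capacity, 50 GB Snapshot data, 200 GB unused blocks.
print(space_consumption_gb(x=500, delta=50, n=200))
```

The example shows why zero fat is preferred: the same primary data consumes 1,050 GB with full fat but only 350 GB with zero fat.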

Table 6) Zero fat provisioning.

Option | Recommended Value | Notes

Volume Options

guarantee | none | No space reservation for the volume at all.

fractional_reserve | 0 | Starting with Data ONTAP 7.3.3, fractional_reserve can be modified even for volumes without a space guarantee of type volume. Prior to Data ONTAP 7.3.3, the value was fixed at 100.

autosize | on | Turn autosize on.


autosize options | -m X -i Y | The business model drives the maximum value for the autosize configuration because it offers additional disk space to the consumer under specific conditions. A reasonable resizing increment depends on various factors, such as the data growth rate in the particular volume, the volume size itself, and so on.

try_first | volume_grow

Volume Snapshot Options

reserve | 0 | For NAS volumes, setting a Snapshot copy reserve area and configuring Snapshot copy schedules is a common setup. For SAN volumes, this needs to be switched off according to NetApp best practices (see the Fibre Channel and iSCSI Configuration Guide).

schedule | switched off

autodelete | off | Deleting Snapshot copies might be an option when the volume can no longer be resized because the maximum configured size has been reached, or when the aggregate's free space becomes low.

LUN Options

reservation | disable | No preallocation of blocks for the LUN.

SUMMARY OF PROVISIONING METHODS

There are good reasons for using any of the provisioning methods described; however, full fat provisioning for SAN environments should be avoided wherever possible because of its storage efficiency ratio. Even with a 100% block usage ratio on primary data, zero fat provisioning has many advantages and is the preferred method:

• The aggregate's free space is a global pool that can serve space to volumes. This gives more flexibility than volumes with their own dedicated free space.
• For SAN volumes, the block consumption can be easily monitored.
• Deduplication savings go directly into the global pool of free space, which is the aggregate or the resource pool to which it belongs.
• Monitoring is needed only on the aggregate level. Volumes grow on demand.

Table 7) Comparison of provisioning methods.

Characteristics | Full Fat | Low Fat | Zero Fat

Space consumption | 2X + Δ | X + Δ | X - N + Δ (2)

Space efficient | No | Partially, for Snapshot copies | Yes

Monitoring | Optional | Required on volume and aggregate level | Required on aggregate level

Notification/mitigation process required | No | Optional, in most cases | Yes

(2) N is the traditional thin provisioning impact: the amount of blocks logically allocated but not used.


Pool benefiting from dedupe savings | Volume fractional reserve area | Volume free space area | Aggregate free space area

Risk of an out-of-space condition on primary data | No | No, as long as autodelete is able to delete any Snapshot copies | Yes, when monitoring and notification processes are missing

Typical use cases | Small installations; none or few storage management skills (no monitoring infrastructure) | Large database environments | Shared storage infrastructure; test/dev environments; storage pools for virtualized servers

FULL/LOW/ZERO FAT PROVISIONING WITH PROVISIONING MANAGER

NetApp Provisioning Manager focuses on fast and efficient provisioning of storage resources in the NetApp storage infrastructure. It speeds up provisioning workflows and boosts capacity usage by using policy-based automation for the entire NetApp NAS and SAN infrastructure. These processes are:

• Faster than manually provisioning storage
• Easier to maintain than scripts
• Instrumental in minimizing the risk of data loss resulting from misconfigured storage

Provisioning Manager applies user-defined policies to consistently select the appropriate resources for each provisioning activity. This frees administrators from the headache of searching for available space to provision and allows more time for strategic issues. The use of a centralized management console allows administrators to monitor the status of their provisioned storage resources. Provisioning Manager can help improve business agility and capacity use, shrink provisioning time, and improve administrator productivity. Provisioning Manager's thin provisioning and deduplication capabilities provide a high level of storage efficiency from your NetApp storage investment.

A GUI allows you to implement the full/low/zero fat provisioning models within Provisioning Manager. See Figure 9 and Figure 10 for configuring storage efficiency in a provisioning policy for NAS and SAN. Whenever storage is provisioned using this provisioning policy, the settings apply automatically. For more information, refer to TR-3710, Operations Manager, Provisioning Manager, and Protection Manager Best Practices Guide.

Provisioning Manager encapsulates technical details when provisioning storage and supports easy integration with existing management tools and orchestration frameworks. Policies and their use in so-called datasets and storage services allow you to exploit NetApp storage efficiency technologies without exposing a high level of technical detail to a higher level of management software.

Note: Provisioning Manager up to version 4.0 does not allow you to specify autosize and autodelete individually; the policy template determines whether these features are used and which options are selected. In order to implement the provisioning methods outlined, a customized provisioning script needs to be provided to set the autosize and autodelete parameters according to the recommendations for the full/low/zero fat methods. Post-provisioning scripts are standard with Provisioning Manager. Use caution when Provisioning Manager runs conformance checks; these revert individual settings.
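Such a post-provisioning script could, for example, generate the Data ONTAP 7-Mode commands that apply the zero fat recommendations (a hypothetical sketch; the volume name and size values are placeholders, and the exact command syntax should be verified against your Data ONTAP release):

```python
# Sketch of a post-provisioning step that emits the Data ONTAP commands
# needed to apply the zero fat autosize/Snapshot recommendations.
# Volume name and sizes are placeholders, not NetApp recommendations.

def zero_fat_commands(volume, max_size, increment):
    return [
        f"vol autosize {volume} -m {max_size} -i {increment} on",
        f"vol options {volume} try_first volume_grow",
        f"snap reserve {volume} 0",
        f"snap sched {volume} 0 0 0",   # switch off the Snapshot schedule
    ]

for cmd in zero_fat_commands("vol_app01", "500g", "10g"):
    print(cmd)
```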


Figure 9) Configuring a full/zero fat provisioning policy using Provisioning Manager for NAS. Select checkboxes as outlined. Provisioning Manager deviates from zero/full fat by first growing volumes with autosize and then allowing Snapshot copy autodelete.


Figure 10) Configuring a full/low/zero fat provisioning policy using Provisioning Manager for SAN storage. Select checkboxes as outlined. Provisioning Manager deviates by not turning on autosize for zero fat.

FULL/LOW/ZERO FAT PROVISIONING FOR STORAGE SERVICES

Storage services are an easy abstraction for provisioning storage in a utility-like fashion. A storage service describes all characteristic attributes of the storage needed in a certain scenario. A storage service catalog lists the available templates and allows you to provision storage with these attributes on demand. Technically, storage services or datasets consist of one or more resource pools, a protection policy, and a provisioning policy. Full/low/zero fat provisioning for storage services is configured in the configuration wizard of the provisioning policy. Figure 11 shows the provisioning policies closest to the full/low/zero fat configurations.

Figure 11) Full/low/zero fat provisioning policies for datasets and storage services.

Because this wizard is able to configure the deduplication feature, two policies are configured for the zero fat configuration: one with deduplication and one without.


HOW SHOULD A VOLUME BE SIZED?

Because the physical allocation of data within a zero fat provisioned volume is done on demand, the volume size could theoretically be set to a very high value that can easily hold all application data and Snapshot copies. Because the unallocated space in the volume is not exclusively reserved for the volume itself, all other applications can benefit from the shared pool of unallocated storage. However, NetApp recommends that you size the volume to the expected size of its containing objects and use the autogrow option to let it grow on demand. The important advantage is that the commitment rate acts as a metric for data consolidation.

Note: The commitment rate reflects the amount of logical data consolidation. This metric is suitable for deciding when data should be left for organic growth.

Additionally, the volume size limits when using deduplication should be taken into account, because the maximum sizes depend on the storage controllers.
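The commitment rate referred to above can be expressed as the ratio of provisioned (logical) capacity to physical capacity (a sketch based on how overcommitment is used in this document; the exact metric reported by monitoring tools may differ):

```python
# Commitment rate of an aggregate: total provisioned (logical) volume
# capacity divided by the aggregate's physical capacity. Values above
# 1.0 (100%) indicate thin-provisioning overcommitment.

def commitment_rate(volume_sizes_gb, aggregate_size_gb):
    return sum(volume_sizes_gb) / aggregate_size_gb

# Four 500 GB zero fat volumes on a 1,000 GB aggregate -> 2.0 (200%).
print(commitment_rate([500, 500, 500, 500], 1000))
```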

APPLICATION RECOMMENDATIONS

Thin provisioning is most effective when applications use data that is committed to them step by step. When applications preformat data, the immediate effect of thin provisioning is lost, and only deduplication may reclaim sharable blocks. Because thin provisioning has no performance penalty, the general recommendation is to provision with the zero fat configuration.

For SAN-attached storage, NetApp recommends that you use file systems supporting space reclamation technologies such as the SCSI UNMAP and SCSI WRITE SAME commands. These pass the information through the storage stack that a particular block is no longer used and allow unused space to be reclaimed. On Windows® platforms, this can be configured in NetApp SnapDrive®.

For Oracle® database best practices, refer to WP-7084, Storage Efficiency in an Oracle Environment.

3.2 PROVISIONING FROM TEMPLATES: VOLUME AND DEDUPE-CENTRIC LAYOUTS

This section deals with provisioning storage for similar applications from a golden template. A valid use case is a hosting provider who offers and serves predefined application services in mass quantities. Instead of provisioning each application from scratch, the data of the application instance is provisioned by creating a copy of a preconfigured template or golden copy, which is then customized using a postprocessing procedure.

When applications are provisioned this way, NetApp cloning technologies generate virtual copies of the template data instantly and with efficient use of space. This achieves a high degree of data consolidation and cost savings.

The potential of NetApp cloning technologies also plays a central role in development and test environments, as well as in software maintenance scenarios. Testing and updates can be performed very easily because these cloning capabilities work instantly and with almost no overhead in terms of performance, CPU, and memory.

There are two ways to align application data to a NetApp shared storage infrastructure:

• Volume-centric storage layout
• Dedupe-centric storage layout

Depending on the data lifetime, suitability for deduplication, consistency, and tool constraints, one way of aligning application data is more appropriate than the other.

In both variants, the storage of the application template can be provisioned as either full, low, or zero fat. The cloning procedure inherits the attributes of the parent volume. To create space-efficient clones, the space guarantee must be set to none.


CONSEQUENCES FOR MONITORING

When using one of the following layouts, very high data consolidation can be achieved. Because this effect depends on the usage characteristics of the corresponding applications, monitoring the aggregate is key. In case a low fat volume acts as a template that is cloned while preserving the original space guarantees, monitoring is necessary for the cloned volumes as well.

VOLUME-CENTRIC STORAGE LAYOUT

In a volume-centric storage layout, an application instance is organized into one or a few volumes to benefit from the Data ONTAP volume-centric management and maintenance operations, such as instant cloning and volume-consistent Snapshot copies.

In addition to the convenient ways to manage volumes, volume-centric storage layouts have storage efficiency advantages in two dimensions:

• High instant storage efficiency savings: High instant savings when cloning the data of an application instance with FlexClone; savings might deteriorate over time.

• Long-term storage efficiency savings: Medium long-term savings when deduplicating application data.

A volume-centric layout makes it easy to provision storage for another instance of an application by cloning a consistent volume representing the template of the intended application and attaching it to an instance where it is processed. This approach works for both NAS and SAN.

Figure 12 shows the data alignment of an application instance and its volume. An application instance organizes its data in one or more dedicated volumes. Note that the entire construct is created within one aggregate.

Because deduplication is performed at the volume level, long-term savings depend on the block-sharing rate within one instance of an application. Volume-centric layouts are preferred in the following cases:

• Simplicity of data management using volumes
• Individual control over the SLA of each application instance
• Application instances with a short duration
• No consideration of deduplication
• Management tools that require volume-centric layouts


Figure 12) Volume-centric storage provisioning. Application instances are aligned horizontally with their volumes.


Impact on commitment and storage utilization: The impact of using FlexClone to clone a volume-centric storage layout to implement template-based provisioning is visualized schematically. At clone creation, Data ONTAP creates metadata for the new instance of the data and allocates data blocks for storing changes to the cloned copy or new data on request. Thus, the overcommitment of the aggregate containing the cloned data increases when the clone is created. However, this does not affect the space used in the aggregate. When data in the clone is changed and new data is added by the application, the aggregate use grows.
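The clone-creation behavior described above can be modeled schematically (a simplified sketch under the stated assumptions: a clone adds committed capacity immediately but consumes physical space only as its data diverges):

```python
# Model of FlexClone's effect on an aggregate: creating a clone raises
# the committed (logical) capacity immediately, while physical use only
# grows later as the clone's data diverges from the parent.

class Aggregate:
    def __init__(self, size_gb):
        self.size_gb = size_gb
        self.committed_gb = 0   # sum of provisioned volume capacities
        self.used_gb = 0        # physically allocated blocks

    def provision_volume(self, size_gb, used_gb):
        self.committed_gb += size_gb
        self.used_gb += used_gb

    def clone_volume(self, size_gb):
        # Only metadata is created; no data blocks are copied.
        self.committed_gb += size_gb

    def commitment_rate(self):
        return self.committed_gb / self.size_gb

aggr = Aggregate(size_gb=1000)
aggr.provision_volume(size_gb=400, used_gb=300)  # template volume
aggr.clone_volume(size_gb=400)                   # instant clone
print(aggr.commitment_rate(), aggr.used_gb)      # -> 0.8 300
```

Note that the commitment rate doubles at clone creation while the physically used space stays at 300 GB, matching the description above.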

Best Practice

A volume-centric layout implicitly implements a consistency group. It is preferable to align within it all application data that should be recovered at a certain point in time. Cloning can achieve significant savings when a FlexClone volume is created to provision data for a new service instance.

Client-side data realignment, such as disk defragmentation or database tablespace reorganization, has a counterproductive effect on the FlexClone savings. This realignment also has a temporarily counterproductive effect on the deduplication savings, which requires the deduplication process to be executed again. If possible, the following actions on client data should be avoided:

• Reorganizing data, for example, database reorganization of tablespaces or defragmentation of virtual disks provisioned through cloning
• Preformatting data


DEDUPE-CENTRIC STORAGE LAYOUT

In a dedupe-centric storage layout, the goal is to achieve high storage efficiency returns from the deduplication feature. In contrast to the volume-centric storage layout, data of different application instances is grouped to achieve storage efficiency returns across a set of application instances. Figure 13 shows a sample dedupe-centric storage layout. Data of application instances is organized horizontally; the individual data of each application is grouped vertically in a volume to implement deduplication.

This layout makes sense in virtualization scenarios where the images of the guest machines can be grouped easily. Grouping partitions containing boot images and commonly used programs is very effective because they share much of the same data. To implement template-based provisioning with such a layout, cloning of template data must be performed with the file/LUN FlexClone operation. File/LUN FlexClone allows storage objects to be cloned within a volume, providing finer granularity.

This storage layout provides the following storage efficiency advantages from a short- and long-term perspective:

• Very high long-term storage efficiency savings: Long-term savings are achieved due to the deduplication-centric storage layout and deduplication returns.

• Short-term storage efficiency savings: Instant savings are provided when cloning an application instance through a file/LUN FlexClone operation, for example, template application data.

In contrast to the volume-centric storage layout, application instances are bundled together in a matrix style because of their participation in a volume. This implies that the applications share major operational tasks and are managed as a bundle. From an SLA perspective, a diversification of service levels within the application instances cannot be implemented as easily as with a volume-centric layout. Achieving application-consistent Snapshot copies requires the iterative application of file/LUN FlexClone functionality to all storage objects of the instance. This is slightly more difficult than cloning with a volume FlexClone operation.

TR-3505, NetApp Deduplication for FAS and V-Series Deployment and Implementation Guide, provides a deeper understanding of NetApp deduplication and its deployment.

Figure 13 illustrates dedupe-centric storage provisioning. Volumes are shared among several application instances to achieve cross-instance dedupe returns. Note that this construct is created within an aggregate. Volumes can be assigned to different aggregates.


Figure 13) Dedupe-centric storage provisioning. Application instances are aligned horizontally; volumes are aligned vertically.


Impact on commitment and aggregate usage: When the FlexVol volumes for this layout are created, their individual sizes contribute to the commitment rate. The aggregate use grows with the provisioning and object use within the FlexVol volumes. Provisioning a new instance in this layout through a file/LUN FlexClone operation has no effect on the overcommitment rate; it affects the deduplication savings of the volume itself. Thus, NetApp recommends using the zero fat configuration for the volume, with autogrow enabled.

Best Practice

This layout is very attractive for applications using multiple but similar storage objects among service instances (for example, virtual disks in virtual machine hypervisors). They usually use similar operating systems and applications in dedicated virtual disks. Thus, grouping these storage objects leads to a very high degree of consolidation due to deduplication.

Quickly changing data, such as pages and swap files, should not be considered for inclusion in deduplicated volumes on primary storage. Deduplication savings are limited due to their high change rate and do not justify running the deduplication process. NetApp recommends that this type of data is not placed together with data that dedupes well in the same volume.

We further recommend not performing client data realignments, such as Windows disk defragmentation or database tablespace reorganizations. Because of the way NetApp storage controllers work, fragmented client data is served with no performance penalty, so such realignment brings no benefit.


3.3 SETTLED/NOMAD PROVISIONING FOR NETAPP DATA MOTION

Settled/nomad provisioning is a provisioning pattern that helps increase the utilization of NetApp shared storage. When the online migratability features of storage are exploited, response times to mitigate data growth scenarios are independent of application-specific planned downtime windows. Storage is classified into potential migration candidates and can be migrated away from a tight aggregate on one storage controller to another while assuring its accessibility. Thus, it is an elegant technique to relax the use of an aggregate outside the planned downtime windows of the served applications. NetApp MultiStore® technology implements this feature using the vFiler® abstraction, which NetApp recommends you consider in the provisioning process. Secure multi-tenancy environments implemented using MultiStore technology harmonize well with this provisioning approach.

The settled/nomad provisioning pattern is a perfect metaphor for reacting to data growth in an aggregate. Figure 14 illustrates the concept of settled/nomad provisioning in the aggregates of the storage controllers and the migration of a nomad out of its aggregate. The settled part describes data that does not move during its lifetime. It might use vFiler units to simplify operation and hardware maintenance of the storage controller, but there is no direct need. The nomad parts are considered moving parts and thus must make use of vFiler units. The ratio between the sizes of the settled and nomad parts depends on the growth rate and lifetime of the data in the settled part. Assuming that the aggregate size is constant over this period, the aggregate is filled with settled and nomad data. Over the data's lifetime, more and more nomads are migrated away; at the end of the lifetime, only the settled data is left.

It is irrelevant whether the data growth happens in the settled or nomad part: when a nomad is migrated away, the resource situation on the aggregate is relaxed. It is preferable to provision several nomads of different sizes. This allows you to:

• React to different growth scenarios of the data
• Quickly migrate smaller nomads when time or the inter-storage-controller network is considered to be a limited resource
• Operate the aggregate in its operational sweet spot corridor over a long time frame; by slicing the migratable entities in the right way, you can be sure that the aggregate operates within a predefined use interval
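Slicing and selecting nomads in this way can be sketched as follows (a hypothetical helper; the 85% target and the nomad sizes are illustrative assumptions):

```python
# Choose the smallest nomad whose migration brings aggregate use back
# below the target threshold; fall back to the largest nomad otherwise.

def pick_nomad(used_gb, aggregate_gb, nomads_gb, target=0.85):
    """nomads_gb maps nomad name -> space it frees when migrated away."""
    candidates = [
        (size, name) for name, size in nomads_gb.items()
        if (used_gb - size) / aggregate_gb <= target
    ]
    if candidates:
        return min(candidates)[1]          # smallest sufficient nomad
    return max((s, n) for n, s in nomads_gb.items())[1]

nomads = {"nomad_small": 50, "nomad_mid": 120, "nomad_big": 300}
print(pick_nomad(used_gb=950, aggregate_gb=1000, nomads_gb=nomads))
```

Preferring the smallest sufficient nomad keeps migration traffic low, which matters when the inter-storage-controller network is a limited resource.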

Figure 14) Settled/nomad provisioning into an aggregate. In case of aggregate tightness, a nomad is migrated to a separate aggregate.


To summarize, the settled/nomad provisioning pattern is an elegant method to adjust the block use of an aggregate. The use of an aggregate can be controlled and kept within a desired corridor.


SLA-BASED ASSESSMENT FOR SETTLED/NOMAD

The goals of an SLA-based assessment are to optimize SLA fulfillment and to avoid or minimize penalty costs. The accessibility of the applications, described by their individual service levels, is used for an assessment into settled and nomad instances.

We use the introduced SLA metric of service disruption and map it to the stickiness of the settled/nomad instances. The vFiler entities allow online migration of NFS- and iSCSI-attached nomad instances without any changes on the client side. Fibre Channel-attached storage cannot be migrated online at the time of writing. Refer to TR-3881 for an understanding of DataMotion in an Oracle database and Microsoft Exchange environment.

Alignment by technical impact: For data belonging to applications with SLAs that fit perfectly into what is provided, a direct assignment can be made. For example, application instances with the lowest acceptable service disruption should be the last candidates to be migrated (settled); applications with the highest acceptable service disruption should be considered as nomads. However, there might be data of application instances that will likely be migrated during the application lifetime. You must take into account the business impact of migrating these instances.

Figure 15) Alignment by technical impact (sorted by negative impact in descending order).


Alignment by business impact: An assessment of penalty costs is made for the data of the remaining applications. For vFiler migration, a very short negative impact on the performance of the service level must be taken into account during the migration. Thus, application data with the highest negative impact is considered to be the stickiest.

Figure 16) Alignment by business impact (sorted by negative impact in descending order).


PERFORMANCE AND THROUGHPUT IMPACT OF MIGRATION

Migration of a nomad might be triggered by heavy storage consumption in an aggregate. It might also be triggered by performance limitations of the corresponding storage controller. Because the migration consumes additional resources on the network and on the participating storage controllers, this consumption must be taken into account to avoid further intensifying the situation. Refer to TR-3881 for a quantitative evaluation of DataMotion.

NetApp recommends that the use of the storage controllers in a high-availability configuration be adjusted in such a way that the remaining controller can master the load in the case of a failover. Doing so should leave enough resources to perform migrations.

Migrating vFiler entities relies mainly on SnapMirror® and MultiStore technology and thus requires these licenses on all participating storage controllers. TR-3814, NetApp Data Motion, provides a thorough presentation of migration using the NetApp Data Motion solution. It focuses on implementing and triggering the migration of vFiler entities using NetApp Provisioning Manager. Furthermore, vFiler units can be managed manually to allow for handy offline migration with a very short interruption of storage accessibility.

ENABLE SETTLED/NOMAD FOR ALREADY-PROVISIONED STORAGE

While NetApp recommends that you consider the settled/nomad setting initially, taking the sizing and lifetime of storage into account, it is also possible to implement it later in a planned downtime window. If NFS-attached storage is to be migrated, existing volumes can be adopted by a vFiler entity. Because the vFiler entity has its own IP address, the clients attaching the storage need to be remounted.

SETTLED/NOMAD-LIKE SETTING WITH SHORT-/LONG-TERM DATA PAIRING

In the previous section, the settled/nomad pattern was described as a way to mitigate organic data growth. The same effect can be achieved when storage is identified as belonging to instances that are going to be deprovisioned at their end of life. Taking into account the expected lifetime of provisioned storage allows you to plan deprovisioning situations in advance. This relaxes the dependence on aggregates outside planned downtime windows, without the technical requirements of a settled/nomad setting.

ONLINE MIGRATION IN VIRTUALIZED ENVIRONMENTS

Online migration features in a virtualization hypervisor provide a further alternative for implementing a responsive scheme to react to data growth scenarios. For example, VMware® Storage VMotion™ is capable of transferring a virtual machine, including its storage, when it is attached using a datastore. Storage of virtual machines served by a NetApp datastore can be migrated to another NetApp-served datastore by migrating each virtual machine. In such cases, a nomad can simply be implemented by a NAS- or SAN-attached datastore.

In contrast to a data transfer based on SnapMirror directly between NetApp storage controllers, the migration traffic flows through the hypervisor. This might have consequences for the execution of the virtual machines. Also, the NetApp storage efficiency savings cannot be exploited during the transfer; deduplication savings are regained by executing the deduplication process on the destination storage controller.


4 OPERATION

This section focuses on the operation and management of overcommitted storage. The goal of management is to fulfill the SLAs of the stored data while achieving a high level of efficiency. It addresses questions of how to detect situations that need manual assistance, how to raise the awareness of the operational staff, and how to resolve situations that arise.

We first consider situations that put SLA fulfillment at risk. Then we focus on actions that can be taken to avoid further aggravation by making the situation evident and presenting mitigation alternatives.

Consider the operational process as a loop that monitors and evaluates the current situation and triggers the transition of a storage resource among phases:

• Provision storage.
• Leave room for organic growth. It might be desirable to still allow extending the storage of previously provisioned applications.
• Reduce storage use with mitigation alternatives such as deletion, Data Motion, and so on.

These transitions must occur within a specified time frame to optimize operational flexibility and to prevent endangering the SLAs. The point is to detect situations that will violate the SLAs in the future.

SITUATIONS PUTTING SLA FULFILLMENT AT RISK

Over time more and more data is stored and processed by the provided applications NetApp storage efficiency technologies compensate this growth To prevent running out of physical resources usage must be managed within safe boundaries This makes sure the operations team has enough time to react with the appropriate mitigation strategy

The following list summarize situations that are critical for service delivery

• Running out of time. Some mitigation alternatives must be triggered in advance, and a passage of time might be needed before their effect becomes evident. This lead time determines the number of mitigation alternatives that can still be considered at a given point.

• Running out of mitigation alternatives. Several mitigation alternatives exist to control usage. However, some alternatives are one-time activities, and some must be performed within a certain time frame. Depending on the situation, not all alternatives might be available for use.

• Running too tight on storage. Over time, applications use more and more of the blocks committed to them, forcing Data ONTAP to allocate from the pool of free blocks. Assuming continued data growth, the size of the free block pool directly translates into the time available to react.

• Running out of storage completely. This must be prevented because it has a high impact on the availability of the service; furthermore, data integrity can be at risk. Consider the following scenarios:
  – An application wants to write to committed storage but fails (NAS/SAN). To the application this looks like a storage failure and implies a service disruption. Data integrity can be at risk.
  – An application wants to allocate new storage but fails (NAS). The application is confronted with a "No space left on device" exception. Verify the application's behavior on this exception; most applications can deal with it, and data integrity is not at risk.

Two cases need to be differentiated when mitigation is necessary to resolve a tight storage situation. Storage for an object such as a LUN or a share can be tight because of:
  – insufficient free space within the volume in which the storage object is contained
  – insufficient free space within the aggregate in which the storage object and its volume are contained
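For the "running too tight" case, the relationship between the free block pool and the time left to react can be sketched as follows. This is a hypothetical helper assuming a constant growth rate, not a NetApp tool:

```python
def days_to_react(free_pool_gb: float, daily_growth_gb: float) -> float:
    """Estimate how many days remain before the aggregate's free
    block pool is exhausted, assuming a constant daily growth rate."""
    if daily_growth_gb <= 0:
        return float("inf")  # no growth: no immediate pressure to act
    return free_pool_gb / daily_growth_gb

# Example: 500 GB free, data growing by 20 GB/day
print(days_to_react(500, 20))  # -> 25.0 days
```

Any mitigation alternative whose preparation and effect time exceeds this value is effectively unavailable, which is exactly the "running out of time" situation above.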

31 Storage Efficiency Every Day How to Achieve and Manage Best-in-Class Storage Use

The following sections focus on how to detect that a change is necessary and that a storage resource should be transitioned to another phase. We describe the different phases, how monitoring can support detecting them, and how this information can be made known to the operational groups.

4.1 PHASES AND TRANSITIONS

This section outlines the phases of a storage resource. Starting with an empty aggregate, storage is provisioned up to certain thresholds. After that, storage is left for organic growth. After certain thresholds are exceeded, further inspection or activities must be performed to mitigate storage tightness.

• Provisioning storage. While certain thresholds are within a defined range, storage is provisioned to the aggregates. Monitoring should support making the decision to transition to the next phase.

• Leave storage for organic growth. When certain thresholds are exceeded, provisioned storage is left for organic growth. Depending on the environment, storage of existing applications might still be extended, and a second threshold might signal that extensions are no longer possible. Monitoring should support making the decision to transition to the next or the prior phase.

• Mitigate storage use. When certain thresholds are exceeded, this phase must make sure that committed storage can be delivered to store the applications' data. The effect of a mitigation activity should be to put the storage resource back into the preferred operational corridor. Monitoring should support making the decision to transition back to the organic growth phase.

4.2 MONITORING

NetApp Operations Manager delivers comprehensive monitoring and management for NetApp shared storage. It provides alerts, reports, performance monitoring, and configuration tools to keep the NetApp storage infrastructure in line with business requirements for maximum availability and efficiency. NetApp Operations Manager provides a single human interface and an application programming interface (API) for integration with third-party management and orchestration software.

Operations Manager monitors the NetApp shared storage infrastructure and is able to raise awareness of certain situations. Events can be set to trigger an action when operational parameters are within a certain range and indicate a relevant situation. When the event triggers, an alarm notification can be sent by e-mail, pager, Simple Network Management Protocol (SNMP), or customized scripts. To raise awareness of a certain situation, the event must be characterized using the metrics provided by Operations Manager; to communicate the event, an alarm must be set.

THRESHOLDS

Operations Manager monitors relevant parameters that indicate the presence of specified situations. Thresholds can be set to trigger actions, for example, to notify the operational team that an alarm situation exists. The thresholds can be set to notify in advance. Operations Manager also performs trending on operational parameters to express the urgency of a certain situation. This supports the decision making on how to react.

Within your Operations Manager instance, the thresholds can be verified and set by navigating to the Default Thresholds page, following Setup → Options → Default Thresholds, or by using the link http://opsmgrserver:port/dfm/edit/options. Figure 17 shows a sample configuration page.


Figure 17) Operations Manager screen to configure thresholds on operational metrics

For aggregates, Operations Manager provides a set of thresholds, described in the following list. They represent absolute limits. Operations Manager alarms can be used to notify operational staff and raise awareness of a specific situation.

Monitoring the aggregates is very important. They are the physical containers of preallocated and growable storage objects that host the application data. If an aggregate of a storage controller runs at uncontrolled usage, there can be direct consequences for the applications for which it provides data.

The concrete settings for these thresholds depend on the time necessary to relax aggregate block usage. When no mitigation alternatives should or can be taken during the lifetime of the data, mitigation actions must be performed in scheduled downtime windows. In that case, threshold settings and actions tend to be more conservative to avoid SLA-endangering situations.

• Aggregate full threshold. This threshold on the metric aggregate block use allows triggering an alarm that notifies a person in charge.

• Aggregate nearly full threshold. This threshold is the counterpart of the aggregate full threshold but provides an earlier notification.

• Aggregate overcommitted threshold. This threshold on the metric of committed storage allows triggering an alarm that notifies a person in charge. This metric refers to the amount of storage that is assigned to applications. It represents the level of consolidation and also the width and increase of the block use corridor.

• Aggregate nearly overcommitted threshold. This threshold is the counterpart of the aggregate overcommitted threshold but provides an earlier notification.

Operations Manager also provides thresholds that can be used to alert operational staff when volumes are in a certain state:

• Volume full threshold. This event notifies a person in charge that the preset threshold on volume use has been reached.

• Volume almost full threshold. This event is the counterpart of the volume full threshold but provides an earlier notification.

• Volume autosized. This event notifies a person in charge when a volume was extended using the autogrow functionality.


TRENDING

Operations Manager 4.0 supports a variety of trending features for certain storage objects. This is an important feature for all storage objects with a fixed size. It allows you to estimate the time frame within which a certain situation needs to be mitigated. The trend is calculated as a linear regression of up to 90 days in the past. For aggregates, Operations Manager calculates a trend on the daily growth rate. In your Operations Manager instance, use the link http://opsmgrserver:port/dfm/report/view/aggregates-growth-rates for trending of aggregate growth rates and the estimated remaining time until the storage object is full. Each aggregate can be drilled down, and you can select trending based on an interval of one day, one week, one month, three months, or one year. To see the effect of recent data activities, set the interval of a trend calculation to enclose this activity. Investigate whether growth rates calculated over different intervals deviate significantly.

Figure 18) Trending of data growth and days-to-full prediction in Operations Manager

Note: The calculation basis of time to full is the usable aggregate capacity. This value is not calculated based on the aggregate full threshold setting.
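Operations Manager performs this regression internally. As an illustration of the underlying idea only, a least-squares fit over daily capacity samples yields a growth rate and a days-to-full estimate against the usable capacity:

```python
def days_to_full(samples, usable_capacity):
    """Fit a line to (day, used) samples by least squares and
    extrapolate to the day when usage reaches usable_capacity.
    Returns None if usage is flat or shrinking."""
    n = len(samples)
    xs = [s[0] for s in samples]
    ys = [s[1] for s in samples]
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    slope = (sum((x - mean_x) * (y - mean_y) for x, y in samples)
             / sum((x - mean_x) ** 2 for x in xs))
    intercept = mean_y - slope * mean_x
    if slope <= 0:
        return None  # no positive growth: never full by this model
    # Days (from day 0 of the sample window) until the fit hits capacity
    return (usable_capacity - intercept) / slope

# 10 TB usable; usage grew linearly from 4 TB by 10 GB/day over 90 days
samples = [(d, 4000 + 10 * d) for d in range(91)]
print(days_to_full(samples, 10000))  # -> 600.0
```

As the note above states, a real days-to-full value is reported against usable capacity, not against the aggregate full threshold.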

Trending on the volume level is analogous to trending on the aggregate level. In your Operations Manager instance, access the link http://opsmgrserver:port/dfm/report/view/volumes-growth-rates for trending of volume growth rates. NetApp recommends ordering the view by growth rate descending or time to full ascending in order to focus on the relevant candidates.

On the volume level, you can set an alarm to fire when the volume growth is outside the usual boundary:

• Abnormal volume growth. This event notifies when the growth rate of a volume exceeds a preset limit. It is helpful to signal unusual behavior concerning storage consumption and to point the operational staff to the right storage object.


INDIVIDUAL THRESHOLDS PER AGGREGATE OR VOLUME

For each aggregate or volume, the general default settings can be overridden with more specific values. To do so, select the aggregate or volume of choice, for example, using the links already provided in this technical report. A selected aggregate can be configured using the Edit Settings link and dialog; a selected volume's configuration can be adapted using the Edit Quota Settings link and dialog.

MONITORING STORAGE EFFICIENCY RETURNS

NetApp Operations Manager provides a dashboard to visualize storage efficiency returns in the NetApp shared storage infrastructure. This report lists important parameters, drilled down by utilization capacity, unused reserve capacity, storage efficiency, and efficiency return breakdown. It allows you to judge the effectiveness of the NetApp storage efficiency technologies. Figure 19 provides a sample screenshot of the storage efficiency dashboard in NetApp Operations Manager. Consult the NetApp Operations Manager Efficiency Dashboard Installation and User Guide for further information.

Figure 19) Storage efficiency dashboard in Operations Manager


4.3 NOTIFICATION

Operational staff must be notified when situations occur that require a transition of phases, especially situations with negative consequences for SLA fulfillment. Operations Manager provides alarms for notification. Alarms are bound to the metrics and thresholds explained in section 4.2 and notify operational staff, storage administrators, or storage capacity planners. Alarms are the instrument used to keep the management effort of the NetApp storage infrastructure low.

After being notified, the responsible person can evaluate the situation and decide which actions to take. Further, the trends on operational parameters provided by Operations Manager simplify the decision-making process.

Depending on the organizational structure, the responsibilities to operate, plan, and administer the storage infrastructure can be separated into different groups, persons, or roles. We therefore characterize the mitigation activities by required skill set and time to act. This allows an easy alignment to a given organizational structure.

Operations Manager supports different methods to send a notification. The notification methods can be used in combination; for example, a notification can be sent by both e-mail and SNMP.

NOTIFY BY E-MAIL

An alarm can be sent to multiple destinations by e-mail, and repeated notifications can be sent while the situation is not resolved. To set an alarm, access the alarm configuration page by following Setup → Alarms from the default Operations Manager dashboard. Clicking Advanced Version accesses an advanced version of this page; the direct link is http://opsmgrserver:port/dfm/edit/alarms-advanced. Figure 20 shows how to configure an alarm. Adjust the threshold as described in section 4.2.

NetApp recommends using distribution lists or aliases with meaningful names rather than the addresses of individual persons. If you follow this recommendation, changing responsibilities and roles does not require corresponding changes in Operations Manager.

NOTIFY BY SNMP

Operations Manager supports the notification of alarms using SNMP, a widely used standard that is supported by most orchestration frameworks and ticketing systems. Using SNMP, Operations Manager can be integrated into existing ticketing systems. Figure 20 shows the setup of an alarm firing based on the aggregate almost full threshold. The SNMP trap host is configured using a hostname or IP address and the port on which the SNMP agent is listening. The alarm can then be saved and tested.


Figure 20) Configuring an alarm based on the threshold aggregate almost full

Note: The SNMP event must be routed to the responsible groups or persons in the ticketing system. The mapping between the detected situation and the responsible operational group must therefore be implemented there.

NOTIFY BY SCRIPT

Operations Manager also supports notifications in highly customized integration scenarios. A user-defined adapter can be executed that delivers the information to the infrastructure or system of choice. A script can be used to implement such an adapter and act as the glue between Operations Manager and the customer infrastructure. To set an alarm on the event aggregate almost full that starts a script, instrument Operations Manager on the command line:

dfm alarm create -s script_to_execute -h aggregate-almost-full
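A minimal adapter script might look like the following sketch. The argument handling and the ticket format are assumptions for illustration; consult the Operations Manager documentation for the exact event information passed to alarm scripts.

```python
#!/usr/bin/env python
# Hypothetical alarm adapter. Operations Manager invokes the script
# configured via "dfm alarm create -s ..."; here we simply assume the
# event name and source arrive as command-line arguments and forward
# them to a ticketing system as a one-line record on stdout.
import sys
import time

def format_ticket(event_name, source):
    """Turn an event into a one-line ticket record (illustrative format)."""
    stamp = time.strftime("%Y-%m-%d %H:%M:%S")
    return "%s TICKET severity=warning event=%s source=%s" % (
        stamp, event_name, source)

if __name__ == "__main__":
    event = sys.argv[1] if len(sys.argv) > 1 else "aggregate-almost-full"
    source = sys.argv[2] if len(sys.argv) > 2 else "unknown-aggregate"
    print(format_ticket(event, source))
```

In practice such a script would post to the ticketing system's API instead of printing; the note above about mapping situations to responsible groups applies here as well.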


4.4 MITIGATE STORAGE USE

Uncontrolled storage use can limit operational flexibility and might put SLA fulfillment at risk. This section focuses on mitigation activities that preserve flexibility by controlling use within its defined corridor; the effect of a mitigation activity should be to return usage to that corridor.

Storage tightness might occur in aggregates or volumes, depending on their configuration. When all volumes in an aggregate are thin provisioned with the zero fat configuration, they use the shared pool of free blocks of the aggregate to deal with data growth. To resolve this situation, a mitigation activity on the aggregate level is necessary.

When storage objects in a fixed-size volume cannot grow to the committed space, a mitigation activity on the volume level is necessary to resolve the upcoming volume tightness.

MITIGATION ACTIVITIES FOR AGGREGATES

Aggregates are the coarsest storage objects within a NetApp storage controller. Finer-grained storage objects such as FlexVol volumes and their content are usually thin provisioned using the zero fat configuration. They can grow on demand; however, because they live within an aggregate of physically limited size, the growth of the storage objects themselves is also limited. As described in the following list, providing usable space in the aggregate automatically allows the contained storage objects to grow.

1. Increase the aggregate. You can add drives to aggregates during operation, and you can repeat this mitigation activity. The maximum aggregate size depends on the Data ONTAP version, the type of aggregate, and the type of storage controller; 64-bit aggregates, supported with Data ONTAP 8, have very high limits. Additional drives can be used immediately; however, their procurement time needs to be taken into account. Rebalancing data between existing and new drives results in uniformly distributed use of the drives.

2. Decrease the aggregate Snapshot copy reserve. This reserve is needed in MetroCluster and SyncMirror® configurations. In other configurations, you can decrease this reserve or set it to zero.

3. Shrink preallocated volumes. Volumes with preallocated space reserve available aggregate free space. When possible, these volumes can be shrunk, returning the freed space to the aggregate so that others can make use of it.

4. Enable deduplication and shrink the volume.

5. If available, migrate a nomad online to a different storage controller. Doing this on the NetApp storage controller level requires storage provisioning based on vFiler units and a MultiStore and SnapMirror license. Adequate free space on the aggregates of the target storage controller is required. This mitigation activity is not limited in its applicability.

6. Migrate a volume from one aggregate to another aggregate within the same or another storage controller. SnapMirror replicates the data while it is still served. To switch over to the replicated data, the client needs to detach from the source and reattach to the replica; after completion, the replica is considered the new source. This operation has an impact on client downtime. Typically, inter-data center bandwidth allows you to synchronize the source and the replica within the range of a few minutes.

7. If none of the listed activities can be used, the application must be stopped to achieve a consistent state.

The mitigation activities for aggregate tightness are summarized in Table 8. Note that Provisioning Manager performs mitigation alternatives 3 to 6 for secondary storage online.


Table 8) Mitigation alternatives to control use within aggregates.

No. | Mitigation Activity | Repeatability | SLA Impact | Preparation Time | Time to Show Effect
1 | Increase aggregate capacity by adding disks | Data ONTAP 7.x: low limits; Data ONTAP 8: high limits | None | HW procurement | Immediate (plus rebalancing)
2 | Decrease the aggregate's Snapshot copy reserve, if possible | One time | None | None | Immediate
3 | Shrink other volumes in the aggregate if they have enough free space | One time | Low | None | Immediate
4 | Run deduplication and shrink volumes | Repeatable | Low | Time to execute dedupe | Immediate
5 | Migrate nomads (online) | Repeatable | Low | None | Minutes (vFiler migration time)
6 | Migrate volumes to a different aggregate (offline) | Repeatable | Med–high | Next planned downtime window | Minutes (volume switch-over time)
7 | Prevent application data loss and stop the application, then migrate (offline) | Repeatable | Low–high | Coordinate with app owner | Minutes (migration time)

MITIGATION ACTIVITIES FOR VOLUME TIGHTNESS

Mitigation activities for volume tightness are relevant for volumes that are at risk because their storage objects cannot grow to the committed size. When it is not possible to enable growth for the storage objects contained in a volume, you need to perform an aggregate mitigation activity instead.

Note: Some of these mitigation alternatives depend on, and affect, the used capacity in the aggregate.

Table 9) Mitigation activities for resource tightness within volumes.

No. | Mitigation Activity | Repeatability | SLA Impact | Prep Time | Time to Show Effect
1 | Reduce the volume's Snapshot copy reserve (if configured and not used) | One time | Low | None | Immediate
2 | Increase the volume if there is free space in the aggregate (see Table 8) | One time | Low | None | Immediate
3 | Delete Snapshot copies not needed or those skipped by the autodelete function | Limited | Low | None | Immediate
4 | Activate FAS deduplication for the volume (requires proper space guarantees) | One time | Low; possible performance impact | Wait for schedule | Hours
5 | If the volume contains more than a single LUN, migrate those objects to another volume or aggregate | Repeatable | High | Next planned downtime window | Minutes (volume migration time)
6 | Stop application and migrate data | Repeatable | High | Coordinate with app owner | Minutes (migration time)


5 REAL-LIFE SETTINGS

This section summarizes two different operational settings. The first does not make use of online data migration and the settled/nomad provisioning pattern; the second implements a settled/nomad provisioning pattern to maintain the flexibility for online data migrations.

The concrete threshold settings and approaches might be very customer and application specific. To exploit NetApp storage efficiency features in your own data center, NetApp recommends that you start conservatively. After you are familiar with the process, work toward the customer-specific optimum.

5.1 SAMPLE SETTING 1: REAL-LIFE SETTING

This section describes a real-life setting that a customer started with. It makes use of a limited set of mitigation alternatives. This is especially beneficial when the installed storage capacity should remain constant over a long time frame or when the physical systems are already fully equipped. A settled/nomad setting is not considered. Thus, the thresholds that signal a transition of the phases are set lower and more conservatively for this customer. Because online data migration and aggregate extension are not available as mitigation alternatives, sufficient available space is required to safely reach the next planned downtime window, as shown in Figure 21. In practice, refer to the aggregate days-to-full trend value to get an idea of the available days to full based on past data growth.

• All storage is provisioned using the zero fat option with growable FlexVol volumes. Only aggregate monitoring is used.

• Aggregate extension is not a mitigation alternative.

• Online migration is not a mitigation alternative.

Figure 21) Storage to enable organic data growth between planned downtime windows


Note: Several months might fall between the planned downtime windows in which major mitigation alternatives can be performed.

The primary concern is preventing the critical situation in which aggregates reach a utilization level that is too high to enable organic growth during the period between agreed planned downtime windows. To prevent this situation, sufficient space must be reserved to enable data growth. Second, the level of data consolidation is monitored to manage accumulated growth rates safely.

Provisioning new data is stopped when one or both thresholds on the first and second metrics are reached. The operational teams are notified using an alarm on the Operations Manager events aggregate nearly full threshold (configured to fire when the metric exceeds 50%) and aggregate nearly overcommitted threshold (configured to fire when the metric exceeds 110%). These alarms stop the responsible entities from provisioning new storage; the aggregate is left for organic growth.

An assessment of the storage situation might then be performed. Depending on experience and knowledge of the application growth rates seen in the past, the thresholds may be adapted. After the upper threshold of the operational sweet spot corridor is crossed, an alarm based on the aggregate full threshold (set initially to 65%) is sent to the storage administrators to make the decision for migrating data in the next planned downtime window. In the meantime, organic growth can take place in the yellow-marked area shown in Figure 22. The metrics used are:

• First metric: aggregate capacity used

• Second metric: aggregate space committed

Because all storage is provisioned using the zero fat option, no artificially limited storage containers exist. Thus, there is no need to consider a volume-based metric. Figure 22 shows the behavior depending on the metrics aggregate capacity used and aggregate committed space.

Figure 22) Transition of changes depending on the metrics aggregate capacity used and aggregate committed space

(The figure shows a decision matrix: new storage is provisioned while aggregate capacity used is 0–50% and aggregate space committed is 0–110%; above these values, capacity is assessed and thresholds are adapted within the operational sweet spot corridor; mitigation starts when capacity used exceeds 65% or space committed exceeds 120%.)
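The decision logic described for this setting can be sketched with both metrics. The thresholds are taken from the text above; the function name and return strings are illustrative:

```python
def sample_setting_1_action(capacity_used_pct, space_committed_pct):
    """Two-metric decision from sample setting 1: provision while both
    metrics are low, assess and adapt thresholds in between, and
    mitigate when either upper threshold is crossed."""
    if capacity_used_pct > 65 or space_committed_pct > 120:
        return "mitigate in next planned downtime window"
    if capacity_used_pct > 50 or space_committed_pct > 110:
        return "stop provisioning; assess capacity and adapt thresholds"
    return "provision new storage"

print(sample_setting_1_action(55, 100))
```

Because both metrics are checked with "or", whichever threshold is crossed first drives the transition, matching the alarm configuration described above.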


5.2 SAMPLE SETTING 2: SETTLED/NOMAD

This section describes a setting that takes the settled/nomad provisioning pattern into account and allows migrating nomad data flexibly and in a timely manner thanks to vFiler technology. This setting requires storage space at alternative locations to which nomads can be migrated. It is seen more often in larger environments with an emphasis on NFS-attached storage. It allows operating the NetApp storage infrastructure at very high use and within narrower operational sweet spot corridors. Figure 23 visualizes the effect of a mitigation alternative that can be performed online.

Figure 23) Narrower corridors due to the ability to perform mitigation alternatives in hours instead of months

(The figure shows settled data with nomads stacked on top: once the need to act is detected, the effect of a mitigation such as a migration becomes visible within hours rather than months.)

In this sample setting, as in sample setting 1, the critical situation to prevent is aggregates becoming too full. However, thanks to the flexibility gained with online data migration, no further metric, such as storage overcommitment, needs to be taken into account.

• All storage is provisioned using the zero fat option with growable FlexVol volumes. Only aggregate monitoring is used.

• Storage is provisioned using the settled/nomad pattern, with the ability to perform online migration.

• The days-to-full aggregate trend was more than 200 days on average. Note that this value depends on the individual situation and is calculated against 100%.

The sole metric in this setting is aggregate capacity used. Table 10 contains the thresholds describing the transition of phases.

Table 10) Phase transitions with the settled/nomad provisioning pattern and online migration as a mitigation alternative.

Detection Threshold | Notify | Mitigation
> 70% | Storage operations | Stop provisioning of new storage
> 85% | Storage operations | Stop extending provisioned storage
> 90% | Storage operations | Relax the resource situation and migrate a nomad
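The transitions in Table 10 can be sketched as a simple mapping from aggregate capacity used to the required action (the thresholds come from the table; the function name and return strings are illustrative):

```python
def phase_action(capacity_used_pct: float) -> str:
    """Map aggregate capacity used (%) to the operational action
    from Table 10 (settled/nomad setting with online migration)."""
    if capacity_used_pct > 90:
        return "relax resource situation and migrate a nomad"
    if capacity_used_pct > 85:
        return "stop extending provisioned storage"
    if capacity_used_pct > 70:
        return "stop provisioning of new storage"
    return "provision new storage"

print(phase_action(72))  # -> stop provisioning of new storage
```

In practice each threshold crossing would be raised as an Operations Manager alarm to the storage operations team rather than evaluated in a script.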


Figure 24) Visualization of phase transitions depending on metric aggregate capacity used

(The figure shows new storage being provisioned while aggregate capacity used is 0–70%, already provisioned storage still being extended between 70% and 85%, and utilization being relaxed by moving a nomad with NetApp Data Motion above 90%.)

You can achieve a very high data consolidation in this setting with NetApp storage controllers: the amount of logical data served exceeds the physically usable capacity by factors.


6 STORAGE EFFICIENCY COOKBOOK

To increase consolidation, we propose the following steps to exploit the advantages of the NetApp storage efficiency technologies.

Figure 25) Typical picture of aggregate capacity metrics while turning to zero fat configurations and dedupe

(The figure plots committed capacity and capacity used over elapsed time, together with the overall trend and the trend over the last three months; the markers 1, 2, and 3 correspond to the steps described below.)

As a general rule, we don't introduce artificially limited container types. They increase the monitoring effort and might prevent pooling unused space. For an existing landscape, proceed as follows:

1. Install and configure Operations Manager; the earlier, the better. From day one, Operations Manager collects data, and the more information it collects, the better the predictions and trending. The diagrams provided by Operations Manager give a good idea of growth rates and their steadiness. Make sure all NetApp storage controllers are monitored, and wait for one month. Define which mitigation alternatives your operational team is comfortable with: check the boxes accompanying the provided lists and identify the time your team needs to perform the actions. If you can perform online migrations for nomads, define the time to negotiate and approve a migration; for all other data, define the time to the next planned downtime window.

2. Change all volumes to the zero fat configuration with the autogrow feature set to on. Because there is no artificial space limitation for an autogrow volume, monitoring is restricted to aggregate monitoring. When using deduplication, set the volume to autogrow. During this period, the capacity used diminishes, as shown in Figure 25; usually, each change in the volume configuration can be detected. So far only metadata has changed, and unused space in the volumes is now available from a common shared pool. The aggregated free space is available to the same applications storing the data. We recommend monitoring for three months to understand the growth rate of your environment.

3. Derive the growth trend of the aggregates. Note that the overall trend might still be negative. Use Operations Manager to help determine the trend. Make sure that it excludes the time frame when the volume configuration was changed to zero fat and that it includes relevant operations of your applications, such as month- and year-end closing of business applications or regular software maintenance updates (for example, in virtualized environments).


Work backward to determine the thresholds of the phases:
a. Define the aggregate use at a level with which your operational team is comfortable. At first, do not exceed 80%. Add an attention area (yellow) depending on the mitigation alternatives and their time to show effect.
b. Determine the maximum distance between the planned downtimes, or the time needed to perform the intended mitigation alternatives.
c. Determine the growth rate. Operations Manager provides help in determining the trend of data growth.
d. Determine the minimum space required to comfortably allow organic growth in the period between the agreed planned downtimes of the services provided. Operations Manager helps you to understand the growth rate of the past.
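Combining the growth rate and the distance between downtime windows gives a back-of-the-envelope estimate of the minimum free-space buffer. The safety factor here is an illustrative assumption, not a NetApp recommendation:

```python
def required_buffer_gb(daily_growth_gb, days_between_downtimes,
                       safety_factor=1.2):
    """Free space needed to bridge organic growth between planned
    downtime windows; safety_factor adds an illustrative margin."""
    return daily_growth_gb * days_between_downtimes * safety_factor

# 15 GB/day growth, downtime windows 90 days apart -> about 1620 GB
print(required_buffer_gb(15, 90))
```

The resulting buffer, expressed as a percentage of the aggregate, is what separates the attention area from the hard aggregate full threshold chosen in step a.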

To provision storage, follow these steps:

1. Create big aggregates to enable shared storage in your data center. We recommend sizing them in such a way that each aggregate can be extended once for an eventual aggregate mitigation. Very few situations exist in which a silo-centric approach with dedicated aggregates per application makes sense; free space and performance in an aggregate can be shared, and a few big aggregates reduce the monitoring effort. Also, build aggregates in a limited number of standardized configurations and sizes.

2. Create volumes in the zero fat configuration with the autogrow feature set to on. Because there is no artificial space limitation for an autogrow volume, monitoring is restricted to aggregate monitoring. When using deduplication, set the volume to autogrow. Whenever possible, use Provisioning Manager for convenience and for repeating configurations.
a. Classify your data and provision for flexibility. Give NFS a preference and make use of vFiler entities.
b. Turn on deduplication. Even in situations where deduplication rates are expected to be low, there is sometimes a big surprise. If you prefer to try deduplication first, create a clone of the intended volume on the storage controller and deduplicate it to estimate the effect. Use Performance Advisor to identify a repeating time frame of low activity to schedule the deduplication job; alternatively, use deduplication scheduled by change rate. Mind the maximum volume sizes, which depend on the storage controller.
c. Initially size volumes to the expected size of the data you are going to store. Thus the aggregate overcommitment metric in Operations Manager represents the data consolidation more precisely.

d. Trim existing volumes provisioned in full or low fat down to the zero fat configuration. Use the following commands on the console of the storage controller to configure zero fat without Snapshot autodelete for NAS environments:

vol options <volume> guarantee none
vol options <volume> try_first volume_grow
vol autosize <volume> -m <maximum size> -i <increment size> on
snap autodelete <volume> off

Use the following command sequence to configure zero fat with Snapshot autodelete for NAS environments:

vol options <volume> guarantee none
vol options <volume> try_first volume_grow
vol autosize <volume> -m <maximum size> -i <increment size> on
snap autodelete <volume> trigger volume
snap autodelete <volume> delete_order oldest_first
snap autodelete <volume> on


Use the following commands to configure zero fat without Snapshot autodelete for SAN environments:

vol options <volume> guarantee none
vol options <volume> try_first volume_grow
vol autosize <volume> -m <maximum size> -i <increment size> on
snap reserve -V <volume> 0
snap autodelete <volume> off
lun set reservation <lun> disable

Use the following command sequence to configure zero fat for SAN environments with autodelete set to on:

vol options <volume> guarantee none
vol options <volume> try_first volume_grow
vol autosize <volume> -m <maximum size> -i <increment size> on
snap reserve -V <volume> 0
snap autodelete <volume> trigger volume
snap autodelete <volume> delete_order oldest_first
snap autodelete <volume> on
lun set reservation <lun> disable
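When many volumes have to be converted, the command sequences above can be generated programmatically. The following Python sketch is illustrative only: `build_zero_fat_commands` is a hypothetical helper, not a NetApp tool; the emitted strings mirror the Data ONTAP 7-Mode console syntax shown above.

```python
# Sketch: emit the zero fat conversion commands for one volume.
# The helper and its parameter names are assumptions for illustration.

def build_zero_fat_commands(volume, max_size, increment,
                            san=False, lun=None, autodelete=False):
    """Return the console commands that convert a volume to zero fat."""
    cmds = [
        f"vol options {volume} guarantee none",
        f"vol options {volume} try_first volume_grow",
        f"vol autosize {volume} -m {max_size} -i {increment} on",
    ]
    if san:
        # SAN volumes additionally drop the Snapshot reserve.
        cmds.append(f"snap reserve -V {volume} 0")
    if autodelete:
        cmds += [
            f"snap autodelete {volume} trigger volume",
            f"snap autodelete {volume} delete_order oldest_first",
            f"snap autodelete {volume} on",
        ]
    else:
        cmds.append(f"snap autodelete {volume} off")
    if san and lun:
        # No preallocation of blocks for the LUN.
        cmds.append(f"lun set reservation {lun} disable")
    return cmds

# Example: SAN volume with Snapshot autodelete enabled.
for cmd in build_zero_fat_commands("vol_db1", "500g", "10g",
                                   san=True, lun="/vol/vol_db1/lun0",
                                   autodelete=True):
    print(cmd)
```

The generated list can be fed to whatever remote-execution mechanism your environment already uses for the storage controller console.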

e. Identify storage holding inactive data. Such storage is most often perfectly suited to act as a nomad candidate that could be migrated.

f. Identify storage that is close to deprovisioning. Deprovisioning storage relaxes use and can act as a mitigation alternative.

g. Convert already provisioned volumes to zero fat configuration.

3. Let Operations Manager monitor the landscape. Use the aggregate daily growth rates and days-to-full trending reported by Operations Manager to adapt the thresholds. Remember that days-to-full trending reports against 100% capacity used of the aggregate.
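The days-to-full trend that step 3 relies on is essentially a linear extrapolation from the daily growth rate. A minimal sketch, with illustrative numbers and a function name that is an assumption rather than an Operations Manager internal:

```python
# Sketch of days-to-full trending: extrapolate linearly from the current
# used capacity and the observed daily growth rate to 100% capacity used.

def days_to_full(capacity_gb, used_gb, daily_growth_gb):
    """Days until the aggregate reaches 100% capacity used (None if not growing)."""
    if daily_growth_gb <= 0:
        return None  # no growth: the trend never reaches full
    return (capacity_gb - used_gb) / daily_growth_gb

# 10 TB aggregate, 7.5 TB used, growing 50 GB per day:
print(days_to_full(10000, 7500, 50))  # -> 50.0
```

Such an estimate is only as good as the growth rate it is fed; Operations Manager adapts it continuously, which is why its reported trend should drive the thresholds.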


7 REFERENCES

• TR-3505, "NetApp Deduplication for FAS and V-Series Deployment and Implementation Guide": www.netapp.com/us/library/technical-reports/tr-3505.html

• TR-3563, "NetApp Thin Provisioning: Improving Storage Utilization and Reducing TCO": www.netapp.com/us/library/technical-reports/tr-3563.html

• TR-3710, "Operations Manager, Provisioning Manager, and Protection Manager Best Practices Guide": www.netapp.com/us/library/technical-reports/tr-3710.html

• TR-3786, "A Thorough Introduction to 64-Bit Aggregates": www.netapp.com/us/library/technical-reports/tr-3786.html

• TR-3814, "NetApp Data Motion": www.netapp.com/us/library/technical-reports/tr-3814.html

• TR-3827, "If You're Doing This, Then Your Storage Could Be Underutilized": www.netapp.com/us/library/technical-reports/tr-3827.html

• TR-3881, "DataMotion for Volumes for Enterprise Applications": www.netapp.com/us/library/technical-reports/tr-3881.html

• NetApp Operations Manager Efficiency Dashboard Installation and User Guide: now.netapp.com/NOW/download/tools/omsed_plugin/InstallUserGuide.pdf


8 ACKNOWLEDGMENTS

This report was developed in concert with the Field Centers for Innovation and covers field best practices and product group expertise. It would not have been possible without the input of many experts. Significant contributions were made by Matthew Agoni, Carlos Alvarez, Jeff Berks, Manfred Buchmann, Hans Deuerlein, Erik Dybwad, Niels Reker, Oliver Dziuba, Larry Freeman, Gary Garcia, Pretoom Goswami, Naveen Harsani, George John, Nigel Maddock, Andreas Martinovsky, Holger Niermann, Cesar Orosco, Christian Ott, Shiva Raja, Michael Reusch, Maurice Skubski, John Tyrrell, Oliver Walsdorf, and Allen Wang.

NetApp provides no representations or warranties regarding the accuracy, reliability, or serviceability of any information or recommendations provided in this publication, or with respect to any results that may be obtained by the use of the information or observance of any recommendations provided herein. The information in this document is distributed AS IS, and the use of this information or the implementation of any recommendations or techniques herein is a customer's responsibility and depends on the customer's ability to evaluate and integrate them into the customer's operational environment. This document and the information contained herein may be used solely in connection with the NetApp products discussed in this document.

© Copyright 2010 NetApp, Inc. All rights reserved. No portions of this document may be reproduced without prior written consent of NetApp, Inc. Specifications are subject to change without notice. NetApp, the NetApp logo, Go further, faster, Data ONTAP, FlexClone, FlexVol, MultiStore, RAID-DP, SnapDrive, SnapMirror, Snapshot, SyncMirror, and vFiler are trademarks or registered trademarks of NetApp, Inc. in the United States and/or other countries. Windows is a registered trademark of Microsoft Corporation. Oracle is a registered trademark of Oracle Corporation. VMware is a registered trademark and VMotion is a trademark of VMware, Inc. All other brands or products are trademarks or registered trademarks of their respective holders and should be treated as such. RA-0007-1010



Option | Recommended Value | Notes

autosize options | -m X -i Y | The business model drives the maximum value for the autosize configuration because it offers additional disk space for the consumer under its specific conditions. A reasonable resizing increment depends on various factors, such as the data growth rate in the particular volume, the volume size itself, and so on.

try_first | – | Autodelete is not recommended in most environments.

Volume Snapshot Options

reserve | yes/no | The value depends on the number of Snapshot copies and the change rate within the volume. Displaying only the committed usable space using SLAs is the preferred way to provision NAS storage. However, there might be situations in which the Snapshot reserve area is omitted (no).

schedule | switched on | Automatic Snapshot technology schedules.

autodelete | off | Deleting Snapshot copies is not recommended in most NAS environments.

SAN

For SAN, we consider three options:

• Full fat: Both primary data and its Snapshot copy space are preallocated.

• Low fat: The primary data is preallocated; the Snapshot copy space is allocated on demand.

• Zero fat: Primary data and its Snapshot copy space are allocated on demand. This variant achieves the best ratio of storage efficiency when provisioning applications from scratch.

Figure 8) Provisioning model for SAN storage from scratch. The figure is a 2x2 matrix of primary data (LUN) space allocation versus Snapshot copy space allocation: fat primary data with fat Snapshot copy space is the full fat option; fat primary data with thin Snapshot copy space is the low fat option; thin primary data with thin Snapshot copy space is the zero fat option. Thin primary data with fat Snapshot copy space is not an option.

FULL FAT PROVISIONING

This method can be treated as the historical way of provisioning block storage with Data ONTAP:

• Volumes are created with space guarantee.

• A fractional (overwrite) reserve is used to guarantee that the primary data can be overwritten completely with Snapshot copies in place. If this space is not available, Snapshot copy creation will fail.

• The size of the volume follows the formula 2X + Δ, where X is the size of the primary data (the sum of all LUN capacities within the volume) and Δ is the amount of space needed to hold Snapshot copy data.


The enhancements to the volume autosize capabilities (such as volume-size-dependent thresholds) and the robustness of the Snapshot autodelete implementation have made full fat provisioning more or less obsolete. As of today, the default settings for creating volumes/LUNs in Data ONTAP still apply these settings. See Provisioning from Scratch: Full Fat to Zero Fat Provisioning for a discussion of using tools such as Provisioning Manager.

Table 4) Full fat provisioning.

Option | Recommended Value | Notes

Volume Options

guarantee | volume |

fractional_reserve | 100 | Even though technically possible, a fractional reserve below 100% carries the risk of running out of Snapshot copy overwrite space. This situation should be avoided.

autosize | off | Autosize could be used as an option to create the free space needed for Snapshot copy creation.

Volume Snapshot Options

reserve | 0 |

schedule | switched off |

autodelete | off |

LUN Options

reservation | enable |

LOW FAT PROVISIONING

With low fat provisioning, we use a more space-efficient way to provision volumes:

• Volumes are created with space guarantee.

• LUNs are created with space guarantee as well. This setup does not benefit from unused blocks within a LUN. (During the lifetime of a LUN, the amount of free, unused blocks typically decreases. Without space reclamation techniques, allocated blocks on the storage system stay allocated.)

• The size of the volume follows the formula X + Δ, where X is the size of the primary data (the sum of all LUN capacities within the volume) and Δ is the amount of space needed to hold Snapshot copy data.

• Because the space used for Snapshot copies might grow unexpectedly, the autosize and autodelete policies are used to make space available when a preset volume threshold is reached.

Table 5) Low fat provisioning.

Option | Recommended Value | Notes

Volume Options

guarantee | volume |

fractional_reserve | 0 | Snapshot space is controlled by the autodelete and autosize options.

autosize | on | Turn autosize on.


Option | Recommended Value | Notes

autosize options | -m X -i Y | The business model drives the maximum value for the autosize configuration because it offers additional disk space for the consumer under its specific conditions. A reasonable resizing increment depends on various factors, such as the data growth rate in the particular volume, the volume size itself, and so on.

try_first | volume_grow | Increasing the size of the volume does not destroy any data or information, so there is no reason not to increase it; the increase can be reverted afterward if the volume free space grows again. There might be configurations where automatic volume growth is not desired.

Volume Snapshot Options

reserve | 0 | For NAS volumes, setting a Snapshot copy reserve area and configuring Snapshot copy schedules is a common setup. For SAN volumes, this needs to be switched off according to NetApp best practices (see the Fibre Channel and iSCSI Configuration Guide).

schedule | switched off |

autodelete | on | There might be Snapshot copies that are needed to fulfill certain SLAs, such as backup SLAs. Setting this policy needs to be negotiated with the business requirements. In the worst-case scenario, deleting Snapshot copies is not an option.

autodelete options | volume, oldest_first | There is a precedence for which Snapshot copies are candidates for deletion; oldest_first is the current default.

LUN Options

reservation | enable | Reserves space for the LUN during creation.

ZERO FAT PROVISIONING

Full and low fat provisioning use fully allocated volumes and LUNs; per default, space allocation happens within the boundaries of the LUN and the volume. Zero fat follows a 100% "allocate on demand" concept:

• Volumes are created without space guarantee.

• LUNs are created without space guarantee.

• The size of the volume follows the formula X – N + Δ, where X is the size of the primary data (the sum of all LUN capacities within the volume), Δ is the amount of space needed to hold Snapshot copy data, and N is the amount of unused blocks within a given LUN.
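A worked example makes the three sizing formulas concrete. The sketch below computes the space consumption of each model; the function name and the figures are illustrative assumptions, not from the report.

```python
# Space consumption per provisioning model:
#   full fat: 2X + delta, low fat: X + delta, zero fat: X - N + delta,
# where X = sum of all LUN capacities, delta = Snapshot copy space,
# N = unused blocks within the LUNs.

def volume_space(x_gb, delta_gb, n_gb=0):
    """Return the space consumed per provisioning model, in GB."""
    return {
        "full_fat": 2 * x_gb + delta_gb,
        "low_fat": x_gb + delta_gb,
        "zero_fat": x_gb - n_gb + delta_gb,
    }

# 1 TB of LUNs, 200 GB of Snapshot data, 400 GB of the LUNs still unused:
print(volume_space(1000, 200, 400))
# full fat consumes 2200 GB, low fat 1200 GB, zero fat only 800 GB
```

The example shows why full fat is so expensive: the overwrite reserve doubles the primary data footprint, while zero fat additionally credits back every block the application has not yet written.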

Table 6) Zero fat provisioning.

Option | Recommended Value | Notes

Volume Options

guarantee | none | No space reservation for the volume at all.

fractional_reserve | 0 | With Data ONTAP 7.3.3, fractional_reserve can be modified even for volumes without a space guarantee of type volume. Prior to Data ONTAP 7.3.3, the value was fixed at 100%.

autosize | on | Turn autosize on.


Option | Recommended Value | Notes

autosize options | -m X -i Y | The business model drives the maximum value for the autosize configuration because it offers additional disk space for the consumer under specific conditions. A reasonable resizing increment depends on various factors, such as the data growth rate in the particular volume, the volume size itself, and so on.

try_first | volume_grow |

Volume Snapshot Options

reserve | 0 | For NAS volumes, setting a Snapshot copy reserve area and configuring Snapshot copy schedules is a common setup. For SAN volumes, this needs to be switched off according to NetApp best practices (see the Fibre Channel and iSCSI Configuration Guide).

schedule | switched off |

autodelete | off | Deleting Snapshot copies might be an option when the volume can no longer be resized because the maximum configured size has been reached, or when the aggregate's free space becomes low.

LUN Options

reservation | disable | No preallocation of blocks for the LUN.

SUMMARY OF PROVISIONING METHODS

There are good reasons for using any of the provisioning methods already described; however, full fat for SAN environments should be avoided wherever possible because of the storage efficiency ratio. Even with a 100% block usage ratio on primary data, zero fat provisioning has many advantages and is the preferred method:

• The aggregate's free space is a global pool that can serve space for volumes. This gives more flexibility than volumes with their own dedicated free space.

• For SAN volumes, the block consumption can be easily monitored.

• Deduplication savings go directly into the global pool of free space, which is the aggregate or the resource pool to which it belongs.

• Monitoring is needed only on the aggregate level. Volumes will grow on demand.

Table 7) Comparison of provisioning methods.

Characteristic | Full Fat | Low Fat | Zero Fat

Space consumption | 2X + Δ | X + Δ | X – N + Δ (2)

Space efficient | No | Partially, for Snapshot copies | Yes

Monitoring | Optional | Required on volume and aggregate level | Required on aggregate level

Notification/mitigation process required | No | Optional in most cases | Yes

Pool benefiting from dedupe savings | Volume fractional reserve area | Volume free space area | Aggregate free space area

Risk of an out-of-space condition on primary data | No | No, as long as autodelete is able to delete any Snapshot copies | Yes, when monitoring and notification processes are missing

Typical use cases | Small installations; none or few storage management skills (no monitoring infrastructure) | Large database environments | Shared storage infrastructure; test/dev environments; storage pools for virtualized servers

(2) N is the traditional thin provisioning impact: the amount of blocks logically allocated but not used.
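The comparison above can be read as a simple decision rule. The sketch below encodes one such reading; it is a simplification for illustration, not official NetApp guidance, and the function name is an assumption.

```python
# Sketch: pick a provisioning model from two operational questions,
# following the typical-use-case row of the comparison table.

def pick_provisioning(monitoring_in_place, snapshot_deletion_ok):
    """Suggest a provisioning model for a SAN volume."""
    if monitoring_in_place:
        # Best storage efficiency; requires aggregate-level monitoring
        # and a notification/mitigation process.
        return "zero fat"
    if snapshot_deletion_ok:
        # Autodelete absorbs unexpected Snapshot copy growth.
        return "low fat"
    # Safe without monitoring or Snapshot deletion, at the cost of space.
    return "full fat"

print(pick_provisioning(monitoring_in_place=True, snapshot_deletion_ok=True))
# -> zero fat
```

In practice the decision also weighs SLAs and skills, but the rule captures the table's main message: with monitoring in place, zero fat is the preferred choice.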

FULL/LOW/ZERO FAT PROVISIONING WITH PROVISIONING MANAGER

NetApp Provisioning Manager focuses on fast and efficient provisioning of storage resources in the NetApp storage infrastructure. It speeds up provisioning workflows and boosts capacity usage by using policy-based automation for the entire NetApp NAS and SAN infrastructure. These processes are:

• Faster than manually provisioning storage

• Easier to maintain than scripts

• Instrumental in minimizing the risk of data loss resulting from misconfigured storage

Provisioning Manager applies user-defined policies to consistently select the appropriate resources for each provisioning activity. This frees administrators from the headache of searching for available space to provision and allows more time for strategic issues. The use of a centralized management console allows administrators to monitor the status of their provisioned storage resources. Provisioning Manager can help improve business agility and capacity use, shrink provisioning time, and improve administrator productivity. Provisioning Manager's thin provisioning and deduplication capabilities provide a high level of storage efficiency from your NetApp storage investment.

A GUI allows you to implement the full/low/zero fat provisioning models within Provisioning Manager. See Figure 9 and Figure 10 for configuring storage efficiency in a provisioning policy for NAS and SAN. Whenever storage is provisioned using such a provisioning policy, the settings apply automatically. For more information, refer to TR-3710, Operations Manager, Provisioning Manager, and Protection Manager Best Practices Guide.

Provisioning Manager encapsulates technical details when provisioning storage and supports an easy integration with existing management tools and orchestration frameworks. Policies and their use in so-called datasets and storage services allow you to exploit NetApp storage efficiency technologies without exposing a high level of technical detail to a higher level of management software.

Note: Provisioning Manager up to version 4.0 does not allow you to specify autosize and autodelete individually; the policy template determines whether these features are used and which options are selected. To implement the provisioning methods outlined here, a customized provisioning script needs to be provided that sets the autosize and autodelete parameters according to the recommendations for the full/low/zero fat methods. Postprovisioning scripts are standard with Provisioning Manager. Use caution when Provisioning Manager runs conformance checks; these revert individual settings.


Figure 9) Configuring a full/zero fat provisioning policy using Provisioning Manager for NAS. Select checkboxes as outlined. Provisioning Manager deviates from zero/full fat by first growing volumes with autosize and then allowing Snapshot autodelete.


Figure 10) Configuring a full/low/zero fat provisioning policy using Provisioning Manager for SAN storage. Select checkboxes as outlined. Provisioning Manager deviates by not turning on autosize for zero fat.

FULL/LOW/ZERO FAT PROVISIONING FOR STORAGE SERVICES

Storage services are an easy abstraction for provisioning storage in a utility-like fashion. A storage service describes all characteristic attributes of the storage needed in a certain scenario. A storage service catalog lists the available templates and allows you to provision storage with these attributes on demand. Technically, storage services or datasets consist of one or more resource pools, a protection policy, and a provisioning policy. Full/low/zero fat provisioning for storage services is configured in the configuration wizard of the provisioning policy. Figure 11 shows the provisioning policies closest to the full/low/zero fat configurations.

Figure 11) Full/low/zero fat provisioning policies for datasets and storage services.

Because this wizard is able to configure the deduplication feature, two policies are configured for the zero fat configuration: one with deduplication and one without.


HOW SHOULD A VOLUME BE SIZED?

Because physical allocation of data within a zero fat-provisioned volume is done on demand, theoretically the volume size can be set to a very high value that can easily keep all application data and Snapshot copies. Because the unallocated space in the volume is not exclusively reserved for the volume itself, all other applications can benefit from the shared pool of unallocated storage. However, NetApp recommends that you size the volume to the expected size of its containing objects and use the autogrow option to let it grow on demand. The important advantage is that the commitment rate then acts as a metric for data consolidation.

Note: The commitment rate reflects the amount of logical data consolidation. This metric is suitable for deciding when data should be left for organic growth.

Additionally, the volume size limits when using deduplication should be taken into account, because the maximum sizes depend on the storage controllers.
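The commitment rate described above can be sketched as the capacity committed to volumes relative to the aggregate's capacity. The formula below is a plausible reading of the Operations Manager metric, not its exact implementation, and the figures are illustrative.

```python
# Sketch of the aggregate commitment metric: committed volume capacity
# as a percentage of aggregate capacity. Sizing volumes to their expected
# data (rather than arbitrarily large) keeps this a meaningful
# data-consolidation indicator.

def commitment_rate(volume_sizes_gb, aggregate_capacity_gb):
    """Committed volume capacity as a percentage of aggregate capacity."""
    return 100.0 * sum(volume_sizes_gb) / aggregate_capacity_gb

# Three thin volumes, sized to their expected data, on a 10 TB aggregate:
print(commitment_rate([6000, 4000, 5000], 10000))  # -> 150.0 (overcommitted)
```

A rate above 100% quantifies the thin provisioning overcommitment; if the volumes were sized arbitrarily large, the same number would say nothing about actual consolidation.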

APPLICATION RECOMMENDATIONS

Thin provisioning is most effective when applications use data that is committed to them step by step. When applications preformat their data, the immediate effect of thin provisioning is lost, and only deduplication may reclaim sharable blocks. Because thin provisioning has no performance penalty, the general recommendation is to provision with the zero fat configuration.

For SAN-attached storage, NetApp recommends that you use file systems supporting space reclamation technologies such as the SCSI UNMAP and SCSI WRITE SAME commands. These pass the information through the storage stack that a particular block is no longer used and allow unused space to be reclaimed. On Windows® platforms, this can be configured in NetApp SnapDrive®.

For Oracle® database best practices, refer to WP-7084, Storage Efficiency in an Oracle Environment.

3.2 PROVISIONING FROM TEMPLATES: VOLUME AND DEDUPE-CENTRIC LAYOUTS

This section deals with provisioning storage for similar applications from a golden template. A valid use case is a hosting provider who offers and serves predefined application services in mass quantities. Instead of provisioning each application from scratch, the data of the application instance is provisioned by creating a copy of a preconfigured template or golden copy and customizing it using a postprocessing procedure.

When applications are provisioned this way, NetApp cloning technologies generate virtual copies of the template data instantly and with efficient use of space. This achieves a high degree of data consolidation and cost savings.

The potential of NetApp cloning technologies also plays a central role in development and test environments as well as in software maintenance scenarios. Testing and updates can be performed very easily because these cloning capabilities work instantly and with almost no overhead in terms of performance, CPU, and memory.

There are two ways to align application data to a NetApp shared storage infrastructure:

• Volume-centric storage layout

• Dedupe-centric storage layout

Depending on the data lifetime, suitability for deduplication, consistency, and tool constraints, one way of aligning application data is more appropriate than the other.

In both variants, the storage of the application template can be provisioned as either full, low, or zero fat. The cloning procedure inherits the attributes of the parent volume. To create space-efficient clones, the space guarantee must be set to none.


CONSEQUENCES FOR MONITORING

When using one of the following layouts, very high data consolidation can be achieved. Because this effect depends on the usage characteristics of the corresponding applications, monitoring the aggregate is key. In case a low fat volume acts as a template that is cloned preserving the original space guarantees, monitoring is necessary for the cloned volumes as well.

VOLUME-CENTRIC STORAGE LAYOUT

In a volume-centric storage layout, an application instance is organized into one or a few volumes to benefit from the Data ONTAP volume-centric management and maintenance operations, such as instant cloning and volume-consistent Snapshot copies.

In addition to the convenient ways to manage volumes, volume-centric storage layouts have storage efficiency advantages in two dimensions:

• High instant storage efficiency savings: Cloning the data of an application instance with FlexClone yields high instant savings; these savings might deteriorate over time.

• Long-term storage efficiency savings: Deduplicating application data yields medium long-term savings.

A volume-centric layout makes it easy to provision storage for another instance of an application by cloning a consistent volume representing the template of the intended application and attaching it to an instance where it is processed. This approach works for both NAS and SAN.

Figure 12 shows the data alignment of an application instance and its volume. An application instance organizes its data in one or more dedicated volumes. Note that the entire construct is created within one aggregate.

Because deduplication is performed on the volume level, long-term savings depend on the block-sharing rate within one instance of an application. Volume-centric layouts are preferred in the following cases:

• Simplicity of data management using volumes

• Individual control over the SLA of each application instance

• Application instances with a short duration

• No consideration of deduplication

• Management tools that require volume-centric layouts


Figure 12) Volume-centric storage provisioning. Application instances are aligned horizontally with their volumes. The template and each instance (instance 1 through instance n) keep their LUNs/qtrees in a dedicated FlexVol volume; deduplication block sharing operates within each FlexVol volume, and FlexClone block sharing links the instance volumes to the template.

Impact on commitment and storage utilization: The impact of using FlexClone to clone a volume-centric storage layout to implement template-based provisioning can be visualized schematically. At clone creation, Data ONTAP creates metadata for the new instance of the data and allocates space for storing changes to the cloned copy or new data on request. Thus, the overcommitment of the aggregate containing the cloned data increases when the clone is created. However, this does not affect the space used in the aggregate. Only when data in the clone is changed and new data is added by the application does the aggregate usage grow.
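This behavior can be modeled in a few lines: cloning raises the committed (logical) capacity of the aggregate while leaving the used space almost untouched until the clones diverge. The class, the per-clone metadata charge, and the figures below are illustrative assumptions, not Data ONTAP internals.

```python
# Sketch: FlexClone's effect on aggregate commitment versus used space.

class Aggregate:
    def __init__(self, capacity_gb):
        self.capacity_gb = capacity_gb
        self.committed_gb = 0.0  # sum of volume sizes (drives overcommitment)
        self.used_gb = 0.0       # physically allocated blocks

    def provision(self, size_gb, data_gb):
        """Provision a volume of size_gb holding data_gb of real data."""
        self.committed_gb += size_gb
        self.used_gb += data_gb

    def clone(self, size_gb, metadata_gb=0.1):
        """Clone a volume: blocks are shared, only metadata is allocated."""
        self.provision(size_gb, metadata_gb)

    def overcommitment(self):
        return self.committed_gb / self.capacity_gb

aggr = Aggregate(10000)
aggr.provision(size_gb=2000, data_gb=2000)  # template volume, fully written
for _ in range(4):                          # four cloned service instances
    aggr.clone(size_gb=2000)
print(round(aggr.overcommitment(), 2), round(aggr.used_gb, 1))
# -> 1.0 2000.4  (commitment reaches 100%, used space barely moves)
```

As the clones diverge, `used_gb` grows toward `committed_gb`, which is exactly why aggregate-level monitoring is required with this pattern.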

Best Practice

A volume-centric layout implicitly implements a consistency group. It is preferable to align in it all application data that should be recovered at a certain point. Cloning can achieve significant savings when a FlexClone volume is created to provision data for a new service instance.

Client-side data realignment, such as disk defragmentation or database tablespace reorganization, has a counterproductive effect on the FlexClone savings. This realignment also has a temporarily counterproductive effect on the deduplication savings and requires the deduplication process to be executed again. If possible, the following actions on client data should be avoided:

• Reorganizing data, for example, database reorganization of tablespaces or defragmentation of virtual disks provisioned through cloning

• Preformatting data


DEDUPE-CENTRIC STORAGE LAYOUT

In a dedupe-centric storage layout, the goal is to achieve high storage efficiency returns from the deduplication feature. In contrast to the volume-centric storage layout, data of different application instances is grouped to achieve storage efficiency returns across a set of application instances. Figure 13 shows a sample dedupe-centric storage layout. Data of application instances is organized horizontally; individual data of each application is grouped vertically in a volume to implement deduplication.

This layout makes sense in virtualization scenarios, where the images of the guest machines can be grouped easily. Grouping partitions containing boot images and commonly used programs is very effective because they share much of the same data. To implement template-based provisioning with such a layout, cloning template data must be performed with the file/LUN FlexClone operation. File/LUN FlexClone allows storage objects to be cloned within a volume, providing finer granularity.

This storage layout provides the following storage efficiency advantages from a short- and long-term perspective:

• Very high long-term storage efficiency savings: Long-term savings are achieved due to the deduplication-centric storage layout and deduplication returns.

• Short-term storage efficiency savings: Instant savings are provided when cloning an application instance, for example template application data, through a file/LUN FlexClone operation.

In contrast to the volume-centric storage layout, application instances are bundled together in a matrix style because of their participation in a volume. This implies that the applications share major operational tasks and are managed as a bundle. From an SLA perspective, a diversification of service levels within the application instances cannot be implemented as easily as with a volume-centric layout. Achieving application-consistent Snapshot copies requires the iterative application of the file/LUN FlexClone functionality to all storage objects of the instance. This is slightly more difficult than cloning with a volume FlexClone operation.

TR-3505, NetApp Deduplication for FAS and V-Series Deployment and Implementation Guide, provides a deeper understanding of NetApp deduplication and its deployment.

Figure 13 illustrates dedupe-centric storage provisioning. Volumes are shared among several application instances to achieve cross-instance dedupe returns. Note that this construct is created within an aggregate; volumes can be assigned to different aggregates.


Figure 13) Dedupe-centric storage provisioning. Application instances (template, instance 1, instance 2, and so on) are aligned horizontally; volumes are aligned vertically. Each FlexVol volume groups the corresponding LUNs/qtrees of the template and all instances, so deduplication block sharing operates within each FlexVol volume across the instances.

Impact on commitment and aggregate usage: When creating the FlexVol volumes for this layout, their individual sizes contribute to the commitment rate. The aggregate usage grows with the provisioning and object usage within the FlexVol volumes. Provisioning a new instance in this layout through a file/LUN FlexClone operation has no effect on the overcommitment rate; it affects only the deduplication value of the volumes themselves. Thus, NetApp recommends using the zero fat configuration for the volumes, with autogrow enabled.

Best Practice

This layout is very attractive for applications using multiple but similar storage objects among service instances (for example, virtual disks in virtual machine hypervisors). They usually use similar operating systems and applications in dedicated virtual disks. Thus, grouping these storage objects leads to a very high degree of consolidation due to deduplication.

Quickly changing data, such as pages and swap files, should not be considered for inclusion in deduplicated volumes on primary storage. Deduplication savings are limited due to their high change rate and do not justify running the deduplication process. NetApp recommends that this type of data is not placed in the same volume together with data that dedupes well.

We further recommend not performing client data realignments, such as Windows disk defragmentation or database tablespace reorganizations. Because of the way that NetApp storage controllers work, fragmented client data is served without performance penalties.


3.3 SETTLED/NOMAD PROVISIONING FOR NETAPP DATA MOTION

Settled/nomad provisioning is a provisioning pattern that helps increase the utilization of NetApp shared storage. When the online migratability features of the storage are exploited, response times to mitigate data growth scenarios become independent of application-specific planned downtime windows. Storage is classified into potential migration candidates, which can be migrated away from a tight aggregate on one storage controller to another controller while remaining accessible. Thus, it is an elegant technique for relaxing the use of an aggregate outside the planned downtime windows of the served applications. NetApp MultiStore technology implements this capability using the vFiler abstraction, which NetApp recommends you consider in the provisioning process. Secure multi-tenancy environments implemented using MultiStore technology harmonize well with this provisioning approach.

The settled/nomad provisioning pattern is a fitting metaphor for reacting to data growth in an aggregate. Figure 14 illustrates the concept of settled/nomad provisioning in the aggregates of the storage controllers and the migration of a nomad out of its aggregate. The settled part describes data that does not move during its lifetime. It might use vFiler units to simplify operation and hardware maintenance of the storage controller, but there is no direct need. The nomad parts are considered moving parts and thus must make use of vFiler units. The ratio between the sizes of the settled and nomad parts depends on the growth rate and lifetime of the data in the settled part. Assuming that the aggregate size is constant over this period, the aggregate is filled with settled and nomad data. Over the data lifetime, more and more nomads are migrated away. At the end of the lifetime, only the settled data is left.

It is irrelevant whether the data growth happens in the settled or the nomad part: when a nomad is migrated away, the resource situation on the aggregate is relaxed. It is preferable to provision several nomads of different sizes. This allows you to:

• React to different growth scenarios of the data.
• Quickly migrate smaller nomads when time or the inter-storage-controller network is considered a limited resource.
• Operate the aggregate in its operational sweet spot corridor over a long time frame; by slicing the migratable entities in the right way, you can be sure that the aggregate operates in a predefined use interval.
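The slicing idea above can be expressed as a simple selection rule: given the current aggregate use and an upper bound for the sweet spot corridor, pick the smallest nomad whose migration brings the aggregate back below that bound. This is a minimal illustration, not a NetApp tool; the corridor bound and nomad sizes are hypothetical figures.

```python
def pick_nomad(used_gb, capacity_gb, nomads_gb, upper_bound=0.85):
    """Return the size of the smallest nomad whose migration brings
    aggregate use back below the corridor's upper bound, or None.

    used_gb      -- blocks currently used in the aggregate (GB)
    capacity_gb  -- usable aggregate capacity (GB)
    nomads_gb    -- sizes of migratable (nomad) entities in the aggregate
    upper_bound  -- upper limit of the operational sweet spot corridor
    """
    for nomad in sorted(nomads_gb):  # prefer the quickest migration
        if (used_gb - nomad) / capacity_gb <= upper_bound:
            return nomad
    return None  # even the largest nomad is not enough

# Example: a 10 TB aggregate at 92% use with three nomads of different sizes.
print(pick_nomad(9200, 10000, [400, 800, 1600]))  # 800
```

Having several nomad sizes available is what makes this rule useful: the 400 GB nomad alone would not relax the aggregate enough, while migrating the 1,600 GB nomad would cost unnecessary migration time.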

Figure 14) Settled/nomad provisioning into an aggregate. In case of aggregate tightness, a nomad is migrated to a separate aggregate.


To summarize, the settled/nomad provisioning pattern is an elegant method of adjusting the block use of an aggregate. The use of an aggregate can be controlled and kept in a desired corridor.


SLA-BASED ASSESSMENT FOR SETTLED/NOMAD

The goals of an SLA-based assessment are to optimize SLA fulfillment and to avoid or minimize penalty costs. The accessibility of the applications, described by their individual service levels, is used to assess instances as settled or nomad.

We use the previously introduced SLA metric of service disruption and map it to the stickiness of the settled/nomad instances. The vFiler entities allow online migration of NFS- and iSCSI-attached nomad instances without any changes on the client side. Fibre Channel-attached storage cannot be migrated online at the time of writing. Refer to TR-3881 for an understanding of DataMotion in an Oracle database and Microsoft Exchange environment.

Alignment by technical impact. For data belonging to applications with SLAs that fit perfectly into what is provided, a direct assignment can be made. For example, application instances with the lowest acceptable service disruption should be the last candidates to be migrated (settled); applications with the highest acceptable service disruption should be considered nomads. However, there might be data of application instances that will likely be migrated during the application lifetime. You must take into account the business impact of migrating these instances.

Figure 15) Alignment by technical impact (sorted by negative impact in descending order)


Alignment by business impact. An assessment of penalty costs is made for the data of the remaining applications. For vFiler migration, a very short negative impact on the performance of the service level must be taken into account during the migration. Thus, application data with the highest negative impact is considered to be the stickiest.

Figure 16) Alignment by business impact (sorted by negative impact in descending order)


PERFORMANCE AND THROUGHPUT IMPACT OF MIGRATION

Migration of a nomad might be triggered by heavy storage consumption in an aggregate. It might also be triggered by performance limitations of the corresponding storage controller. Because the migration consumes additional resources on the network and the participating storage controllers, this consumption must be taken into account to avoid further intensifying the situation. Refer to TR-3881 for a quantitative evaluation of DataMotion.

NetApp recommends that the use of storage controllers in a high-availability configuration be adjusted in such a way that the remaining controller can master the load in the case of a failover. Doing so should leave enough resources to perform migrations.
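This sizing rule amounts to simple arithmetic: the combined load of both HA partners, plus a reserve for migration traffic, must fit on a single controller after a failover. A minimal sketch, in which the 10% migration reserve is a hypothetical figure:

```python
def ha_failover_safe(load_a, load_b, migration_reserve=0.1):
    """Check whether one controller of an HA pair can master the combined
    load after a failover and still leave headroom for migrations.

    Loads and reserve are expressed as fractions of a single
    controller's capacity.
    """
    return load_a + load_b + migration_reserve <= 1.0

# Two controllers at 40% and 45%: failover plus a 10% migration reserve fits.
print(ha_failover_safe(0.40, 0.45))  # True
# At 55% and 50%, the surviving controller would be oversubscribed.
print(ha_failover_safe(0.55, 0.50))  # False
```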

Migrating vFiler entities relies mainly on SnapMirror® and MultiStore technology; thus, it requires these licenses on all participating storage controllers. TR-3814, NetApp Data Motion, provides a thorough presentation of migration using the NetApp Data Motion solution. It focuses on implementing and triggering the migration of vFiler entities using NetApp Provisioning Manager. Furthermore, vFiler units can be managed manually to allow for handy offline migration with a very short interruption of storage accessibility.

ENABLE SETTLED/NOMAD FOR ALREADY-PROVISIONED STORAGE

While NetApp recommends that you initially consider the settled/nomad setting and take sizing and lifetime of storage into account, it is possible to implement this in a planned downtime window. If NFS-attached storage should be migrated, existing volumes can be adopted by a vFiler entity. Because the vFiler entity has its own IP address, the clients attaching the storage need to be remounted.

SETTLED/NOMAD-LIKE SETTING WITH SHORT-/LONG-TERM DATA PAIRING

In the previous section, the settled/nomad pattern was described as a way to mitigate organic data growth. The same effect can be achieved when storage is identified as belonging to instances that are going to be deprovisioned at their end of life. Taking into account the expected lifetime of provisioned storage allows you to plan deprovisioning situations in advance. This relaxes the dependence on aggregates outside planned downtime windows, and without the technical requirements of a settled/nomad setting.

ONLINE MIGRATION IN VIRTUALIZED ENVIRONMENTS

Online migration features in a virtualization hypervisor provide a further alternative for implementing a responsive scheme to react to data growth scenarios. For example, VMware® Storage VMotion™ is capable of transferring a virtual machine, including its storage, when it is attached using a datastore. Storage of virtual machines served by a NetApp datastore can be migrated to another NetApp-served datastore by migrating each virtual machine. In such cases, a nomad can simply be implemented by a NAS- or SAN-attached datastore.

In contrast to a data transfer based on SnapMirror directly between NetApp storage controllers, the migration traffic flows through the hypervisor. This might have consequences for the execution of the virtual machines. Also, the NetApp storage efficiency savings cannot be exploited during the transfer; deduplication savings are regained by executing the deduplication process on the destination storage controller.


4 OPERATION

This section focuses on the operation and management of overcommitted storage. The goal of management is to fulfill the SLAs of the stored data while achieving a high level of efficiency. It addresses questions of how to detect situations that need manual assistance, how to raise the awareness of the operational staff, and how to resolve situations that arise.

We first consider situations that put the SLA fulfillment at risk. Then we focus on actions that can be taken to avoid further aggravation by making the situation evident and presenting mitigation alternatives.

Consider the operational process as a loop that monitors and evaluates the current situation and triggers the transition of a storage resource among the following phases:

• Provision storage.
• Leave room for organic growth. It might be desirable to still allow for extending storage of previously provisioned applications.
• Reduce storage use, with mitigation alternatives such as deletion, data motion, and so on.

These transitions must occur within a specified time frame to optimize operational flexibility and to prevent endangering the SLAs. The point is to detect situations that will violate the SLAs in the future.

SITUATIONS PUTTING SLA FULFILLMENT AT RISK

Over time, more and more data is stored and processed by the provided applications. NetApp storage efficiency technologies compensate for this growth. To prevent running out of physical resources, usage must be managed within safe boundaries. This makes sure the operations team has enough time to react with the appropriate mitigation strategy.

The following list summarizes situations that are critical for service delivery:

• Running out of time. Some mitigation alternatives must be triggered in advance, and a passage of time might be needed for their effect to become evident. This time determines the number of mitigation alternatives that can be considered at a certain point.

• Running out of mitigation alternatives. Several mitigation alternatives exist to control the usage. However, some alternatives are one-time activities, and some must be performed within a certain time frame. Depending on the situation, not all alternatives might be available for use.

• Running too tight on storage. Over time, applications use more and more of the blocks that were committed to them. This forces Data ONTAP to allocate from a pool of free blocks. Assuming data growth, the size of the free block pool directly translates into the available time to react.

• Running out of storage completely. This must be prevented because it has a high impact on the availability of the service. Furthermore, data integrity can be at risk. Consider the following scenarios:
  − The application wants to write to committed storage but fails (NAS/SAN). For applications, this looks like a storage failure and implies service disruption. Data integrity can be at risk.
  − The application wants to allocate new storage but fails (NAS). The application is confronted with a "No space left on device" exception. Verify the application behavior on this exception; most applications can deal with this situation, and data integrity is not at risk.

Two cases need to be differentiated when mitigation is necessary to solve a situation of tight storage. Storage for an object such as a LUN or a share can be tight because of:
  − Insufficient space within the volume in which the storage object is contained
  − Insufficient free space within the aggregate in which the storage object and its volume are contained
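The observation that the free block pool translates into time to react can be made concrete: dividing the free space in the aggregate by the net daily growth rate yields the number of days before the aggregate runs completely out of space. A minimal sketch with hypothetical figures:

```python
def days_to_full(capacity_gb, used_gb, daily_growth_gb):
    """Days until the aggregate's free block pool is exhausted,
    assuming a constant daily growth rate."""
    if daily_growth_gb <= 0:
        return float("inf")  # no growth: no deadline
    return (capacity_gb - used_gb) / daily_growth_gb

# A 20 TB aggregate with 15 TB used, growing 50 GB/day, leaves 100 days to react.
print(days_to_full(20000, 15000, 50))  # 100.0
```

This number is exactly the budget available for triggering mitigation alternatives before the critical "running out of storage completely" situation is reached.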


The following sections focus on how to detect that a change is necessary and that a storage resource should be transitioned to another phase. We describe the different phases, how monitoring can support their detection, and how this information can be made known to operational groups.

4.1 PHASES AND TRANSITIONS

This section outlines the phases of a storage resource. Starting with an empty aggregate, storage is provisioned up to certain thresholds. After that, storage is left for organic growth. After certain thresholds are exceeded, further inspection or activities must be performed to mitigate storage tightness.

• Provisioning storage. When certain thresholds are within a defined range, storage is provisioned to the aggregates. Monitoring should support making the decision to transition to the next phase.

• Leave storage for organic growth. When certain thresholds are exceeded, provisioned storage is left for organic growth. Depending on the environment, storage of existing applications might still be extended, and a second threshold might signal that extensions are no longer possible. Monitoring should support making the decision to transition to the next or prior phase.

• Mitigate storage use. When certain thresholds are exceeded, this phase must make sure that committed storage can be delivered to store application data. The effect of a mitigation activity should be to put the storage resource back into the preferred operational corridor. Monitoring should support making the decision to transition back to the organic growth phase.

4.2 MONITORING

NetApp Operations Manager delivers comprehensive monitoring and management for NetApp shared storage. It provides alerts, reports, performance monitoring, and configuration tools to keep the NetApp storage infrastructure in line with business requirements for maximum availability and efficiency. NetApp Operations Manager provides a single human interface and an application programming interface (API) for integration with third-party management and orchestration software vendors.

Operations Manager monitors the NetApp shared storage infrastructure and is able to raise awareness of certain situations. Events can be set to trigger an action when operational parameters are within a certain range and indicate a relevant situation. When the event triggers, an alarm notification can be sent by e-mail, pager, Simple Network Management Protocol (SNMP), or customized scripts. To raise awareness about a certain situation, the event must be characterized using the metrics provided by Operations Manager. To communicate the event, an alarm must be set.

THRESHOLDS

Operations Manager monitors relevant parameters that indicate the presence of specified situations. Thresholds can be set to trigger actions, for example, to notify the operational team that an alarm situation exists. The thresholds can be set to notify in advance. Operations Manager also performs trending on operational parameters to express the urgency of a certain situation. This supports decision making on how to react to a certain situation.

Within your Operations Manager instance, the thresholds can be verified and set by navigating to the Default Thresholds page following Setup → Options → Default Thresholds, or via the link http://opsmgrserver:port/dfm/edit/options. Figure 17 shows a sample configuration page.

32 Storage Efficiency Every Day How to Achieve and Manage Best-in-Class Storage Use

Figure 17) Operations Manager screen to configure thresholds on operational metrics

For aggregates, Operations Manager provides a set of thresholds, described in the following list. They represent absolute limits. Operations Manager alarms can be used to notify operational staff and raise awareness of a specific situation.

Monitoring the aggregates is very important. They are the physical containers of preallocated and growable storage objects that host application data. If an aggregate of a storage controller runs at uncontrolled usage, there could be direct consequences for the applications for which it provides data.

The concrete settings for these thresholds depend on the time necessary to relax aggregate block usage. When no mitigation alternatives should or can be taken over the lifetime of the data, mitigation actions must be performed in scheduled downtime windows. Thus, threshold settings and actions tend to be more conservative to avoid SLA-endangering situations.

• Aggregate full threshold. This threshold on the metric aggregate block use allows triggering an alarm that notifies a person in charge.

• Aggregate nearly full threshold. This threshold is the counterpart of the aggregate full threshold but provides an earlier notification.

• Aggregate overcommitted threshold. This threshold on the metric of committed storage allows triggering an alarm that notifies a person in charge. This metric refers to the amount of storage that is assigned to applications. It represents the level of consolidation and also the width and increase of the block use corridor.

• Aggregate nearly overcommitted threshold. This threshold is the counterpart of the aggregate overcommitted threshold but provides an earlier notification.
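These threshold checks compare two independent aggregate metrics against configured limits: block use against physical capacity, and committed space against physical capacity (which can exceed 100% on overcommitted aggregates). A sketch of the evaluation, using hypothetical threshold values rather than Operations Manager defaults:

```python
def fired_events(used_gb, committed_gb, capacity_gb,
                 nearly_full=0.80, full=0.90,
                 nearly_overcommitted=1.00, overcommitted=1.20):
    """Return the names of aggregate threshold events that would fire.
    Threshold defaults here are illustrative, not Operations Manager defaults."""
    use = used_gb / capacity_gb
    commit = committed_gb / capacity_gb
    events = []
    if use >= full:
        events.append("aggregate full")
    elif use >= nearly_full:
        events.append("aggregate nearly full")
    if commit >= overcommitted:
        events.append("aggregate overcommitted")
    elif commit >= nearly_overcommitted:
        events.append("aggregate nearly overcommitted")
    return events

# 10 TB aggregate: 8.5 TB used, 11 TB committed to applications.
print(fired_events(8500, 11000, 10000))
```

Note that the two metrics fire independently: an aggregate can be overcommitted while its block use is still comfortably inside the corridor, and vice versa.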

Operations Manager also provides thresholds that can be used to alert operational staff when volumes are in a certain state:

• Volume full threshold. This event notifies a person in charge that the preset threshold on the volume use metric has been reached.

• Volume almost full threshold. This event is the counterpart of the volume full threshold but provides an earlier notification.

• Volume autosized. This event notifies a person in charge when a volume was extended using the autogrow functionality.


TRENDING

Operations Manager 4.0 supports a variety of trending features for certain storage objects. This is an important feature for all storage objects with a fixed size. It allows you to estimate the time frame within which a certain situation needs to be mitigated. The trend is calculated as a linear regression over up to 90 days in the past. For aggregates, Operations Manager calculates a trend on the daily growth rate. In your Operations Manager instance, use the link http://opsmgrserver:port/dfm/report/view/aggregates-growth-rates for trending of aggregate growth rates and the estimated remaining time until the storage object is full. Each aggregate can be drilled down, and you can select trending based on an interval of one day, one week, one month, three months, or one year. To see the effect of recent data activities, set the interval of a trend calculation to enclose this activity. Investigate whether growth rates calculated over different intervals deviate significantly.

Figure 18) Trending of data growth and days-to-full prediction in Operations Manager

Note: The calculation basis of time to full is the usable aggregate capacity. This value is not calculated based on the aggregate full threshold setting.
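The trend calculation can be reproduced in miniature: fit a least-squares line through the daily use samples and extrapolate to the usable aggregate capacity. This sketch uses plain Python and invented sample data; Operations Manager's actual implementation may differ in detail.

```python
def days_until_full(daily_used_gb, capacity_gb):
    """Linear-regression growth trend and extrapolated days to full.

    daily_used_gb -- one used-capacity sample per day (oldest first)
    capacity_gb   -- usable aggregate capacity (the basis of time to full)
    """
    n = len(daily_used_gb)
    xs = range(n)
    x_mean = sum(xs) / n
    y_mean = sum(daily_used_gb) / n
    # Least-squares slope: daily growth rate in GB/day.
    slope = sum((x - x_mean) * (y - y_mean)
                for x, y in zip(xs, daily_used_gb)) \
        / sum((x - x_mean) ** 2 for x in xs)
    intercept = y_mean - slope * x_mean
    today = intercept + slope * (n - 1)  # fitted use for the latest day
    if slope <= 0:
        return float("inf")  # flat or shrinking: never full on this trend
    return (capacity_gb - today) / slope

# Perfectly linear growth of 10 GB/day, 9,000 GB used today, 10,000 GB capacity.
print(days_until_full([8960, 8970, 8980, 8990, 9000], 10000))  # 100.0
```

Comparing the result over short and long sample windows is the programmatic analogue of the recommendation above to check whether growth rates over different intervals deviate significantly.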

The trending on the volume level is analogous to the trending on the aggregate level. In your Operations Manager instance, access the link http://opsmgrserver:port/dfm/report/view/volumes-growth-rates for trending of volume growth rates. NetApp recommends that you order the view by growth rate (descending) or time to full (increasing) in order to focus on the relevant candidates.

On the volume level, you can set an alarm to fire when the volume growth is outside the usual boundary.

Abnormal volume growth. This event notifies you when the growth rate of a volume exceeds a preset limit. It is helpful for signaling unusual behavior concerning storage consumption and pointing the operational staff to the right storage object.


INDIVIDUAL THRESHOLDS PER AGGREGATE OR VOLUME

For each aggregate or volume, the general default settings can be overwritten and made more specific. To do so, select your aggregate or volume of choice, for example, by using the links already provided in this technical report. When selecting a concrete aggregate, it can be configured using the Edit Settings link and dialog. When selecting a concrete volume, its configuration can be adapted using the Edit Quota Settings link and dialog.

MONITORING STORAGE EFFICIENCY RETURNS

NetApp Operations Manager provides a dashboard to visualize storage efficiency returns in the NetApp shared storage infrastructure. This report lists important parameters, drilled down by utilization capacity, unused reserve capacity, storage efficiency, and efficiency return breakdown. It allows you to judge the effectiveness of the NetApp storage efficiency technologies. Figure 19 provides a sample screenshot of the storage efficiency dashboard in NetApp Operations Manager. Consult the NetApp Operations Manager Efficiency Dashboard Installation and User Guide for further information on the data provided by this dashboard.

Figure 19) Storage efficiency dashboard in Operations Manager


4.3 NOTIFICATION

Operational staff must be notified when situations occur that require a transition of phases, especially situations with negative consequences for the SLA fulfillment. Operations Manager provides alarms for notification. Alarms are bound to the metrics and thresholds explained in section 4.2 and notify operational staff, storage administrators, or storage capacity planners. Alarms are the instrument used to keep the management effort of the NetApp storage infrastructure low.

After being notified, the responsible person can evaluate the situation and decide which actions to take. Further, the trends on operational parameters provided by Operations Manager simplify the decision-making process.

Depending on the organizational structure, the responsibilities to operate, plan, and administer the storage infrastructure can be separated into different groups, persons, or roles. Thus, we characterize the mitigation activities by required skill set and time to act. This allows easy alignment to a given organizational structure.

Operations Manager supports different methods of sending a notification. The notification methods can be used in combination; for example, a notification can be sent by both e-mail and SNMP.

NOTIFY BY E-MAIL

An alarm can be sent to multiple destinations by e-mail. Repeated notifications can be sent when the situation is not resolved. To set an alarm, access the alarm configuration page by following Setup → Alarms from the default Operations Manager dashboard. Clicking Advanced Version accesses an advanced version of this page; the direct link for the advanced version is http://opsmgrserver:port/dfm/edit/alarms-advanced. Figure 20 shows how to configure an alarm. Adjust the threshold as described in section 4.2.

NetApp recommends using distribution lists or aliases with meaningful names rather than the addresses of individual persons. If you follow this recommendation, changing responsibilities and roles does not require you to make corresponding changes to Operations Manager.

NOTIFY BY SNMP

Operations Manager supports the notification of alarms using SNMP, a widely used standard that is supported by most orchestration frameworks and ticketing systems. Using SNMP, Operations Manager can be integrated into existing ticketing systems. Figure 20 shows how to set up an alarm that fires based on the aggregate almost full threshold. The SNMP trap host is configured using a hostname or IP address and the port on which the SNMP agent is listening. The alarm can then be saved and tested.


Figure 20) Configuring an alarm based on the threshold aggregate almost full

Note: The SNMP event must be routed to the responsible groups or persons in the ticketing system. Thus, the mapping between the detected situation and the responsible operational group must be implemented there.

NOTIFY BY SCRIPT

Operations Manager supports notifications in highly customized integration scenarios. A user-defined adapter can be executed, which delivers the information to the infrastructure or system of choice. A script can be used to implement such an adapter and act as the glue between Operations Manager and the customer infrastructure. To set an alarm on the event aggregate almost full that starts a script, instrument Operations Manager on the command line:

dfm alarm create -s script_to_execute -h aggregate-almost-full
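Such an adapter can be as simple as a script that turns the event into a message for the local ticketing system. The sketch below is hypothetical: the DFM_* environment variable names and the print-based "ticketing endpoint" are placeholders, not a documented Operations Manager interface; the concrete contract depends on your Operations Manager version and infrastructure.

```python
#!/usr/bin/env python
# Hypothetical alarm adapter: the DFM_* variable names and the ticket
# format are placeholders, not a documented Operations Manager interface.
import os
import sys

def format_ticket(event_name, source, severity):
    """Render an event as a one-line ticket summary."""
    return "[%s] %s on %s" % (severity.upper(), event_name, source)

if __name__ == "__main__":
    ticket = format_ticket(
        os.environ.get("DFM_EVENT_NAME", "unknown-event"),
        os.environ.get("DFM_SOURCE_NAME", "unknown-source"),
        os.environ.get("DFM_SEVERITY", "warning"),
    )
    # Hand the summary to the ticketing system of choice; here we just print it.
    sys.stdout.write(ticket + "\n")
```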


4.4 MITIGATE STORAGE USE

Uncontrolled storage use can limit operational flexibility and might put SLA fulfillment at risk. This section focuses on mitigation activities that preserve flexibility by controlling use within its defined corridor. The effect of a mitigation activity should be to return the usage to its defined corridor.

Storage tightness might occur in aggregates or volumes, depending on their configuration. When all volumes in an aggregate are thin provisioned with the zero fat configuration, they use the shared pool of free blocks of the aggregate to deal with data growth. To solve this situation, a mitigation activity on the aggregate level is necessary.

When storage objects in a fixed-size volume cannot grow to the committed space, a mitigation activity on the volume level is necessary to solve upcoming volume tightness.

MITIGATION ACTIVITIES FOR AGGREGATES

Aggregates are the coarsest storage objects within a NetApp storage controller. Finer-grained storage objects, such as FlexVol volumes and their content, are usually thin provisioned using the zero fat configuration. They might grow on demand; however, because they live within an aggregate of physically limited size, the growth of the storage objects themselves is also limited. As described in the following list, providing usable space in the aggregate automatically allows the contained storage objects to grow.

1. Increase the aggregate. You can add drives to aggregates during operation, and this mitigation activity can be repeated. The maximum aggregate size depends on the Data ONTAP version, the type of aggregate, and the type of storage controller. 64-bit aggregates, supported with Data ONTAP 8, have very high limits. Additional drives can be used immediately; however, their procurement needs to be taken into account. Rebalancing data between existing and new drives results in a uniformly distributed use of the drives.

2. Decrease the aggregate Snapshot copy reserve. This reserve is needed in MetroCluster and SyncMirror® configurations. In other configurations, you can decrease this reserve or set it to zero.

3. Shrink preallocated volumes. Volumes with preallocated space reserve available aggregate free space. When possible, these volumes can be shrunk, returning the freed space to the aggregate so that other volumes can make use of it.

4. Enable deduplication and shrink the volume.
5. If available, migrate a nomad online to a different storage controller. Doing this on the NetApp storage controller level requires storage provisioning based on vFiler units and MultiStore and SnapMirror licenses. Adequate free space on the aggregates of the target storage controller is required. This mitigation activity is not limited in its applicability.

6. A volume can be migrated from one aggregate to another, within the same or another storage controller. SnapMirror replicates the data while it is still served. To switch over to the replicated data, the client needs to detach from the source and reattach to the replica. After completion, the replica is considered the new source. This operation has an impact on client downtime. Typically, inter-data-center bandwidth allows you to synchronize the source and the replica within a few minutes.

7. If none of the listed activities can be used, the application must be stopped to achieve a consistent state before the data is migrated offline.

The mitigation activities for aggregate tightness are summarized in Table 8. Note that Provisioning Manager performs mitigation alternatives 3 to 6 for secondary storage online.


Table 8) Mitigation alternatives to control use within aggregates.

No. | Mitigation Activity | Repeatability | SLA Impact | Preparation Time | Time to Show Effect
1 | Increase aggregate capacity by adding disks | Repeatable (Data ONTAP 7.x: low limits; Data ONTAP 8: high limits) | None | HW procurement | Immediate (+ rebalancing)
2 | Decrease the aggregate's Snapshot copy reserve, if possible | One time | None | None | Immediate
3 | Shrink other volumes in the aggregate if they have enough free space | One time | Low | None | Immediate
4 | Run deduplication and shrink volumes | Repeatable | Low | Time to execute dedupe | Immediate
5 | Migrate nomads (online) | Repeatable | Low | None | Minutes (vFiler migration time)
6 | Migrate volumes to a different aggregate (offline) | Repeatable | Med–high | Next planned downtime window | Minutes (volume switch-over time)
7 | Prevent application data loss and stop the application, then migrate (offline) | Repeatable | Low–high | Coordinate with app owner | Minutes (migration time)

MITIGATION ACTIVITIES FOR VOLUME TIGHTNESS

Mitigation activities for volume tightness are relevant for volumes that are at risk because their storage objects cannot grow to the committed size. When it is not possible to enable growth for storage objects contained in volumes, you need to perform an aggregate mitigation activity.

Note: Some of these mitigation alternatives depend on and affect the used capacity in the aggregate.

Table 9) Mitigation activities for resource tightness within volumes.

No. | Mitigation Activity | Repeatability | SLA Impact | Preparation Time | Time to Show Effect
1 | Reduce the volume's Snapshot copy reserve (if configured and not used) | One time | Low | None | Immediate
2 | Increase the volume if there is free space in the aggregate (see Table 8) | One time | Low | None | Immediate
3 | Delete Snapshot copies that are not needed or were skipped by the autodelete function | Limited | Low | None | Immediate
4 | Activate FAS deduplication for the volume (requires proper space guarantees) | One time | Low (possible performance impact) | Wait for schedule | Hours
5 | If the volume contains more than a single LUN, migrate those objects to another volume or aggregate | Repeatable | High | Next planned downtime window | Minutes (volume migration time)
6 | Stop the application and migrate the data | Repeatable | High | Coordinate with app owner | Minutes (migration time)


5 REAL-LIFE SETTINGS

This section summarizes two different operational settings. The first does not make use of online data migration and the settled/nomad provisioning pattern; the second implements a settled/nomad provisioning pattern to maintain the flexibility for online data migrations.

The concrete threshold settings and approaches might be very customer and application specific. To exploit NetApp storage efficiency features in your own data center, NetApp recommends that you start conservatively. After you are familiar with the process, work toward the customer-specific optimum.

5.1 SAMPLE SETTING 1: REAL-LIFE SETTING

This section describes a real-life setting that a customer started with. It makes use of a limited set of mitigation alternatives. This is especially beneficial when the installed storage capacity should be constant over a long time frame or when physical systems are already fully equipped. A settled/nomad setting is not considered. Thus, the thresholds that signal a transition of the phases are set lower and more conservatively for this customer. Because online data migration and aggregate extension are not available as mitigation alternatives, sufficient available space is required to safely reach the next planned downtime window, as shown in Figure 21. In practice, refer to the aggregate days-to-full trend value to get an idea of the available days to full based on past data growth.

• All storage is provisioned using the zero fat option with growable FlexVol volumes. Only aggregate monitoring is used.
• Aggregate extension is not a mitigation alternative.
• Online migration is not a mitigation alternative.

Figure 21) Storage to enable organic data growth between planned downtime windows. [Figure: aggregate capacity over time (months); data grows organically between successive planned downtime windows.]

Note: Several months might pass between planned downtime windows in which to perform major mitigation alternatives.

The primary concern is preventing the critical situation in which aggregates reach a utilization level too high to allow organic growth until the next agreed planned downtime window. To prevent this situation, sufficient space must be reserved to enable data growth. Second, the level of data consolidation is monitored to manage accumulated growth rates safely.

Provisioning new data is stopped when one or both thresholds on the first and second metrics are reached. The operational teams are notified using an alarm on the Operations Manager event "aggregate nearly full" (configured to fire when the metric exceeds 50%) and the event "aggregate nearly overcommitted" (configured to fire when the metric exceeds 110%). These alarms stop the responsible entities from provisioning new storage; the aggregate is left for organic growth.

An assessment of the storage situation might then be performed. Depending on experience and knowledge of the application growth rates seen in the past, the thresholds may be adapted. After the upper threshold of the operational sweet spot corridor is crossed, an alarm based on the "aggregate full" threshold (set initially to 65%) is sent to the storage administrators so that they can decide to migrate data in the next planned downtime window. In the meantime, organic growth can take place in the yellow-marked area shown in Figure 22. The metrics used are:

• First metric: aggregate capacity used
• Second metric: aggregate space committed

Because all storage is provisioned using the zero fat option, no artificially limited storage containers exist. Thus, there is no need to consider a volume-based metric. Figure 22 shows the behavior depending on the metrics aggregate capacity used and aggregate space committed.
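The two-metric transition logic of this setting can be sketched as follows. The threshold values (50%/65% capacity used, 110%/120% space committed) are the ones quoted above; the phase labels are illustrative:

```python
# Illustrative sketch of the two-metric phase logic in sample setting 1.
# Thresholds are taken from the setting described above; adapt them to
# your own environment.

def phase(used_pct, committed_pct):
    """Map the two aggregate metrics to an operational phase."""
    if used_pct > 65 or committed_pct > 120:
        return "mitigate: plan data migration for next downtime window"
    if used_pct > 50 or committed_pct > 110:
        return "assess: stop provisioning, leave aggregate for organic growth"
    return "sweet spot: provisioning new storage allowed"

print(phase(42, 95))   # inside the operational sweet spot corridor
print(phase(55, 100))  # capacity used above 50%: assessment phase
print(phase(70, 130))  # beyond the mitigation thresholds
```

Note that either metric alone can push the aggregate out of the corridor, which matches the "one or both thresholds" rule above.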

Figure 22) Transition of phases depending on the metrics aggregate capacity used and aggregate space committed. [Figure: within the operational sweet spot corridor (aggregate capacity used 0–50%, aggregate space committed 0–110%), new storage is provisioned; above it, capacity is assessed and thresholds are adapted; beyond the mitigation thresholds (capacity used > 65%, space committed > 120%), data is migrated.]


5.2 SAMPLE SETTING 2: SETTLED/NOMAD

This section describes a setting that takes the settled/nomad provisioning pattern into account and allows migrating nomad data flexibly and in a timely manner thanks to vFiler technology. This setting requires storage space at alternative locations to which nomads might be migrated. It is seen more often in larger environments with an emphasis on NFS-attached storage. It allows operating the NetApp storage infrastructure at very high use and in narrower operational sweet spot corridors. Figure 23 visualizes the effect of a mitigation alternative that can be performed online.

Figure 23) Narrower corridors due to the ability to perform mitigation alternatives in hours instead of months. [Figure: settled data and nomads (N) in an aggregate over time (hours); after detecting the need to act, a mitigation such as a nomad migration takes effect within hours.]

In this sample setting, as in sample setting 1, the critical situation to prevent is aggregates becoming too full. However, given the flexibility gained with online data migration, no further metric, such as storage overcommitment, needs to be taken into account.

• All storage is provisioned using the zero fat option with growable FlexVol volumes. Only aggregate monitoring is used.
• Storage is provisioned using the settled/nomad pattern, with the ability to perform online migration.
• The days-to-full aggregate trend was more than 200 days on average. Note that this value depends on the individual situation and is calculated against 100% capacity used.

The sole metric in this setting is aggregate capacity used. Table 10 contains the thresholds describing the transitions between phases.

Table 10) Phase transitions with the settled/nomad provisioning pattern and online migration as a mitigation alternative.

Detection Threshold | Notify             | Mitigation
> 70%               | Storage operations | Stop provisioning of new storage
> 85%               | Storage operations | Stop extending provisioned storage
> 90%               | Storage operations | Relax the resource situation and migrate a nomad
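A minimal sketch of the Table 10 transitions; the thresholds are the ones quoted in the table, and the action strings are illustrative placeholders:

```python
# Sketch of the single-metric transitions in Table 10 (sample setting 2).
# The three thresholds come from the table above; actions accumulate as
# utilization rises.

def setting2_actions(used_pct):
    """Return the actions triggered at a given aggregate capacity
    used percentage, per Table 10."""
    actions = []
    if used_pct > 70:
        actions.append("stop provisioning new storage")
    if used_pct > 85:
        actions.append("stop extending provisioned storage")
    if used_pct > 90:
        actions.append("relax resource situation: migrate a nomad")
    return actions

print(setting2_actions(72))  # ['stop provisioning new storage']
print(setting2_actions(92))  # all three actions
```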


Figure 24) Visualization of phase transitions depending on the metric aggregate capacity used. [Figure: within the operational sweet spot corridor (aggregate capacity used 0–70%), new storage is provisioned; at 70–85%, provisioning of new storage stops; above 85%, extending already provisioned storage stops; above 90%, utilization is relaxed by migrating a nomad with NetApp Data Motion.]

You can achieve very high data consolidation in this setting by using NetApp storage controllers. The amount of logical data served can exceed the physically usable capacity by factors.


6 STORAGE EFFICIENCY COOKBOOK

To increase consolidation, we propose the following steps to exploit the advantages of NetApp storage efficiency technologies.

Figure 25) Typical picture of aggregate capacity metrics while turning to zero fat configurations and dedupe. [Figure: committed capacity and capacity used over elapsed time; after the configuration change, capacity used drops, and the overall trend and the last 3-month trend are derived over a 1-month and a 3-month window; the marks 1, 2, and 3 correspond to the steps below.]

As a general rule, we don't introduce artificially limited container types. They increase the monitoring effort and might prevent pooling unused space. For an existing landscape, proceed as follows:

1. Install and configure Operations Manager, the earlier the better. From day one, Operations Manager collects data; the more information it collects, the better the predictions and trending. The diagrams provided by Operations Manager give a good idea of growth rates and their steadiness. Make sure all NetApp storage controllers are monitored. Wait for one month. Define which mitigation alternatives your operational team is comfortable with. Check the boxes accompanying the provided list and identify the time your team needs to perform the actions. If you can perform online migrations for nomads, define the time to negotiate and approve the migration. For all other data, define the time to the next planned downtime window.

2. Change all volumes to the zero fat configuration with the autogrow feature set to on. Because there is no artificial space limitation for an autogrow volume, monitoring is restricted to aggregate monitoring. When using deduplication, set the volume to autogrow. During this period, the capacity used diminishes, as shown in Figure 25. Usually, each change in the volume configuration can be detected. So far, only metadata has changed, and unused space in the volumes is now available from a common shared pool. The aggregated free space is available to the same applications storing the data. We recommend monitoring for three months to understand the growth rate of your environment.

3. Derive the growth trend of the aggregates. Note that the overall trend might still be negative. Use Operations Manager to help determine the trend. Make sure that it excludes the time frame of changing the volume configuration to zero fat and that it includes relevant operations of your applications, such as month- and year-end closing of business applications or regular software maintenance updates (for example, in virtualized environments).


Work backward to determine the thresholds of the phases:

a. Define the aggregate use at a level where your operational team is comfortable. At first, do not exceed 80%. Add an attention area (yellow) depending on the mitigation alternatives and their time to show effect.

b. Determine the maximum distance between the planned downtimes or the time to perform the intended mitigation alternatives.

c. Determine the growth rate. Operations Manager provides help in determining the trend of data growth.

d. Determine the minimum space required to comfortably allow organic growth in the period between agreed planned downtimes of the services provided. Operations Manager helps you understand the growth rate of the past.
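Steps a through d amount to simple arithmetic: reserve enough space below the comfort level to absorb the expected growth until the next downtime. A hedged sketch, with all input values as examples rather than recommendations:

```python
# Hedged arithmetic sketch of the "work backward" steps above: derive the
# upper threshold of the sweet spot corridor from the comfort level, the
# observed growth rate, and the time between planned downtime windows.

def corridor_upper_threshold(capacity_tb, comfort_pct, growth_tb_per_day,
                             days_between_downtimes):
    """Reserve enough space below the comfort level to absorb organic
    growth until the next planned downtime; return the resulting upper
    threshold of the sweet spot corridor in percent."""
    reserve_tb = growth_tb_per_day * days_between_downtimes
    threshold_pct = comfort_pct - 100.0 * reserve_tb / capacity_tb
    return max(threshold_pct, 0.0)

# 20 TB aggregate, 80% comfort level, 0.02 TB/day growth,
# downtimes 90 days apart: 1.8 TB reserve, so the corridor ends near 71%
print(corridor_upper_threshold(20.0, 80.0, 0.02, 90))
```

If the result reaches 0%, the aggregate cannot safely carry the expected growth between downtimes, and a different mitigation alternative (or more capacity) is needed.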

To provision storage, follow these steps:

1. Create big aggregates to enable shared storage in your data center. We recommend sizing them in such a way that an aggregate can be extended once for eventual aggregate mitigation. Very few situations exist in which a silo-centric approach with dedicated aggregates for applications makes sense. Free space and performance in an aggregate can be shared. A few big aggregates reduce the monitoring effort. Also, build aggregates in a limited number of standardized configurations and sizes.

2. Create volumes in the zero fat configuration with the autogrow feature set to on. Because there is no artificial space limitation for an autogrow volume, monitoring is restricted to aggregate monitoring. When using deduplication, set the volume to autogrow. Whenever possible, use Provisioning Manager for convenience and for repeating configurations.

a. Classify your data and provision for flexibility. Give NFS a preference and make use of vFiler entities.

b. Turn on deduplication. Even in situations where deduplication rates are expected to be low, there is sometimes a big surprise. If you prefer to try deduplication first, create a clone of the intended volume on the storage controller and deduplicate it to estimate the effect. Use Performance Advisor to identify a repeating time frame of low activity to schedule the deduplication job, or schedule deduplication by change rate. Mind the maximum volume sizes, which depend on the storage controller.

c. Initially size volumes to the expected size of the data you are going to store. Thus, the aggregate overcommitment metric in Operations Manager represents the data consolidation more precisely.

d. Trim existing volumes provisioned in full or low fat to the zero fat configuration. Use the following commands at the console of the storage controller to configure zero fat without Snapshot autodelete for NAS environments:

vol options <volume> guarantee none
vol options <volume> try_first volume_grow
vol autosize <volume> -m <maximum size> -i <increment size> on
snap autodelete <volume> off

Use the following command sequence to configure zero fat with Snapshot autodelete for NAS environments:

vol options <volume> guarantee none
vol options <volume> try_first volume_grow
vol autosize <volume> -m <maximum size> -i <increment size> on
snap autodelete <volume> trigger volume
snap autodelete <volume> delete_order oldest_first
snap autodelete <volume> on


Use the following commands to configure zero fat without Snapshot autodelete for SAN environments:

vol options <volume> guarantee none
vol options <volume> try_first volume_grow
vol autosize <volume> -m <maximum size> -i <increment size> on
snap reserve -V <volume> 0
snap autodelete <volume> off
lun set reservation <lun> disable

Use the following command sequence to configure zero fat for SAN environments with autodelete set to on:

vol options <volume> guarantee none
vol options <volume> try_first volume_grow
vol autosize <volume> -m <maximum size> -i <increment size> on
snap reserve -V <volume> 0
snap autodelete <volume> trigger volume
snap autodelete <volume> delete_order oldest_first
snap autodelete <volume> on
lun set reservation <lun> disable

e. Identify storage holding inactive data. Such storage is most often perfectly suited to act as a nomad candidate that can be migrated.

f. Identify storage that is close to deprovisioning. Deprovisioning storage relaxes use and can act as a mitigation alternative.

g. Turn already provisioned volumes into the zero fat configuration.

3. Let Operations Manager monitor the landscape. Use the aggregate daily growth rates and days-to-full trending reported by Operations Manager to adapt the thresholds. Remember that days-to-full trending reports against 100% capacity used of the aggregate.
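When many volumes need to be trimmed, the four zero fat command sequences shown above are easy to generate programmatically. The following hedged helper only assembles the documented 7-Mode command strings; the helper itself is illustrative, and execution (for example, via ssh to the controller console) is intentionally left out:

```python
# Illustrative sketch: assemble the zero fat command sequences from the
# cookbook above for a given volume. The generated commands are the
# documented Data ONTAP 7-Mode console commands; the helper is not a
# NetApp tool.

def zero_fat_commands(volume, max_size, increment, san=False,
                      autodelete=False, lun=None):
    cmds = [
        f"vol options {volume} guarantee none",
        f"vol options {volume} try_first volume_grow",
        f"vol autosize {volume} -m {max_size} -i {increment} on",
    ]
    if san:
        cmds.append(f"snap reserve -V {volume} 0")  # no Snapshot reserve for SAN
    if autodelete:
        cmds += [
            f"snap autodelete {volume} trigger volume",
            f"snap autodelete {volume} delete_order oldest_first",
            f"snap autodelete {volume} on",
        ]
    else:
        cmds.append(f"snap autodelete {volume} off")
    if san and lun:
        cmds.append(f"lun set reservation {lun} disable")
    return cmds

for cmd in zero_fat_commands("vol1", "500g", "50g",
                             san=True, lun="/vol/vol1/lun0"):
    print(cmd)
```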


7 REFERENCES

• TR-3505: "NetApp Deduplication for FAS and V-Series Deployment and Implementation Guide," www.netapp.com/us/library/technical-reports/tr-3505.html
• TR-3563: "NetApp Thin Provisioning: Improving Storage Utilization and Reducing TCO," www.netapp.com/us/library/technical-reports/tr-3563.html
• TR-3710: "Operations Manager, Provisioning Manager, and Protection Manager Best Practices Guide," www.netapp.com/us/library/technical-reports/tr-3710.html
• TR-3786: "A Thorough Introduction to 64-Bit Aggregates," www.netapp.com/us/library/technical-reports/tr-3786.html
• TR-3814: "NetApp Data Motion," www.netapp.com/us/library/technical-reports/tr-3814.html
• TR-3827: "If You're Doing This, Then Your Storage Could Be Underutilized," www.netapp.com/us/library/technical-reports/tr-3827.html
• TR-3881: "DataMotion for Volumes for Enterprise Applications," www.netapp.com/us/library/technical-reports/tr-3881.html
• NetApp Operations Manager Efficiency Dashboard Installation and User Guide, now.netapp.com/NOW/download/tools/omsed_plugin/InstallUserGuide.pdf


8 ACKNOWLEDGMENTS

This report was developed in concert with the Field Centers for Innovation and covers field best practices and product group expertise. It would not have been possible without the input of many experts. Significant contributions were made by Matthew Agoni, Carlos Alvarez, Jeff Berks, Manfred Buchmann, Hans Deuerlein, Erik Dybwad, Niels Reker, Oliver Dziuba, Larry Freeman, Gary Garcia, Pretoom Goswami, Naveen Harsani, George John, Nigel Maddock, Andreas Martinovsky, Holger Niermann, Cesar Orosco, Christian Ott, Shiva Raja, Michael Reusch, Maurice Skubski, John Tyrrell, Oliver Walsdorf, and Allen Wang.

NetApp provides no representations or warranties regarding the accuracy, reliability, or serviceability of any information or recommendations provided in this publication, or with respect to any results that may be obtained by the use of the information or observance of any recommendations provided herein. The information in this document is distributed AS IS, and the use of this information or the implementation of any recommendations or techniques herein is a customer's responsibility and depends on the customer's ability to evaluate and integrate them into the customer's operational environment. This document and the information contained herein may be used solely in connection with the NetApp products discussed in this document.

© Copyright 2010 NetApp, Inc. All rights reserved. No portions of this document may be reproduced without prior written consent of NetApp, Inc. Specifications are subject to change without notice. NetApp, the NetApp logo, Go further, faster, Data ONTAP, FlexClone, FlexVol, MultiStore, RAID-DP, SnapDrive, SnapMirror, Snapshot, SyncMirror, and vFiler are trademarks or registered trademarks of NetApp, Inc. in the United States and/or other countries. Windows is a registered trademark of Microsoft Corporation. Oracle is a registered trademark of Oracle Corporation. VMware is a registered trademark and VMotion is a trademark of VMware, Inc. All other brands or products are trademarks or registered trademarks of their respective holders and should be treated as such. RA-0007-1010



The enhancements to the volume autosize capabilities (such as volume-size-dependent thresholds) and the robustness of the Snapshot autodelete implementation have made full fat provisioning more or less obsolete. As of today, the default settings for creating volumes and LUNs in Data ONTAP still correspond to these settings. See "Provisioning from Scratch: Full Fat to Zero Fat Provisioning" for a discussion of using tools such as Provisioning Manager.

Table 4) Full fat provisioning.

Option             | Recommended Value | Notes

Volume Options:
guarantee          | volume            |
fractional_reserve | 100               | Even though technically possible, a fractional reserve below 100% carries a potential risk of running out of Snapshot copy overwrite space. This situation should be avoided.
autosize           | off               | Autosize could be used as an option to create the free space needed for Snapshot copy creation.

Volume Snapshot Options:
reserve            | 0                 |
schedule           | switched off      |
autodelete         | off               |

LUN Options:
reservation        | enable            |

LOW FAT PROVISIONING

With low fat provisioning, we use a more space-efficient way to provision volumes:

• Volumes are created with a space guarantee.
• LUNs are created with a space guarantee as well. This setup does not benefit from unused blocks within a LUN. (During the lifetime of a LUN, the amount of free, unused blocks typically decreases. Without space reclamation techniques, allocated blocks on the storage system stay allocated.)
• The size of the volume follows the formula X + Δ, where X is the size of the primary data (the sum of all LUN capacities within the volume) and Δ is the amount of space needed to hold Snapshot copy data.
• Because space used for Snapshot copies might grow unexpectedly, the autosize and autodelete policies are used to make space available when a preset volume threshold is reached.

Table 5) Low fat provisioning.

Option             | Recommended Value   | Notes

Volume Options:
guarantee          | volume              |
fractional_reserve | 0                   | Snapshot space is controlled by the autodelete and autosize options.
autosize           | on                  | Turn autosize on.
autosize options   | -m X -i Y           | The business model drives the maximum value for the autosize configuration because it offers additional disk space to the consumer under its specific conditions. A reasonable resizing increment depends on various factors, such as the data growth rate in the particular volume, the volume size itself, and so on.
try_first          | volume_grow         | Increasing the size of the volume does not destroy any data or information, so there is no reason not to increase it; it can be reverted afterward if the volume free space increases again. There might be configurations where automatic volume growth is not desired.

Volume Snapshot Options:
reserve            | 0                   | For NAS volumes, setting a Snapshot copy reserve area and configuring Snapshot copy schedules is a common setup. For SAN volumes, this needs to be switched off according to NetApp best practices (see the Fibre Channel and iSCSI Configuration Guide).
schedule           | switched off        |
autodelete         | on                  | There might be Snapshot copies that are needed to fulfill certain SLAs, such as backup SLAs. Setting this policy needs to be negotiated with the business requirements. In the worst-case scenario, deleting Snapshot copies is not an option.
autodelete options | volume oldest_first | There is a precedence order for which Snapshot copies are candidates for deletion; oldest_first is the current default.

LUN Options:
reservation        | enable              | Reserves space for the LUN during creation.

ZERO FAT PROVISIONING

Full and low fat provisioning use fully allocated volumes and LUNs; by default, space allocation happens within the boundaries of the LUN and the volume. Zero fat follows a 100% "allocate on demand" concept:

• Volumes are created without a space guarantee.
• LUNs are created without a space guarantee.
• The size of the volume follows the formula X – N + Δ, where X is the size of the primary data (the sum of all LUN capacities within the volume), Δ is the amount of space needed to hold Snapshot copy data, and N is the amount of unused blocks within a given LUN.
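The three sizing formulas (full fat 2X + Δ, low fat X + Δ, zero fat X – N + Δ, compare Table 7) can be put side by side with a small worked example; the 10 TB / 1 TB / 4 TB figures are illustrative:

```python
# Worked example of the space-consumption formulas for the three
# provisioning methods: full fat 2X + Δ (100% fractional reserve doubles
# the primary data), low fat X + Δ, zero fat X - N + Δ.

def space_consumed(x_tb, delta_tb, unused_tb=0.0, method="zero"):
    """Physical space consumed on the aggregate for primary data X,
    Snapshot space delta, and N unused blocks inside the LUNs."""
    if method == "full":
        return 2 * x_tb + delta_tb       # overwrite reserve doubles X
    if method == "low":
        return x_tb + delta_tb           # LUNs reserved, no overwrite reserve
    return x_tb - unused_tb + delta_tb   # zero fat: allocate on demand

# 10 TB of LUN capacity, 1 TB of Snapshot data, 4 TB never written
print(space_consumed(10, 1, 4, "full"))  # 21
print(space_consumed(10, 1, 4, "low"))   # 11
print(space_consumed(10, 1, 4, "zero"))  # 7
```

The 14 TB difference between full fat and zero fat in this example is exactly the overwrite reserve (10 TB) plus the never-written blocks (4 TB).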

Table 6) Zero fat provisioning.

Option             | Recommended Value | Notes

Volume Options:
guarantee          | none              | No space reservation for the volume at all.
fractional_reserve | 0                 | With Data ONTAP 7.3.3, fractional_reserve can be modified even for volumes without a space guarantee of type volume. Prior to Data ONTAP 7.3.3, the value was fixed at 100%.
autosize           | on                | Turn autosize on.
autosize options   | -m X -i Y         | The business model drives the maximum value for the autosize configuration because it offers additional disk space to the consumer under specific conditions. A reasonable resizing increment depends on various factors, such as the data growth rate in the particular volume, the volume size itself, and so on.
try_first          | volume_grow       |

Volume Snapshot Options:
reserve            | 0                 | For NAS volumes, setting a Snapshot copy reserve area and configuring Snapshot copy schedules is a common setup. For SAN volumes, this needs to be switched off according to NetApp best practices (see the Fibre Channel and iSCSI Configuration Guide).
schedule           | switched off      |
autodelete         | off               | Deleting Snapshot copies might be an option when the volume can no longer be resized because the maximum configured size has been reached, or when the aggregate's free space becomes low.

LUN Options:
reservation        | disable           | No preallocation of blocks for the LUN.

SUMMARY OF PROVISIONING METHODS

There are good reasons for using any of the provisioning methods described; however, full fat provisioning for SAN environments should be avoided wherever possible because of its storage efficiency ratio. Even with a 100% block usage ratio on primary data, zero fat provisioning has many advantages and is the preferred method:

• The aggregate's free space is a global pool that can serve space to volumes. This gives more flexibility than volumes with their own dedicated free space.
• For SAN volumes, the block consumption can be easily monitored.
• Deduplication savings go directly into the global pool of free space, which is the aggregate or the resource pool to which it belongs.
• Monitoring is needed only on the aggregate level. Volumes grow on demand.

Table 7) Comparison of provisioning methods.

Characteristic | Full Fat | Low Fat | Zero Fat
Space consumption | 2X + Δ | X + Δ | X – N + Δ (2)
Space efficient | No | Partially, for Snapshot copies | Yes
Monitoring | Optional | Required on volume and aggregate level | Required on aggregate level
Notification/mitigation process required | No | Optional in most cases | Yes
Pool benefiting from dedupe savings | Volume fractional reserve area | Volume free space area | Aggregate free space area
Risk of an out-of-space condition on primary data | No | No, as long as autodelete is able to delete any Snapshot copies | Yes, when monitoring and notification processes are missing
Typical use cases | Small installations; none or few storage management skills (no monitoring infrastructure) | Large database environments | Shared storage infrastructure; test/dev environments; storage pools for virtualized servers

(2) N is the traditional thin provisioning impact: the amount of blocks logically allocated but not used.

FULL/LOW/ZERO FAT PROVISIONING WITH PROVISIONING MANAGER

NetApp Provisioning Manager focuses on fast and efficient provisioning of storage resources in the NetApp storage infrastructure. It speeds up provisioning workflows and boosts capacity usage by using policy-based automation for the entire NetApp NAS and SAN infrastructure. These processes are:

• Faster than manually provisioning storage
• Easier to maintain than scripts
• Instrumental in minimizing the risk of data loss resulting from misconfigured storage

Provisioning Manager applies user-defined policies to consistently select the appropriate resources for each provisioning activity. This frees administrators from the headache of searching for available space to provision and allows more time for strategic issues. The use of a centralized management console allows administrators to monitor the status of their provisioned storage resources. Provisioning Manager can help improve business agility and capacity use, shrink provisioning time, and improve administrator productivity. Provisioning Manager's thin provisioning and deduplication capabilities provide a high level of storage efficiency from your NetApp storage investment.

A GUI allows you to implement the full/low/zero fat provisioning models within Provisioning Manager. See Figure 9 and Figure 10 for configuring storage efficiency in a provisioning policy for NAS and SAN. Whenever storage is provisioned using such a provisioning policy, the settings apply automatically. For more information, refer to TR-3710: Operations Manager, Provisioning Manager, and Protection Manager Best Practices Guide.

Provisioning Manager encapsulates technical details when provisioning storage and supports easy integration with existing management tools and orchestration frameworks. Policies and their use in so-called datasets and storage services allow you to exploit NetApp storage efficiency technologies without exposing a high level of technical detail to a higher level of management software.

Note: Provisioning Manager up to version 4.0 does not allow you to specify autosize and autodelete individually; the policy template determines whether these features are used and which options are selected. To implement the provisioning methods outlined, a customized provisioning script needs to be provided to set the autosize and autodelete parameters according to the recommendations for the full/low/zero fat methods. Post-provisioning scripts are standard with Provisioning Manager. Use caution: when Provisioning Manager runs conformance checks, it reverts individual settings.


Figure 9) Configuring a full/zero fat provisioning policy using Provisioning Manager for NAS. Select checkboxes as outlined. Provisioning Manager deviates from zero/full fat by first growing volumes with autosize and then allowing Snapshot autodelete.


Figure 10) Configuring a full/low/zero fat provisioning policy using Provisioning Manager for SAN storage. Select checkboxes as outlined. Provisioning Manager deviates by not turning on autosize for zero fat.

FULL/LOW/ZERO FAT PROVISIONING FOR STORAGE SERVICES

Storage services are an easy abstraction for provisioning storage in a utility-like fashion. A storage service describes all characteristic attributes of the storage needed in a certain scenario. A storage service catalog lists the available templates and allows you to provision storage with these attributes on demand. Technically, storage services or datasets consist of one or more resource pools, a protection policy, and a provisioning policy. Full/low/zero fat provisioning for storage services is configured in the configuration wizard of the provisioning policy. Figure 11 shows the provisioning policies closest to the full/low/zero fat configurations.

Figure 11) Full/low/zero fat provisioning policies for datasets and storage services.

Because this wizard is able to configure the deduplication feature, two policies are configured for the zero fat configuration: one with deduplication and one without.

22 Storage Efficiency Every Day How to Achieve and Manage Best-in-Class Storage Use

HOW SHOULD A VOLUME BE SIZED?

Because physical allocation of data within a zero fat-provisioned volume is done on demand, theoretically the volume size could be set to a very high value that can easily hold all application data and Snapshot copies. Because the unallocated space in the volume is not exclusively reserved for the volume itself, all other applications can benefit from the shared pool of unallocated storage. However, NetApp recommends that you size the volume to the expected size of its containing objects and use the autogrow option to let it grow on demand. The important advantage is that the commitment rate then acts as a metric for data consolidation.

Note: The commitment rate reflects the amount of logical data consolidation. This metric is suitable for deciding when data should be left for organic growth.

Additionally, the volume size limits when using deduplication should be taken into account, because the maximum sizes depend on the storage controller.
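The commitment rate can be computed directly from the configured volume sizes; a minimal illustrative sketch (the function name and figures are not from a NetApp tool):

```python
# Sketch of the commitment rate as a consolidation metric: the sum of the
# configured (logical) volume sizes relative to the aggregate capacity.
# Values above 100% indicate thin-provisioned overcommitment.

def commitment_rate(volume_sizes_tb, aggregate_capacity_tb):
    """Committed logical capacity as a percentage of the aggregate."""
    return 100.0 * sum(volume_sizes_tb) / aggregate_capacity_tb

# Three volumes sized to their expected data on a 10 TB aggregate
print(commitment_rate([4.0, 5.0, 6.0], 10.0))  # 150.0, i.e. 1.5x consolidation
```

Sizing volumes to the expected data (rather than arbitrarily large) is what makes this percentage a meaningful measure of consolidation.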

APPLICATION RECOMMENDATIONS

Thin provisioning is most effective when applications use data that is committed to them step by step. When applications preformat data, the immediate effect of thin provisioning is lost, and only deduplication may reclaim sharable blocks. Because thin provisioning has no performance penalty, the general recommendation is to provision with the zero fat configuration.

For SAN-attached storage, NetApp recommends that you use file systems supporting space reclamation technologies such as the SCSI UNMAP and SCSI WRITE SAME commands. These pass the information through the storage stack that a particular block is no longer used and allow unused space to be reclaimed. On Windows® platforms, this can be configured in NetApp SnapDrive®.

For Oracle® database best practices, refer to WP-7084: Storage Efficiency in an Oracle Environment.

3.2 PROVISIONING FROM TEMPLATES: VOLUME AND DEDUPE-CENTRIC LAYOUTS

This section deals with provisioning storage for similar applications from a golden template. A valid use case is a hosting provider who offers and serves predefined application services in mass quantities. Instead of provisioning each application from scratch, the data of the application instance is provisioned by creating a copy of a preconfigured template, or golden copy, that is customized using a postprocessing procedure.

When applications are provisioned this way, NetApp cloning technologies generate virtual copies of the template data instantly and with efficient use of space. This achieves a high degree of data consolidation and cost savings.

The potential of NetApp cloning technologies also plays a central role in development and test environments as well as in software maintenance scenarios. Testing and updates can be performed very easily because these cloning capabilities work instantly and with almost no overhead in performance, CPU, and memory.

There are two ways to align application data to a NetApp shared storage infrastructure:

• Volume-centric storage layout
• Dedupe-centric storage layout

Depending on the data lifetime, suitability for deduplication, consistency, and tool constraints, one way of aligning application data is more appropriate than the other.

In both variants, the storage of the application template can be provisioned as either full, low, or zero fat. The cloning procedure inherits the attributes of the parent volume. To create space-efficient clones, the space guarantee must be set to none.


CONSEQUENCES FOR MONITORING

When using one of the following layouts, very high data consolidation can be achieved. Because this effect depends on the usage characteristics of the corresponding applications, monitoring the aggregate is key. In case a low fat volume acts as a template that is cloned while preserving the original space guarantees, monitoring is necessary for the cloned volumes as well.

VOLUME-CENTRIC STORAGE LAYOUT

In a volume-centric storage layout, an application instance is organized into one or a few volumes to benefit from Data ONTAP volume-centric management and maintenance operations, such as instant cloning and volume-consistent Snapshot copies.

In addition to the convenient ways to manage volumes, volume-centric storage layouts have storage efficiency advantages in two dimensions:

• High instant storage efficiency savings. Cloning the data of an application instance with FlexClone yields high instant savings; these savings might deteriorate over time.

• Long-term storage efficiency savings. Deduplicating application data yields medium long-term savings.

A volume-centric layout makes it easy to provision storage for another instance of an application: a consistent volume representing the template of the intended application is cloned and attached to an instance, where it is processed. This approach works for both NAS and SAN.

Figure 12 shows the data alignment of an application instance and its volume. An application instance organizes its data in one or more dedicated volumes. Note that the entire construct is created within one aggregate.

Because deduplication is performed on the volume level, long-term savings depend on the block-sharing rate within one instance of an application. Volume-centric layouts are preferred in the following cases:

• Simplicity of data management using volumes
• Individual control over the SLA of each application instance
• Application instances with a short duration
• No consideration of deduplication
• Management tools that require volume-centric layouts


Figure 12) Volume-centric storage provisioning. Application instances are aligned horizontally with their volumes. (The figure shows a template FlexVol volume and FlexVol volumes for instances 1 through n, each containing LUNs/qtrees; deduplication provides block sharing within each FlexVol volume, and FlexClone provides block sharing between the template and its clones.)

Impact on commitment and storage utilization. The impact of using FlexClone to clone a volume-centric storage layout to implement template-based provisioning can be visualized schematically. At clone creation, Data ONTAP creates metadata for the new instance of the data and allocates space for storing changes to the cloned copy, or new data, on request. Thus, the overcommitment of the aggregate containing the cloned data increases when the clone is created. However, this does not affect the space used in the aggregate. When data in the clone is changed and new data is added by the application, aggregate use grows.
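This arithmetic can be sketched as follows. The sketch is an illustrative model, not ONTAP behavior queried through any API; all sizes are invented for the example.

```python
def overcommitment(committed_gb, aggregate_gb):
    """Committed space relative to the physical size of the aggregate."""
    return committed_gb / aggregate_gb

aggregate_gb = 1000
committed_gb = 800   # space promised to existing volumes
used_gb = 300        # blocks physically allocated

# Creating a FlexClone of a 200GB template volume adds commitment ...
committed_gb += 200
# ... but only clone metadata is written, so aggregate use stays constant.
print(overcommitment(committed_gb, aggregate_gb))  # 1.0: fully committed
print(used_gb)  # 300: unchanged at creation; grows only as the clone changes
```

The clone is essentially free at creation time; the monitoring consequence is that the commitment metric, not the use metric, is what moves when provisioning from templates.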

Best Practice

A volume-centric layout implicitly implements a consistency group. It is preferable to align within it all application data that should be recovered at a certain point in time. Cloning can achieve significant savings when a FlexClone volume is created to provision data for a new service instance.

Client-side data realignment, such as disk defragmentation or database table space reorganization, has a counterproductive effect on FlexClone savings. It also has a temporarily counterproductive effect on deduplication savings, and the deduplication process must be executed again to regain them. If possible, the following actions on client data should be avoided:

• Reorganizing data, for example, database reorganization of table spaces or defragmentation of virtual disks provisioned through cloning

• Preformatting data


DEDUPE-CENTRIC STORAGE LAYOUT

In a dedupe-centric storage layout, the goal is to achieve high storage efficiency returns from the deduplication feature. In contrast to the volume-centric storage layout, data of different application instances is grouped to achieve storage efficiency returns across a set of application instances. Figure 13 shows a sample dedupe-centric storage layout. Data of application instances is organized horizontally; the individual data of each application is grouped vertically in a volume to implement deduplication.

This layout makes sense in virtualization scenarios, where the images of the guest machines can be grouped easily. Grouping partitions containing boot images and commonly used programs is very effective because they share much of the same data. To implement template-based provisioning with such a layout, cloning of template data must be performed with the file/LUN FlexClone operation. File/LUN FlexClone allows storage objects to be cloned within a volume, providing finer granularity.

This storage layout provides the following storage efficiency advantages from a short- and long-term perspective:

• Very high long-term storage efficiency savings. Long-term storage efficiency savings are achieved due to the deduplication-centric storage layout and deduplication returns.

• Short-term storage efficiency savings. Instant storage efficiency savings are provided when cloning an application instance, for example template application data, through a file/LUN FlexClone operation.

In contrast to the volume-centric storage layout, application instances are bundled together in a matrix style because of their participation in a volume. This implies that the applications share major operational tasks and are managed as a bundle. From an SLA perspective, a diversification of service levels within the application instances cannot be implemented as easily as with a volume-centric layout. Achieving application-consistent Snapshot copies requires the iterative application of file/LUN FlexClone functionality to all storage objects of the instance. This is slightly more difficult than cloning with a volume FlexClone operation.
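The operational difference can be sketched as follows. The two helper functions are hypothetical stand-ins that merely enumerate the clone operations an administrator would issue, one per volume versus one per storage object; they do not call any NetApp API.

```python
def volume_clone_ops(instance_volumes):
    """Volume-centric: one volume FlexClone operation per volume of the instance."""
    return [("volume clone", vol) for vol in instance_volumes]

def file_clone_ops(instance_objects):
    """Dedupe-centric: one file/LUN FlexClone operation per storage object."""
    return [("file clone", obj) for obj in instance_objects]

# A template instance holding three LUNs in a single volume:
print(len(volume_clone_ops(["tmpl_vol"])))                       # 1 operation
print(len(file_clone_ops(["boot.lun", "data.lun", "log.lun"])))  # 3 operations
```

The operation count grows with the number of storage objects per instance, which is why application-consistent cloning is slightly more involved in the dedupe-centric layout.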

TR-3505, NetApp Deduplication for FAS and V-Series Deployment and Implementation Guide, provides a deeper understanding of NetApp deduplication and its deployment.

Figure 13 illustrates dedupe-centric storage provisioning. Volumes are shared among several application instances to achieve cross-dedupe returns. Note that this construct is created within an aggregate; volumes can be assigned to different aggregates.


Figure 13) Dedupe-centric storage provisioning. Application instances are aligned horizontally; volumes are aligned vertically. (The figure shows a template and instances 1 through n, each contributing a LUN/qtree to a set of shared FlexVol volumes; deduplication provides block sharing within each FlexVol volume.)

Impact on commitment and aggregate usage. When the FlexVol volumes for this layout are created, their individual sizes contribute to the commitment rate. Aggregate use grows with the provisioning and use of objects within the FlexVol volumes. Provisioning a new instance in this layout through a file/LUN FlexClone operation has no effect on the overcommitment rate; it affects only the deduplication savings reported for the volumes themselves. Thus, NetApp recommends using the zero fat configuration for the volumes so that autogrow is enabled.

Best Practice

This layout is very attractive for applications using multiple but similar storage objects among service instances (for example, virtual disks in virtual machine hypervisors). These instances usually use similar operating systems and applications in dedicated virtual disks. Thus, grouping these storage objects leads to a very high degree of consolidation due to deduplication.

Quickly changing data, such as pages and swap files, should not be considered for inclusion in deduplicated volumes on primary storage. Deduplication savings on such data are limited due to its high change rate and do not justify running the deduplication process. NetApp recommends that this type of data not be placed in the same volume as data that dedupes well.

We further recommend not performing client data realignments, such as Windows disk defragmentation or database table space reorganizations. Because of the way that NetApp storage controllers work, fragmented client data is served without performance penalty, so such realignments bring no benefit and reduce cloning and deduplication savings.

27 Storage Efficiency Every Day How to Achieve and Manage Best-in-Class Storage Use

3.3 SETTLED/NOMAD PROVISIONING FOR NETAPP DATA MOTION

Settled/nomad provisioning is a provisioning pattern that helps increase the utilization of NetApp shared storage. When the online migratability features of the storage are exploited, response times to mitigate data growth scenarios become independent of application-specific planned downtime windows. Storage is classified into potential migration candidates that can be migrated away from a tight aggregate on one storage controller to another while remaining accessible. Thus, it is an elegant technique to relax the use of an aggregate outside the planned downtime windows of the served applications. NetApp MultiStore technology implements this feature using the vFiler abstraction, which NetApp recommends you consider in the provisioning process. Secure multi-tenancy environments implemented using MultiStore technology harmonize well with this provisioning approach.

The settled/nomad provisioning pattern is a perfect metaphor for reacting to data growth in an aggregate. Figure 14 illustrates the concept of settled/nomad provisioning in the aggregates of the storage controllers and the migration of a nomad out of its aggregate. The settled part describes data that does not move during its lifetime. It might use vFiler units to simplify operation and hardware maintenance of the storage controller, but there is no direct need. The nomad parts are considered moving parts and thus must make use of vFiler units. The ratio between the sizes of the settled and nomad parts depends on the growth rate and lifetime of the data in the settled part. Assuming that the aggregate size is constant over this period, the aggregate is filled with settled and nomad data. Over the data lifetime, more and more nomads are migrated away. At the end of the lifetime, only the settled data is left.

It is irrelevant whether the data growth happens in the settled or the nomad part: when a nomad is migrated away, the resource situation on the aggregate is relaxed. It is preferable to provision several nomads of different sizes. This allows you to:

• React to different growth scenarios of the data
• Quickly migrate smaller nomads when time or the inter-storage-controller network is considered a limited resource
• Operate the aggregate in its operational sweet spot corridor over a long time frame; by slicing the migratable entities in the right way, you can be sure that the aggregate operates in a predefined use interval

Figure 14) Settled/nomad provisioning into an aggregate. In case of aggregate tightness, a nomad is migrated to a separate aggregate.


To summarize, the settled/nomad provisioning pattern is an elegant method to adjust the block use of an aggregate. The use of an aggregate can be controlled and kept in a desired corridor.


SLA-BASED ASSESSMENT FOR SETTLED/NOMAD

The goals of an SLA-based assessment are to optimize SLA fulfillment and to avoid or minimize penalty costs. The accessibility of the applications, described by their individual service levels, is used to assess instances as settled or nomad.

We use the SLA metric of service disruption introduced earlier and map it to the stickiness of the settled/nomad instances. The vFiler entities allow online migration of NFS- and iSCSI-attached nomad instances without any changes on the client side. Fibre Channel-attached storage cannot be migrated online at the time of writing. Refer to TR-3881 for an understanding of DataMotion in an Oracle database and Microsoft Exchange environment.

Alignment by technical impact. For data belonging to applications with SLAs that fit perfectly into what is provided, a direct assignment can be made. For example, application instances with the lowest acceptable service disruption should be the last candidates to be migrated (settled); applications with the highest acceptable service disruption should be considered as nomads. However, there might be data of application instances that will likely be migrated during the application lifetime. You must take into account the business impact of migrating these instances.
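This assessment can be sketched as a simple classification. The instance data and the disruption limit are invented for the example; the rule that Fibre Channel-attached instances stay settled follows from the online migration restriction mentioned above.

```python
def assess(instances, max_disruption_for_nomad):
    """Split instances into settled and nomad by acceptable service disruption.

    Instances that tolerate at least max_disruption_for_nomad seconds of
    disruption can be migrated online and are treated as nomads."""
    settled, nomad = [], []
    for name, acceptable_disruption_s, protocol in instances:
        # FC-attached storage cannot be migrated online: always settled.
        if protocol == "fc" or acceptable_disruption_s < max_disruption_for_nomad:
            settled.append(name)
        else:
            nomad.append(name)
    return settled, nomad

instances = [
    ("erp-db", 0, "fc"),     # no disruption acceptable, FC-attached
    ("web-nfs", 60, "nfs"),  # tolerates a minute of disruption
    ("mail", 10, "iscsi"),   # tolerates only a short disruption
]
print(assess(instances, max_disruption_for_nomad=30))
# (['erp-db', 'mail'], ['web-nfs'])
```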

Figure 15) Alignment by technical impact (sorted by negative impact in descending order). (The figure shows instances Inst1 through InstN ordered by negative impact: high impact or outside-SLA instances, for example all FC-attached storage, are settled; medium and low impact instances inside the SLA are nomads.)

Alignment by business impact. An assessment of penalty costs is made for the data of the remaining applications. For vFiler migration, a very short negative impact on the performance of the service must be taken into account during the migration. Thus, the application data with the highest negative impact is considered the stickiest.

Figure 16) Alignment by business impact (sorted by negative impact in descending order). (The figure shows instances ordered by penalty cost: the highest-impact instances are settled, medium-impact instances are semi-settled, and the lowest-impact instances are nomads.)

PERFORMANCE AND THROUGHPUT IMPACT OF MIGRATION

Migration of a nomad might be triggered by heavy storage consumption in an aggregate. It might also be triggered by performance limitations of the corresponding storage controller. Because the migration process consumes additional resources on the network and on the participating storage controllers, this consumption must be taken into account to avoid further intensifying the situation. Refer to TR-3881 for a quantitative evaluation of DataMotion.

NetApp recommends adjusting the use of storage controllers in a high-availability configuration in such a way that the remaining controller can master the load in the case of a failover and still has enough resources left to perform migrations.

Migrating vFiler entities relies mainly on SnapMirror® and MultiStore technology and thus requires these licenses on all participating storage controllers. TR-3814, NetApp Data Motion, provides a thorough presentation of migration using the NetApp Data Motion solution. It focuses on implementing and triggering the migration of vFiler entities using NetApp Provisioning Manager. Furthermore, vFiler units can be managed manually to allow for handy offline migration with a very short interruption of storage accessibility.

ENABLE SETTLED/NOMAD FOR ALREADY-PROVISIONED STORAGE

Although NetApp recommends considering the settled/nomad setting initially, taking the sizing and lifetime of storage into account, it is also possible to introduce it later in a planned downtime window. If NFS-attached storage is to be migrated, existing volumes can be adopted by a vFiler entity. Because the vFiler entity has its own IP address, the clients attaching the storage need to be remounted.

SETTLED/NOMAD-LIKE SETTING WITH SHORT/LONG-TERM DATA PAIRING

In the previous section, the settled/nomad pattern was described as a way to mitigate organic data growth. The same effect can be achieved when storage is identified as belonging to instances that are going to be deprovisioned at their end of life. Taking into account the expected lifetime of provisioned storage allows you to plan deprovisioning situations in advance. This relaxes the dependence on aggregates outside planned downtime windows, without the technical requirements of a settled/nomad setting.

ONLINE MIGRATION IN VIRTUALIZED ENVIRONMENTS

Online migration features in a virtualization hypervisor provide a further alternative for implementing a responsive scheme to react to data growth scenarios. For example, VMware® Storage VMotion™ is capable of transferring a virtual machine, including its storage, when it is attached using a datastore. Storage of virtual machines served by a NetApp datastore can be migrated to another NetApp-served datastore by migrating each virtual machine. In such cases, a nomad can simply be implemented by a NAS- or SAN-attached datastore.

In contrast to a data transfer based on SnapMirror directly between NetApp storage controllers, the migration traffic flows through the hypervisor. This might have consequences for the execution of the virtual machines. Also, the NetApp storage efficiency savings cannot be exploited during the transfer; deduplication savings are regained by executing the deduplication process on the destination storage controller.


4 OPERATION

This section focuses on the operation and management of overcommitted storage. The goal of management is to fulfill the SLAs of the stored data while achieving a high level of efficiency. It addresses questions of how to detect situations that need manual assistance, how to raise the awareness of the operational staff, and how to resolve situations that arise.

We first consider situations that put SLA fulfillment at risk. Then we focus on actions that can be taken to avoid further aggravation by making the situation evident and presenting mitigation alternatives.

Consider the operational process as a loop that monitors and evaluates the current situation and triggers the transition of a storage resource among phases:

• Provision storage.
• Leave room for organic growth. It might be desirable to still allow extending the storage of previously provisioned applications.
• Reduce storage use with mitigation alternatives such as deletion, data motion, and so on.

These transitions must occur within a specified time frame to optimize operational flexibility and to prevent endangering the SLAs. The point is to detect situations that would violate the SLAs in the future.

SITUATIONS PUTTING SLA FULFILLMENT AT RISK

Over time, more and more data is stored and processed by the provided applications, and NetApp storage efficiency technologies compensate for this growth. To prevent running out of physical resources, usage must be managed within safe boundaries. This makes sure the operations team has enough time to react with the appropriate mitigation strategy.

The following list summarizes situations that are critical for service delivery:

• Running out of time. Some mitigation alternatives must be triggered in advance, and a passage of time might be needed for their effect to become evident. This time determines the number of mitigation alternatives that can be considered at a certain point.

• Running out of mitigation alternatives. Several mitigation alternatives exist to control usage. However, some alternatives are one-time activities and some must be performed within a certain time frame. Depending on the situation, not all alternatives might be available for use.

• Running too tight on storage. Over time, applications use more and more of the blocks that were committed to them. This forces Data ONTAP to allocate blocks from the pool of free blocks. Assuming data growth, the size of the free block pool directly translates into the available time to react.

• Running out of storage completely. This must be prevented because it has a high impact on the availability of the service. Furthermore, data integrity can be at risk. Consider the following scenarios:
  − An application wants to write to committed storage but fails (NAS and SAN). To the application, this looks like a storage failure and implies service disruption. Data integrity can be at risk.
  − An application wants to allocate new storage but fails (NAS). The application is confronted with a "No space left on device" exception. Verify the application behavior on this exception; most applications can deal with this situation, and data integrity is not at risk.

Two cases need to be differentiated when mitigation is necessary to resolve a situation of tight storage. Storage for an object such as a LUN or a share can be tight because of:
− Insufficient space within the volume in which the storage object is contained
− Insufficient free space within the aggregate in which the storage object and its volume are contained
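The relationship between the free block pool and the available reaction time can be sketched as follows; the numbers are illustrative.

```python
def days_to_full(free_gb, daily_growth_gb):
    """Available reaction time before the aggregate free block pool is exhausted."""
    if daily_growth_gb <= 0:
        return float("inf")  # no growth: no deadline
    return free_gb / daily_growth_gb

# 2TB free in the aggregate, data growing by 50GB per day:
print(days_to_full(2000, 50))  # 40.0 days to trigger and complete a mitigation
```

Every mitigation alternative whose lead time exceeds this value is effectively unavailable, which is how "running out of time" and "running too tight on storage" interact.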


The following sections focus on how to detect that a change is necessary and that a storage resource should be transitioned to another phase. We describe the different phases, how monitoring can support their detection, and how this information can be made known to operational groups.

4.1 PHASES AND TRANSITIONS

This section outlines the phases of a storage resource. Starting with an empty aggregate, storage is provisioned until certain thresholds are reached. After that, storage is left for organic growth. After certain thresholds are exceeded, further inspection or activities must be performed to mitigate storage tightness.

• Provisioning storage. When certain thresholds are within a defined range, storage is provisioned to the aggregates. Monitoring should support making a decision to transition to the next phase.

• Leave storage for organic growth. When certain thresholds are exceeded, provisioned storage is left for organic growth. Depending on the environment, storage of existing applications might still be extended, and a second threshold might signal that extensions are no longer possible. Monitoring should support making a decision to transition to the next or prior phase.

• Mitigate storage use. When certain thresholds are exceeded, this phase must make sure that committed storage can be delivered to store application data. The effect of a mitigation activity should be to put the storage resource back into the preferred operational corridor. Monitoring should support making a decision to transition back to the organic growth phase.

4.2 MONITORING

NetApp Operations Manager delivers comprehensive monitoring and management for NetApp shared storage. It provides alerts, reports, performance monitoring, and configuration tools to keep the NetApp storage infrastructure in line with business requirements for maximum availability and efficiency. NetApp Operations Manager provides a single human interface and an application programming interface (API) for integration with third-party management and orchestration software vendors.

Operations Manager monitors the NetApp shared storage infrastructure and is able to raise awareness of certain situations. Events can be set to trigger an action when operational parameters are within a certain range and indicate a relevant situation. When the event triggers an alarm, notification can be sent by e-mail, pager, Simple Network Management Protocol (SNMP), or customized scripts. To raise awareness about a certain situation, the event must be characterized using the metrics provided by Operations Manager. To communicate the event, an alarm must be set.

THRESHOLDS

Operations Manager monitors relevant parameters that indicate the presence of specified situations. Thresholds can be set to trigger actions, for example, to notify the operational team that an alarm situation exists. The thresholds can be set to notify in advance. Operations Manager also performs trending on operational parameters to express the urgency of a certain situation. This supports decision making on how to react to a certain situation.

Within your Operations Manager instance, the thresholds can be verified and set by navigating to the Default Thresholds page, following Setup→Options→Default Thresholds, or by using the link http://opsmgrserver:port/dfm/edit/options. Figure 17 shows a sample configuration page.


Figure 17) Operations Manager screen to configure thresholds on operational metrics.

For aggregates, Operations Manager provides the set of thresholds described in the following list. They represent absolute limits. Operations Manager alarms can be used to notify operational staff and raise awareness of a specific situation.

Monitoring the aggregates is very important. They are the physical containers of preallocated and growable storage objects that host application data. If an aggregate of a storage controller runs at uncontrolled usage, it could have direct consequences for the applications for which it provides data.

The concrete settings for these thresholds depend on the time necessary to relax aggregate block usage. When no mitigation alternatives should or can be taken over the lifetime of the data, mitigation actions must be performed in scheduled downtime windows. In that case, threshold settings and actions tend to be more conservative to avoid SLA-endangering situations.

• Aggregate full threshold. This threshold on the metric aggregate block use allows triggering an alarm that notifies a person in charge.

• Aggregate nearly full threshold. This threshold is the counterpart of the aggregate full threshold but provides an earlier notification.

• Aggregate overcommitted threshold. This threshold on the metric of committed storage allows triggering an alarm that notifies a person in charge. This metric refers to the amount of storage that is committed to applications. It represents the level of consolidation and also the width and increase of the block use corridor.

• Aggregate nearly overcommitted threshold. This threshold is the counterpart of the aggregate overcommitted threshold but provides an earlier notification.
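How these thresholds map aggregate metrics to events can be sketched as follows. The event names mirror the list above, and the numeric thresholds are illustrative examples, not Operations Manager factory defaults.

```python
def aggregate_state(use_pct, committed_pct, thresholds):
    """Return the aggregate events that would fire for the given metrics."""
    events = []
    if use_pct >= thresholds["full"]:
        events.append("aggregate-full")
    elif use_pct >= thresholds["nearly_full"]:
        events.append("aggregate-almost-full")
    if committed_pct >= thresholds["overcommitted"]:
        events.append("aggregate-overcommitted")
    elif committed_pct >= thresholds["nearly_overcommitted"]:
        events.append("aggregate-almost-overcommitted")
    return events

# Illustrative threshold settings (percent of aggregate capacity):
thresholds = {"full": 90, "nearly_full": 80,
              "overcommitted": 100, "nearly_overcommitted": 95}
print(aggregate_state(85, 120, thresholds))
# ['aggregate-almost-full', 'aggregate-overcommitted']
```

Note that block use and commitment are evaluated independently: a thin-provisioned aggregate can be heavily overcommitted while its block use is still comfortably inside the corridor.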

Operations Manager also provides thresholds that can be used to alert operational staff when volumes are in a certain state:

• Volume full threshold. This event notifies a person in charge that the preset threshold on the volume use metric has been reached.

• Volume almost full threshold. This event is the counterpart of the volume full threshold but provides an earlier notification.

• Volume autosized. This event notifies a person in charge when a volume has been extended using the autogrow functionality.


TRENDING

Operations Manager 4.0 supports a variety of trending features for certain storage objects. This is an important feature for all storage objects with a fixed size. It allows you to estimate the time frame within which a certain situation must be mitigated. The trend is calculated as a linear regression over up to 90 days in the past. For aggregates, Operations Manager calculates a trend on the daily growth rate. In your Operations Manager instance, use the link http://opsmgrserver:port/dfm/report/view/aggregates-growth-rates for trending of aggregate growth rates and the estimated remaining time until the storage object is full. Each aggregate can be drilled down, and you can select trending based on an interval of one day, one week, one month, three months, or one year. To see the effect of recent data activities, set the interval of a trend calculation to enclose this activity. Investigate whether growth rates calculated over different intervals deviate significantly.

Figure 18) Trending of data growth and days-to-full prediction in Operations Manager.

Note: The calculation basis of time to full is the usable aggregate capacity. This value is not calculated based on the aggregate full threshold setting.
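The trend calculation can be sketched as a least-squares fit over daily usage samples. The samples and the usable capacity are invented for the example; as the note above states, time to full is based on usable capacity, not on the full threshold.

```python
def growth_trend(samples):
    """Least-squares daily growth rate from (day, used_gb) samples."""
    n = len(samples)
    sx = sum(d for d, _ in samples)
    sy = sum(u for _, u in samples)
    sxx = sum(d * d for d, _ in samples)
    sxy = sum(d * u for d, u in samples)
    return (n * sxy - sx * sy) / (n * sxx - sx * sx)

# Daily used-capacity samples of an aggregate (day, GB):
samples = [(0, 500), (1, 510), (2, 519), (3, 531), (4, 540)]
rate = growth_trend(samples)            # 10.1 GB/day
usable_gb = 1000                        # time to full uses usable capacity,
remaining = usable_gb - samples[-1][1]  # not the aggregate full threshold
print(round(remaining / rate, 1))       # 45.5 estimated days to full
```

Comparing fits over different sample windows shows whether recent activity deviates from the long-term trend, which is what the interval selection in Operations Manager is for.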

The trending on the volume level is analogous to the trending on the aggregate level. In your Operations Manager instance, access the link http://opsmgrserver:port/dfm/report/view/volumes-growth-rates for trending of volume growth rates. NetApp recommends ordering the view by growth rate, descending, or by time to full, increasing, in order to focus on the relevant candidates.

On the volume level, you can set an alarm to fire when the volume growth is outside the usual boundary:

• Abnormal volume growth. This event notifies when the growth rate of a volume exceeds a preset limit. It is helpful to signal unusual behavior concerning storage consumption and to point the operational staff to the right storage object.


INDIVIDUAL THRESHOLDS PER AGGREGATE OR VOLUME

For each aggregate or volume, the general default settings can be overridden and made more specific. To do so, select your aggregate or volume of choice, for example, using the links already provided in this technical report. When a concrete aggregate is selected, it can be configured using the Edit Settings link and dialog. When a concrete volume is selected, its configuration can be adapted using the Edit Quota Settings link and dialog.

MONITORING STORAGE EFFICIENCY RETURNS

NetApp Operations Manager provides a dashboard to visualize storage efficiency returns in the NetApp shared storage infrastructure. This report lists important parameters, drilled down by utilization, capacity, unused reserve capacity, storage efficiency, and efficiency return breakdown. It allows you to judge the effectiveness of the NetApp storage efficiency technologies. Figure 19 provides a sample screenshot of the storage efficiency dashboard in NetApp Operations Manager. Consult the NetApp Operations Manager Efficiency Dashboard Installation and User Guide for further information on the data provided by this dashboard.

Figure 19) Storage efficiency dashboard in Operations Manager.


4.3 NOTIFICATION

Operational staff must be notified when situations occur that require a transition between phases, especially situations with negative consequences for SLA fulfillment. Operations Manager provides alarms for notification. Alarms are bound to the metrics and thresholds explained in section 4.2 and notify operational staff, storage administrators, or storage capacity planners. Alarms are the instrument used to keep the management effort of the NetApp storage infrastructure low.

After being notified, the responsible person can evaluate the situation and decide which actions to take. Furthermore, the trends on operational parameters provided by Operations Manager simplify the decision-making process.

Depending on the organizational structure, the responsibilities to operate, plan, and administer the storage infrastructure can be separated into different groups, persons, or roles. Thus, we characterize the mitigation activities by required skill set and time to act. This allows easy alignment with a given organizational structure.

Operations Manager supports different methods of sending a notification. The notification methods can be used in combination; for example, a notification can be sent by both e-mail and SNMP.

NOTIFY BY E-MAIL

An alarm can be sent to multiple destinations by e-mail. Repeated notifications can be sent while the situation remains unresolved. To set an alarm, access the alarm configuration page by following Setup→Alarms from the default Operations Manager dashboard. Clicking Advanced Version accesses an advanced version of this page; the direct link for the advanced version is http://opsmgrserver:port/dfm/edit/alarms-advanced. Figure 20 shows how to configure an alarm. Adjust the threshold as described in section 4.2.

NetApp recommends using distribution lists or aliases with meaningful names rather than the addresses of individual persons. If you follow this recommendation, changing responsibilities and roles does not require corresponding changes to Operations Manager.

NOTIFY BY SNMP

Operations Manager supports the notification of alarms using SNMP, a widely used standard that is supported by most orchestration frameworks and ticketing systems. Using SNMP, Operations Manager can be integrated into existing ticketing systems. Figure 20 shows setting up an alarm that fires based on the aggregate almost full threshold. The SNMP trap host is configured using a host name or IP address and the port on which the SNMP agent is listening. The alarm can then be saved and tested.


Figure 20) Configuring an alarm based on the threshold aggregate almost full

Note: The SNMP event must be routed to the responsible groups or persons in the ticketing system. The mapping between the detected situation and the responsible operational group must therefore be implemented there.
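The routing of trap events to responsible teams can be sketched as a simple lookup on the ticketing-system side. The event names follow this report's examples; the team names are invented for illustration:

```python
# Sketch of a ticketing-system routing rule that maps Operations Manager
# SNMP trap events to the operational group responsible for mitigation.
# Event names follow the report's examples; team names are illustrative.
ROUTING = {
    "aggregate-almost-full": "storage-operations",
    "aggregate-full": "storage-administrators",
    "aggregate-almost-overcommitted": "storage-operations",
}

def route_event(event_name: str) -> str:
    """Return the queue that should receive a ticket for this event."""
    # Unknown events fall back to a default queue rather than being dropped.
    return ROUTING.get(event_name, "storage-operations")

print(route_event("aggregate-full"))  # -> storage-administrators
```

In practice this table lives in the ticketing system's configuration; the point is that the mapping is maintained there, not in Operations Manager.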

NOTIFY BY SCRIPT

Operations Manager supports notifications in highly customized integration scenarios. A user-defined adapter can be executed, which delivers the information to the infrastructure or system of choice. A script can be used to implement such an adapter and act as the glue between Operations Manager and the customer infrastructure. To set an alarm on the event aggregate almost full that starts a script, instrument Operations Manager on the command line:

dfm alarm create -s script_to_execute -h aggregate-almost-full
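A minimal sketch of such an adapter script, assuming Operations Manager hands the event details to the script through environment variables; the variable names used here are illustrative placeholders, not documented dfm names:

```python
import json
import os

def build_ticket(environ) -> str:
    """Assemble a ticket payload from event details handed to the script.

    The variable names below are placeholders; substitute the fields your
    Operations Manager version actually exports to alarm scripts.
    """
    ticket = {
        "event": environ.get("EVENT_NAME", "unknown"),
        "source": environ.get("SOURCE_NAME", "unknown"),
        "severity": environ.get("EVENT_SEVERITY", "warning"),
    }
    return json.dumps(ticket)

if __name__ == "__main__":
    # Hand the payload to the ticketing system of choice,
    # for example via its CLI or a REST endpoint.
    print(build_ticket(os.environ))
```

The script is the single point that knows about the customer infrastructure, so Operations Manager itself stays unchanged when the ticketing system is replaced.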


44 MITIGATE STORAGE USE

Uncontrolled storage use can limit operational flexibility and might put SLA fulfillment at risk. This section focuses on mitigation activities that preserve flexibility by keeping use within its defined corridor. The effect of a mitigation activity should be to return usage to this corridor.

Storage tightness might occur in aggregates or volumes, depending on their configuration. When all volumes in an aggregate are thin provisioned with the zero fat configuration, they use the aggregate's shared pool of free blocks to deal with data growth. To resolve this situation, a mitigation activity at the aggregate level is necessary.

When storage objects in a fixed-size volume cannot grow to the committed space, a mitigation activity at the volume level is necessary to resolve upcoming volume tightness.

MITIGATION ACTIVITIES FOR AGGREGATES

Aggregates are the coarsest storage objects within a NetApp storage controller. Finer-grained storage objects, such as FlexVol volumes and their content, are usually thin provisioned using the zero fat configuration. They might grow on demand; however, because they live within an aggregate of physically limited size, the growth of the storage object itself is also limited. As described in the following list, providing usable space in the aggregate automatically allows contained storage objects to grow.

1. Increase the aggregate. You can add drives to aggregates during operation, and you can repeat this mitigation activity. The maximum aggregate size depends on the Data ONTAP version, the type of aggregate, and the type of storage controller; 64-bit aggregates, supported with Data ONTAP 8, have very high limits. Additional drives can be used immediately; however, their procurement needs to be taken into account. Rebalancing data between existing and new drives results in uniformly distributed use of the drives.

2. Decrease the aggregate Snapshot copy reserve. This reserve is needed in MetroCluster and SyncMirror® configurations. In other configurations, you can decrease this reserve or set it to zero.

3. Shrink preallocated volumes. Volumes with preallocated space reserve available aggregate free space. When possible, these volumes can be shrunk, returning the freed space to the aggregate so that other volumes can make use of it.

4. Enable deduplication and shrink the volume.
5. If available, migrate a nomad online to a different storage controller. Doing this at the NetApp storage controller level requires storage provisioning based on vFiler units as well as MultiStore and SnapMirror licenses. Adequate free space on the aggregates of the target storage controller is required. This mitigation activity is not limited in its applicability.

6. Migrate a volume from one aggregate to another, within the same or another storage controller. SnapMirror replicates the data while it is still being served. To switch over to the replicated data, the client needs to detach from the source and reattach to the replica. After completion, the replica is considered the new source. This operation incurs client downtime. Typically, inter-data center bandwidth allows you to synchronize the source and the replica within a few minutes.

7. If none of the listed activities can be used, the application must be stopped to achieve a consistent state before its data is migrated offline.

The mitigation activities for aggregate tightness are summarized in Table 8. Note that Provisioning Manager performs mitigation alternatives 3 to 6 for secondary storage online.


Table 8) Mitigation alternatives to control use within aggregates.

No. | Mitigation Activity | Repeatability | SLA Impact | Preparation Time | Time to Show Effect
1 | Increase aggregate capacity by adding disks | Repeatable (Data ONTAP 7.x: low limits; Data ONTAP 8: high limits) | None | HW procurement | Immediate (plus rebalancing)
2 | Decrease the aggregate's Snapshot copy reserve area, if possible | One time | None | None | Immediate
3 | Shrink other volumes in the aggregate, if they have enough free space | One time | Low | None | Immediate
4 | Run deduplication and shrink volumes | Repeatable | Low | Time to execute dedupe | Immediate
5 | Migrate nomads (online) | Repeatable | Low | None | Minutes (vFiler migration time)
6 | Migrate volumes to a different aggregate (offline) | Repeatable | Medium–high | Next planned downtime window | Minutes (volume switch-over time)
7 | Prevent application data loss and stop the application, then migrate (offline) | Repeatable | Low–high | Coordinate with app owner | Minutes (migration time)

MITIGATION ACTIVITIES FOR VOLUME TIGHTNESS

Mitigation activities for volume tightness are relevant for volumes that are at risk because their storage objects cannot grow to the committed size. When it is not possible to enable growth for the storage objects contained in volumes, you need to perform an aggregate mitigation activity.

Note: Some of these mitigation alternatives depend on, and affect, used capacity in the aggregate.

Table 9) Mitigation activities for resource tightness within volumes.

No. | Mitigation Activity | Repeatability | SLA Impact | Preparation Time | Time to Show Effect
1 | Reduce the volume's Snapshot copy reserve (if configured and not used) | One time | Low | None | Immediate
2 | Increase the volume if there is free space in the aggregate (see Table 8) | One time | Low | None | Immediate
3 | Delete Snapshot copies that are not needed or were skipped by the autodelete function | Limited | Low | None | Immediate
4 | Activate FAS deduplication for the volume (requires proper space guarantees) | One time | Low; possible performance impact | Wait for schedule | Hours
5 | If the volume contains more than a single LUN, migrate those objects to another volume or aggregate | Repeatable | High | Next planned downtime window | Minutes (volume migration time)
6 | Stop the application and migrate data | Repeatable | High | Coordinate with app owner | Minutes (migration time)


5 REAL-LIFE SETTINGS

This section summarizes two different operational settings. The first does not make use of online data migration and the settled/nomad provisioning pattern; the second implements a settled/nomad provisioning pattern to maintain the flexibility for online data migrations.

The concrete threshold settings and approaches might be very customer and application specific. To exploit NetApp storage efficiency features in your own data center, NetApp recommends that you start conservatively. After you are familiar with the process, work toward the customer-specific optimum.

51 SAMPLE SETTING 1: REAL-LIFE SETTING

This section describes a real-life setting that a customer started with. It makes use of a limited set of mitigation alternatives. This is especially beneficial when the installed storage capacity should remain constant over a long time frame or when the physical systems are already fully equipped. A settled/nomad setting is not considered. Thus, the thresholds that signal a transition of the phases are set lower and more conservatively for this customer. Because online data migration and aggregate extension are not available as mitigation alternatives, sufficient available space is required to safely reach the next planned downtime window, as shown in Figure 21. In practice, refer to the aggregate days to full trend value to get an idea of the available days to full based on past data growth.

• All storage is provisioned using the zero fat option with growable FlexVol volumes. Only aggregate monitoring is used.
• Aggregate extension is not a mitigation alternative.
• Online migration is not a mitigation alternative.
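The days to full value referenced above can be approximated from the observed growth rate. A sketch under a linear-growth assumption, with illustrative numbers:

```python
def days_to_full(capacity_gb: float, used_gb: float, daily_growth_gb: float) -> float:
    """Estimate days until an aggregate reaches 100% capacity used,
    assuming linear growth at the observed daily rate."""
    if daily_growth_gb <= 0:
        # Flat or shrinking usage never fills the aggregate.
        return float("inf")
    return (capacity_gb - used_gb) / daily_growth_gb

# Illustrative figures: a 10 TB aggregate, 6 TB used, growing 20 GB per day
print(days_to_full(10000, 6000, 20))  # -> 200.0
```

Operations Manager computes this trend from historical data; the sketch only shows why the value shrinks as either utilization or the growth rate rises.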

Figure 21) Storage to enable organic data growth between planned downtime windows


Note: Several months might pass between the planned downtime windows in which major mitigation alternatives can be performed.

The primary concern is preventing the critical situation in which aggregates reach a utilization level too high to enable organic growth during the period between agreed planned downtime windows. To prevent this situation, sufficient space must be reserved to enable data growth. Second, the level of data consolidation is monitored to manage accumulated growth rates safely.

Provisioning new data is stopped when one or both thresholds on the first and second metrics are reached. The operational teams are notified using an alarm on the Operations Manager event aggregate nearly full (configured to fire when the metric exceeds 50%) and on the event aggregate nearly overcommitted (configured to fire when the metric exceeds 110%). These alarms stop the responsible entities from provisioning new storage; the aggregate is left for organic growth.

An assessment of the storage situation might be performed. Depending on experience and knowledge of the application growth rates seen in the past, the thresholds may be adapted. After the upper threshold of the operational sweet spot corridor is crossed, an alarm based on the aggregate full threshold (set initially to 65%) is sent to the storage administrators so that they can decide to migrate data in the next planned downtime window. In the meantime, organic growth can take place in the yellow-marked area shown in Figure 22. The metrics used are:

• First metric: aggregate capacity used
• Second metric: aggregate space committed

Because all storage is provisioned using the zero fat option, no artificially limited storage container exists. Thus, there is no need to consider a volume-based metric. Figure 22 shows the behavior depending on the metrics aggregate capacity used and aggregate committed space.
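The transitions shown in Figure 22 can be sketched as a small decision function. The thresholds are the ones used in this sample setting; the function itself is an illustration, not an Operations Manager feature:

```python
def phase(used_pct: float, committed_pct: float) -> str:
    """Classify the aggregate state in sample setting 1.

    Provision freely up to 50% capacity used and 110% space committed;
    assess capacity and adapt thresholds in the corridor above that;
    mitigate above 65% used or 120% committed.
    """
    if used_pct > 65 or committed_pct > 120:
        return "mitigate"
    if used_pct > 50 or committed_pct > 110:
        return "assess capacity, stop provisioning"
    return "provision new storage"

print(phase(45, 100))  # -> provision new storage
print(phase(55, 100))  # -> assess capacity, stop provisioning
print(phase(70, 100))  # -> mitigate
```

Either metric alone can trigger the transition, which is why both thresholds are checked independently.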

Figure 22) Transition of phases depending on the metrics aggregate capacity used and aggregate committed space.

(Figure content: while aggregate capacity used is within 0–50% and aggregate space committed is within 0–110%, new storage is provisioned; above those values, capacity is assessed and thresholds are adapted; above 65% used or 120% committed, mitigation is triggered. The area between the thresholds is the operational sweet spot corridor.)


52 SAMPLE SETTING 2: SETTLED/NOMAD

This section describes a setting that takes the settled/nomad provisioning pattern into account and allows migrating nomad data flexibly and in a timely manner thanks to vFiler technology. This setting requires storage space at alternative locations to which nomads can be migrated. It is seen more often in larger environments with an emphasis on NFS-attached storage. It allows operating the NetApp storage infrastructure at very high use and in narrower operational sweet spot corridors. Figure 23 visualizes the effect of a mitigation alternative that can be performed online.

Figure 23) Narrower corridors due to the ability to perform mitigation alternatives in hours instead of months

(Figure content: settled and nomad units within an aggregate; once the need to act is detected, the effect of a mitigation such as a nomad migration shows within hours.)

In this sample setting, as in sample setting 1, the critical situation to prevent is aggregates becoming too full. However, the flexibility gained with online data migration makes it unnecessary to take a further metric, such as storage overcommitment, into account.

• All storage is provisioned using the zero fat option with growable FlexVol volumes. Only aggregate monitoring is used.
• Storage is provisioned using the settled/nomad pattern, with the ability to perform online migration.
• Days to full aggregate trending was more than 200 days on average. Note that this value depends on the individual situation and is calculated against 100% capacity used.

The sole metric in this setting is aggregate capacity used. Table 10 contains the thresholds describing the transition of phases.

Table 10) Phase transitions with the settled/nomad provisioning pattern and online migration as a mitigation alternative.

Detection Threshold | Notify | Mitigation
> 70% | Storage operations | Stop provisioning of new storage
> 85% | Storage operations | Stop extending provisioned storage
> 90% | Storage operations | Relax the resource situation and migrate a nomad
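The phase transitions of Table 10 can be expressed as a short sketch over the single metric, aggregate capacity used (in percent); the function is illustrative:

```python
def setting2_actions(used_pct: float) -> list[str]:
    """Return the restrictions and mitigations active in sample setting 2
    for a given aggregate capacity used percentage."""
    actions = []
    if used_pct > 70:
        actions.append("stop provisioning of new storage")
    if used_pct > 85:
        actions.append("stop extending provisioned storage")
    if used_pct > 90:
        actions.append("relax resource situation and migrate a nomad")
    return actions

# Above 90% all three actions are active; below 70% none are.
print(setting2_actions(92))
```

The actions accumulate: an aggregate above 90% is also past the 70% and 85% thresholds, so all earlier restrictions stay in force.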


Figure 24) Visualization of phase transitions depending on the metric aggregate capacity used.

(Figure content: while aggregate capacity used is within 0–70%, new storage is provisioned; between 70% and 85%, already provisioned storage may still be extended; above 90%, utilization is relaxed by migrating a nomad with NetApp Data Motion.)

You can achieve very high data consolidation in this setting by using NetApp storage controllers; the amount of logical data served can exceed the physically usable capacity several times over.


6 STORAGE EFFICIENCY COOKBOOK

To increase consolidation, we propose the following steps to exploit the advantages of NetApp storage efficiency technologies.

Figure 25) Typical picture of aggregate capacity metrics while turning to zero fat configurations and dedupe

(Figure content: committed capacity and capacity used plotted over elapsed time, with an overall trend line and a last-3-month trend line; markers 1, 2, and 3 correspond to the steps below, at roughly the 1-month and 3-month points.)

As a general rule, we do not introduce artificially limited container types; they increase the monitoring effort and might prevent pooling of unused space. For an existing landscape, proceed as follows:

1. Install and configure Operations Manager; the earlier, the better. From day one, Operations Manager collects data, and the more information it collects, the better the predictions and trending. The diagrams provided by Operations Manager give a good idea of growth rates and their steadiness. Make sure all NetApp storage controllers are monitored. Wait for one month. Define which mitigation alternatives your operational team is comfortable with. Check the boxes accompanying the provided list and identify the time your team needs to perform the actions. If you can perform online migrations for nomads, define the time to negotiate and approve the migration. For all other data, define the time to the next planned downtime window.

2. Change all volumes to the zero fat configuration with the autogrow feature set to on. Because there is no artificial space limitation for an autogrow volume, monitoring is restricted to aggregate monitoring. When using deduplication, set the volume to autogrow. During this period, the capacity used diminishes, as shown in Figure 25. Usually, each change in the volume configuration can be detected. So far, only metadata has changed, and the unused space in the volumes is now available from a common shared pool. The aggregated free space is available to the same applications storing the data. We recommend monitoring for three months to understand the growth rate of your environment.

3. Derive the growth trend of the aggregates. Note that the overall trend might still be negative. Use Operations Manager to help determine the trend. Make sure that it excludes the time frame of changing the volume configuration to zero fat and that it includes relevant operations of your applications, such as month- and year-end closing of business applications or regular software maintenance updates (for example, in virtualized environments).


Work backward to determine the thresholds of the phases:
a. Define the aggregate use at a level where your operational team is comfortable. At first, do not exceed 80%. Add an attention area (yellow) depending on the mitigation alternatives and their time to show effect.
b. Determine the maximum distance between the planned downtimes, or the time to perform the intended mitigation alternatives.
c. Determine the growth rate. Operations Manager provides help in determining the trend of data growth.
d. Determine the minimum space required to comfortably allow organic growth in the period between agreed planned downtimes of the services provided. Operations Manager helps you to understand the growth rate of the past.

To provision storage, follow these steps:

1. Create big aggregates to enable shared storage in your data center. We recommend sizing them in such a way that the aggregate can be extended once for eventual aggregate mitigation. Very few situations exist where a silo-centric approach with dedicated aggregates for applications makes sense. Free space and performance in an aggregate can be shared, and a few big aggregates reduce the monitoring effort. Also, build aggregates in a limited number of standardized configurations and sizes.

2. Create volumes in the zero fat configuration with the autogrow feature set to on. Because there is no artificial space limitation for an autogrow volume, monitoring is restricted to aggregate monitoring. When using deduplication, set the volume to autogrow. Whenever possible, use Provisioning Manager for convenience and for repeating configurations.
a. Classify your data and provision for flexibility. Give NFS a preference and make use of vFiler entities.
b. Turn on deduplication. Even in situations where deduplication rates are expected to be low, there is sometimes a big surprise. If you prefer to try deduplication on the storage controller first, create a clone of the intended volume and deduplicate it to estimate the effect. Use Performance Advisor to identify a repeating time frame of low activity in which to schedule the deduplication job; alternatively, use deduplication scheduled by change rate. Mind the maximum volume sizes, which depend on the storage controller.
c. Initially size volumes to the expected size of the data you are going to store. Thus, the aggregate overcommitment metric in Operations Manager represents the data consolidation more precisely.
d. Trim existing volumes provisioned in full or low fat to the zero fat configuration. Use the following commands on the console of the storage controller to configure zero fat without Snapshot autodelete for NAS environments:

vol options <volume> guarantee none
vol options <volume> try_first volume_grow
vol autosize <volume> -m <maximum size> -i <increment size> on
snap autodelete <volume> off

Use the following command sequence to configure zero fat with Snapshot autodelete for NAS environments:

vol options <volume> guarantee none
vol options <volume> try_first volume_grow
vol autosize <volume> -m <maximum size> -i <increment size> on
snap autodelete <volume> trigger volume
snap autodelete <volume> delete_order oldest_first
snap autodelete <volume> on


Use the following commands to configure zero fat without Snapshot autodelete for SAN environments:

vol options <volume> guarantee none
vol options <volume> try_first volume_grow
vol autosize <volume> -m <maximum size> -i <increment size> on
snap reserve -V <volume> 0
snap autodelete <volume> off
lun set reservation <lun> disable

Use the following command sequence to configure zero fat for SAN environments with autodelete set to on:

vol options <volume> guarantee none
vol options <volume> try_first volume_grow
vol autosize <volume> -m <maximum size> -i <increment size> on
snap reserve -V <volume> 0
snap autodelete <volume> trigger volume
snap autodelete <volume> delete_order oldest_first
snap autodelete <volume> on
lun set reservation <lun> disable

e. Identify storage holding inactive data. Such storage is most often perfectly suited to act as a nomad candidate that can be migrated.

f. Identify storage that is close to deprovisioning. Deprovisioning storage relaxes use and can act as a mitigation alternative.

g. Turn already provisioned volumes into the zero fat configuration.
3. Let Operations Manager monitor the landscape. Use the aggregate daily growth rates and days to full trending reported by Operations Manager to adapt the thresholds. Remember that days to full trending reports against 100% capacity used of the aggregate.
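Because the reported trend is computed against 100% capacity used, it can be rescaled to a lower alarm threshold. A sketch under the same linear-growth assumption, with illustrative numbers:

```python
def days_to_threshold(days_to_full: float, used_pct: float, threshold_pct: float) -> float:
    """Rescale an Operations Manager days-to-full value (computed against
    100% capacity used) to the days remaining until a lower threshold,
    assuming linear growth."""
    if used_pct >= threshold_pct:
        # The threshold has already been crossed.
        return 0.0
    return days_to_full * (threshold_pct - used_pct) / (100.0 - used_pct)

# Illustrative: 200 days to full at 60% used; an 80% threshold is hit sooner
print(days_to_threshold(200, 60, 80))  # -> 100.0
```

This kind of rescaling is useful when setting alarm thresholds well below 100%, as recommended above.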


7 REFERENCES

• TR-3505, "NetApp Deduplication for FAS and V-Series Deployment and Implementation Guide": www.netapp.com/us/library/technical-reports/tr-3505.html
• TR-3563, "NetApp Thin Provisioning: Improving Storage Utilization and Reducing TCO": www.netapp.com/us/library/technical-reports/tr-3563.html
• TR-3710, "Operations Manager, Provisioning Manager, and Protection Manager Best Practices Guide": www.netapp.com/us/library/technical-reports/tr-3710.html
• TR-3786, "A Thorough Introduction to 64-Bit Aggregates": www.netapp.com/us/library/technical-reports/tr-3786.html
• TR-3814, "NetApp Data Motion": www.netapp.com/us/library/technical-reports/tr-3814.html
• TR-3827, "If You're Doing This, Then Your Storage Could Be Underutilized": www.netapp.com/us/library/technical-reports/tr-3827.html
• TR-3881, "DataMotion for Volumes for Enterprise Applications": www.netapp.com/us/library/technical-reports/tr-3881.html
• NetApp Operations Manager Efficiency Dashboard Installation and User Guide: now.netapp.com/NOW/download/tools/omsed_plugin/InstallUserGuide.pdf


8 ACKNOWLEDGMENTS

This report was developed in concert with the Field Centers for Innovation and covers field best practices and product group expertise. It would not have been possible without the input of many experts. Significant contributions were made by Matthew Agoni, Carlos Alvarez, Jeff Berks, Manfred Buchmann, Hans Deuerlein, Erik Dybwad, Niels Reker, Oliver Dziuba, Larry Freeman, Gary Garcia, Pretoom Goswami, Naveen Harsani, George John, Nigel Maddock, Andreas Martinovsky, Holger Niermann, Cesar Orosco, Christian Ott, Shiva Raja, Michael Reusch, Maurice Skubski, John Tyrrell, Oliver Walsdorf, and Allen Wang.

NetApp provides no representations or warranties regarding the accuracy, reliability, or serviceability of any information or recommendations provided in this publication, or with respect to any results that may be obtained by the use of the information or observance of any recommendations provided herein. The information in this document is distributed AS IS, and the use of this information or the implementation of any recommendations or techniques herein is a customer's responsibility and depends on the customer's ability to evaluate and integrate them into the customer's operational environment. This document and the information contained herein may be used solely in connection with the NetApp products discussed in this document.

© Copyright 2010 NetApp, Inc. All rights reserved. No portions of this document may be reproduced without prior written consent of NetApp, Inc. Specifications are subject to change without notice. NetApp, the NetApp logo, Go further, faster, Data ONTAP, FlexClone, FlexVol, MultiStore, RAID-DP, SnapDrive, SnapMirror, Snapshot, SyncMirror, and vFiler are trademarks or registered trademarks of NetApp, Inc. in the United States and/or other countries. Windows is a registered trademark of Microsoft Corporation. Oracle is a registered trademark of Oracle Corporation. VMware is a registered trademark and VMotion is a trademark of VMware, Inc. All other brands or products are trademarks or registered trademarks of their respective holders and should be treated as such. RA-0007-1010



Option | Recommended Value | Notes
autosize options | -m X -i Y | The business model drives the maximum value for the autosize configuration, because it offers additional disk space to the consumer under specific conditions. A reasonable resizing increment depends on various factors, such as the data growth rate in the particular volume and the volume size itself.
try_first | volume_grow | Increasing the size of the volume does not destroy any data or information, so there is no reason not to increase it; the change can be reverted afterward if the volume free space increases again. There might be configurations where automatic volume growth is not desired.
Volume Snapshot Options | |
reserve | 0 | For NAS volumes, setting a Snapshot copy reserve area and configuring Snapshot copy schedules is a common setup. For SAN volumes, this needs to be switched off according to NetApp best practices (see the Fibre Channel and iSCSI Configuration Guide).
schedule | switched off |
autodelete | on | There might be Snapshot copies that are needed to fulfill certain SLAs, such as backup SLAs; setting this policy needs to be negotiated with the business requirements. In the worst-case scenario, deleting Snapshot copies is not an option.
autodelete options | volume, oldest_first | There is a precedence order for which Snapshot copies are candidates for deletion; oldest_first is the current default.
LUN Options | |
reservation | enable | Reserves space for the LUN during creation.

ZERO FAT PROVISIONING

Full and low fat provisioning use fully allocated volumes and LUNs; by default, space allocation happens within the boundaries of the LUN and the volume. Zero fat follows a 100% allocate-on-demand concept:

• Volumes are created without a space guarantee.
• LUNs are created without a space guarantee.
• The size of the volume follows the formula X − N + Δ, where X is the size of the primary data (the sum of all LUN capacities within the volume), Δ is the amount of space needed to hold Snapshot copy data, and N is the amount of unused blocks within a given LUN.

Table 6) Zero fat provisioning.

Option | Recommended Value | Notes
Volume Options | |
guarantee | none | No space reservation for the volume at all.
fractional_reserve | 0 | With Data ONTAP 7.3.3, fractional_reserve can be modified even for volumes without a space guarantee of type volume. Prior to Data ONTAP 7.3.3, the value was fixed at 100.
autosize | on | Turn autosize on.
autosize options | -m X -i Y | The business model drives the maximum value for the autosize configuration, because it offers additional disk space to the consumer under specific conditions. A reasonable resizing increment depends on various factors, such as the data growth rate in the particular volume and the volume size itself.
try_first | volume_grow |
Volume Snapshot Options | |
reserve | 0 | For NAS volumes, setting a Snapshot copy reserve area and configuring Snapshot copy schedules is a common setup. For SAN volumes, this needs to be switched off according to NetApp best practices (see the Fibre Channel and iSCSI Configuration Guide).
schedule | switched off |
autodelete | off | Deleting Snapshot copies might be an option when the volume can no longer be resized because the maximum configured size has been reached, or when the aggregate's free space becomes low.
LUN Options | |
reservation | disable | No preallocation of blocks for the LUN.

SUMMARY OF PROVISIONING METHODS

There are good reasons for using any of the provisioning methods described; however, full fat for SAN environments should be avoided wherever possible because of its storage efficiency ratio. Even with a 100% block usage ratio on primary data, zero fat provisioning has many advantages and is the preferred method:

• The aggregate's free space is a global pool that can serve space to volumes. This gives more flexibility than volumes with their own dedicated free space.
• For SAN volumes, the block consumption can be easily monitored.
• Deduplication savings go directly into the global pool of free space, which is the aggregate or the resource pool to which it belongs.
• Monitoring is needed only on the aggregate level. Volumes grow on demand.

Table 7) Comparison of provisioning methods.

Characteristic | Full Fat | Low Fat | Zero Fat
Space consumption | 2X + Δ | X + Δ | X − N + Δ (2)
Space efficient | No | Partially, for Snapshot copies | Yes
Monitoring | Optional | Required on volume and aggregate level | Required on aggregate level
Notification/mitigation process required | No | Optional in most cases | Yes
Pool benefitting from dedupe savings | Volume fractional reserve area | Volume free space area | Aggregate free space area
Risk of an out-of-space condition on primary data | No | No, as long as autodelete is able to delete any Snapshot copies | Yes, when monitoring and notification processes are missing
Typical use cases | Small installations; none or few storage management skills (no monitoring infrastructure) | Large database environments | Shared storage infrastructure; test/dev environments; storage pools for virtualized servers

(2) N is the traditional thin provisioning impact: the amount of blocks logically allocated but not used.
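The space consumption row can be made concrete with a worked example; the figures are illustrative, not taken from the report:

```python
# Compare the space consumed by the three provisioning methods for the same
# data set, per the formulas in Table 7. All figures in GB, illustrative.
X = 1000      # primary data: sum of LUN capacities within the volume
N = 400       # blocks logically allocated but not used inside the LUNs
delta = 100   # space needed to hold Snapshot copy data

full_fat = 2 * X + delta   # 100% fractional reserve doubles the footprint
low_fat = X + delta        # LUNs fully allocated, no fractional reserve
zero_fat = X - N + delta   # only used blocks plus Snapshot data consume space

print(full_fat, low_fat, zero_fat)  # -> 2100 1100 700
```

In this example zero fat consumes a third of the full fat footprint, which is exactly the efficiency gap the summary above describes.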

FULL/LOW/ZERO FAT PROVISIONING WITH PROVISIONING MANAGER

NetApp Provisioning Manager focuses on fast and efficient provisioning of storage resources in the NetApp storage infrastructure. It speeds up provisioning workflows and boosts capacity usage by using policy-based automation for the entire NetApp NAS and SAN infrastructure. These processes are:

• Faster than manually provisioning storage
• Easier to maintain than scripts
• Instrumental in minimizing the risk of data loss resulting from misconfigured storage

Provisioning Manager applies user-defined policies to consistently select the appropriate resources for each provisioning activity. This frees administrators from the headache of searching for available space to provision and allows more time for strategic issues. The use of a centralized management console allows administrators to monitor the status of their provisioned storage resources. Provisioning Manager can help improve business agility and capacity use, shrink provisioning time, and improve administrator productivity. Provisioning Manager's thin provisioning and deduplication capabilities provide a high level of storage efficiency from your NetApp storage investment.

A GUI allows you to implement the full/low/zero fat provisioning models within Provisioning Manager. See Figure 9 and Figure 10 for configuring storage efficiency in a provisioning policy for NAS and SAN. Whenever storage is provisioned using such a provisioning policy, the settings apply automatically. For more information, refer to TR-3710, Operations Manager, Provisioning Manager, and Protection Manager Best Practices Guide.

Provisioning Manager encapsulates technical details when provisioning storage and supports an easy integration with existing management tools and orchestration frameworks. Policies and their use in so-called datasets and storage services allow you to exploit NetApp storage efficiency technologies without exposing a high level of technical detail to a higher level of management software.

Note: Provisioning Manager up to version 4.0 does not allow you to specify autosize and autodelete individually; the policy template determines if these features are used and which options are selected. In order to implement the provisioning methods outlined, a customized post-provisioning script needs to be provided to set the autosize and autodelete parameters according to the recommendations for the full/low/zero fat methods. Post-provisioning scripts are standard with Provisioning Manager. Use caution: when Provisioning Manager runs conformance checks, it reverts these individual settings.
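As an illustration of such a post-provisioning script, the sketch below builds the Data ONTAP 7-Mode commands that apply the zero fat recommendations (autosize on, Snapshot autodelete on) to a freshly provisioned volume. The volume name, sizes, and exact option syntax are illustrative assumptions and may vary by Data ONTAP release:

```python
def post_provision_commands(volume, max_size="200g", increment="10g",
                            zero_fat=True):
    """Build Data ONTAP 7-Mode commands to apply zero fat settings
    after Provisioning Manager has created a volume (sketch only;
    sizes and option syntax are illustrative)."""
    cmds = [
        # Let the volume grow on demand up to max_size, in increments.
        "vol autosize %s -m %s -i %s on" % (volume, max_size, increment),
    ]
    if zero_fat:
        # Zero fat also deletes Snapshot copies before running out of space.
        cmds.append("snap autodelete %s on" % volume)
    return cmds
```

Such a script would typically be executed over SSH against the storage controller after each provisioning run.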

20 Storage Efficiency Every Day How to Achieve and Manage Best-in-Class Storage Use

Figure 9) Configuring a full/zero fat provisioning policy using Provisioning Manager for NAS. Select checkboxes as outlined. Provisioning Manager deviates from zero/full fat by first growing volumes with autosize and then allowing Snapshot autodelete.


Figure 10) Configuring a full/low/zero fat provisioning policy using Provisioning Manager for SAN storage. Select checkboxes as outlined. Provisioning Manager deviates by not turning on autosize for zero fat.

FULL/LOW/ZERO FAT PROVISIONING FOR STORAGE SERVICES

Storage services are an easy abstraction to provision storage in a utility-like fashion. A storage service describes all characteristic attributes for storage needed in a certain scenario. A storage service catalog lists the available templates and allows you to provision storage with these attributes on demand. Technically, storage services or datasets consist of one or more resource pools, a protection policy, and a provisioning policy. Full/low/zero fat provisioning for storage services is configured in the configuration wizard of the provisioning policy. Figure 11 shows the provisioning policies closest to the full/low/zero fat configurations.

Figure 11) Full/low/zero fat provisioning policies for datasets and storage services.

Because this wizard is able to configure the deduplication feature, two policies are configured for the zero fat configurations: one with deduplication and one without deduplication.


HOW SHOULD A VOLUME BE SIZED

Because physical allocation of data within a zero-fat-provisioned volume is done on demand, theoretically the volume size can be set to a very high value that can easily keep all application data and Snapshot copies. As the unallocated space in the volume is not exclusively reserved for this volume itself, all other applications can benefit from the shared pool of unallocated storage. However, NetApp recommends that you size the volume to the expected size of its containing objects and use the autogrow option to let it grow on demand. The important advantage is that the commitment rate acts as a metric for data consolidation.
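As a minimal sketch of this metric (the formula below is the straightforward definition, not necessarily the product's exact implementation), the commitment rate relates the committed volume sizes to the aggregate capacity:

```python
def commitment_rate(volume_sizes_gb, aggregate_capacity_gb):
    """Committed storage relative to aggregate capacity.
    Values above 1.0 indicate overcommitment (thin provisioning)."""
    return sum(volume_sizes_gb) / float(aggregate_capacity_gb)

# Three 400 GB zero fat volumes on a 1000 GB aggregate: 120% committed.
rate = commitment_rate([400, 400, 400], 1000)
```

Sizing volumes to their expected content keeps this rate meaningful as a consolidation indicator.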

Note: The commitment rate reflects the amount of logical data consolidation. This metric is suitable for deciding when data should be left for organic growth.

Additionally, the volume size limits when using deduplication should be taken into account, because the maximum supported volume sizes depend on the storage controller model.

APPLICATION RECOMMENDATIONS

Thin provisioning is most effective when applications use data that is committed to them step by step. When applications preformat data, the immediate effect of thin provisioning is lost, and only deduplication may reclaim sharable blocks. Because thin provisioning has no performance penalty, the general recommendation is to provision with the zero fat configuration.

For SAN-attached storage, NetApp recommends that you use file systems supporting space reclamation technologies such as the SCSI UNMAP and SCSI WRITE SAME commands. This passes the information through the storage stack that a particular block is not used anymore and allows unused space to be reclaimed. On Windows® platforms, this can be configured in NetApp SnapDrive®.

For Oracle® database best practices, refer to WP-7084, Storage Efficiency in an Oracle Environment.

3.2 PROVISIONING FROM TEMPLATES: VOLUME AND DEDUPE-CENTRIC LAYOUTS

This section deals with provisioning storage for similar applications from a golden template. A valid use case is a hosting provider who offers and serves predefined application services in mass quantities. Instead of provisioning each application from scratch, the data of the application instance is provisioned by creating a copy of a preconfigured template or golden copy that is customized using a postprocessing procedure.

When applications are provisioned this way, NetApp cloning technologies generate virtual copies of the template data instantly and with efficient use of space. This achieves a high degree of data consolidation and cost savings.

The potential of NetApp cloning technologies also plays a central role in development and test environments, as well as in software maintenance scenarios. Testing and updates can be performed very easily because these cloning capabilities work instantly and with almost no overhead in terms of performance, CPU, and memory.

There are two ways to align application data to a NetApp shared storage infrastructure:

• Volume-centric storage layout
• Dedupe-centric storage layout

Depending on the data lifetime, suitability for deduplication, consistency, and tool constraints, one way of aligning application data is more appropriate than the other.

In both variants, the storage of the application template can be provisioned as either full, low, or zero fat. The cloning procedure inherits the attributes of the parent volume. To create space-efficient clones, the space guarantee must be set to none.


CONSEQUENCES FOR MONITORING

When using one of the following layouts, very high data consolidation can be achieved. Because this effect depends on the usage characteristics of the corresponding applications, monitoring the aggregate is key. In case a low fat volume acts as a template that is cloned preserving the original space guarantees, monitoring is necessary for the cloned volumes as well.

VOLUME-CENTRIC STORAGE LAYOUT

In a volume-centric storage layout, an application instance is organized into one or a few volumes to benefit from the Data ONTAP volume-centric management and maintenance operations, such as instant cloning and volume-consistent Snapshot copies.

In addition to the convenient ways to manage volumes, volume-centric storage layouts have storage efficiency advantages in two dimensions:

• High instant storage efficiency savings. High instant savings when cloning data of an application instance with FlexClone; savings might deteriorate over time.

• Long-term storage efficiency savings. Medium long-term savings when deduplicating application data.

A volume-centric layout makes it easy to provision storage for another instance of an application by cloning a consistent volume representing the template of the intended application and attaching it to an instance where it is processed. This approach works for both NAS and SAN.

Figure 12 shows the data alignment of an application instance and its volume. An application instance organizes its data in one or more dedicated volumes. Note that the entire construct is created within one aggregate.

Because deduplication is performed on the volume level, long-term savings depend on the block-sharing rate within one instance of an application. Volume-centric layouts are preferred in the following cases:

• Simplicity of data management using volumes
• Individual control over the SLA of each application instance
• Application instances with a short duration
• No consideration of deduplication
• Management tools that require volume-centric layouts


Figure 12) Volume-centric storage provisioning. Application instances are aligned horizontally with their volumes.

[Figure: a template FlexVol containing LUNs/qtrees is cloned through FlexClone block sharing into the FlexVols of instances 1 through n; deduplication block sharing operates within each FlexVol.]

Impact on commitment and storage utilization: The impact of using FlexClone to clone a volume-centric storage layout to implement storage-template-based provisioning is visualized schematically. At clone creation, Data ONTAP creates metadata for the new instance of the data. It allocates data for storing changes to the cloned copy or new data on request. Thus, the overcommitment of the aggregate containing the cloned data increases when creating the clone. However, this does not affect the space used in the aggregate. When data in the clone is rendered and new data is added by the application, the aggregate use will grow.
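The arithmetic behind this can be sketched as follows; the numbers are illustrative, and metadata overhead at clone creation is ignored:

```python
def after_flexclone(committed_gb, used_gb, parent_volume_size_gb):
    """Approximate effect of creating one FlexClone volume (sketch):
    the clone adds its full size to the committed storage of the
    aggregate, while block sharing means almost no additional space
    is used initially (clone metadata is ignored here)."""
    return committed_gb + parent_volume_size_gb, used_gb

# Cloning a 300 GB template on an aggregate with 800 GB committed and
# 500 GB used: commitment grows to 1100 GB, used space stays at 500 GB
# until the clone diverges from its parent.
committed, used = after_flexclone(committed_gb=800, used_gb=500,
                                  parent_volume_size_gb=300)
```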

Best Practice

A volume-centric layout implicitly implements a consistency group. It is preferable to align in it all application data that should be recovered at a certain point in time. Cloning can achieve significant savings when a FlexClone volume is created to provision data for a new service instance.

Client-side data realignment, such as disk defragmentation or database tablespace reorganization, has a counterproductive effect on the FlexClone savings. It also temporarily reduces the deduplication savings until the deduplication process is executed again. If possible, the following actions on client data should be avoided:

• Reorganizing data, for example, database reorganization of tablespaces or defragmentation of virtual disks provisioned through cloning

• Preformatting data


DEDUPE-CENTRIC STORAGE LAYOUT

In a dedupe-centric storage layout, the goal is to achieve high storage efficiency returns from the deduplication feature. In contrast to the volume-centric storage layout, data of different application instances is grouped to achieve storage efficiency returns across a set of application instances. Figure 13 shows a sample dedupe-centric storage layout. Data of application instances is organized horizontally. Individual data of each application is grouped vertically in a volume to implement deduplication.

This layout makes sense in virtualization scenarios where the images of the guest machines can be grouped easily. Grouping partitions containing boot images and commonly used programs is very effective because they share much of the same data. To implement template-based provisioning with such a layout, cloning template data must be performed with the file/LUN FlexClone operation. File/LUN FlexClone allows storage objects to be cloned within a volume, providing finer granularity.

This storage layout provides the following storage efficiency advantages from a short- and long-term perspective:

• Very high long-term storage efficiency savings. Long-term storage efficiency savings are achieved due to the deduplication-centric storage layout and deduplication returns.

• Short-term storage efficiency savings. Instant storage efficiency savings are provided when cloning an application instance, for example template application data, through a file/LUN FlexClone operation.

In contrast to the volume-centric storage layout, application instances are bundled together in a matrix style because of their participation in a volume. This implies that the applications share major operational tasks and are managed as a bundle. From an SLA perspective, a diversification of service levels within the application instances cannot be implemented as easily as with a volume-centric layout. Achieving application-consistent Snapshot copies requires the iterative application of file/LUN FlexClone functionality to all storage objects of the instance. This is slightly more difficult than cloning with a volume FlexClone operation.

TR-3505, NetApp Deduplication for FAS and V-Series Deployment and Implementation Guide, provides a deeper understanding of NetApp deduplication and its deployment.

Figure 13 illustrates dedupe-centric storage provisioning. Volumes are shared among several application instances to achieve cross-instance dedupe returns. Note that each construct is created within one aggregate; volumes can be assigned to different aggregates.


Figure 13) Dedupe-centric storage provisioning. Application instances are aligned horizontally; volumes are aligned vertically.

[Figure: LUNs/qtrees of the template and of instances 1 and 2 are grouped vertically into shared FlexVols; deduplication block sharing operates within each FlexVol across the instances.]

Impact on commitment and aggregate usage: When creating the FlexVol volumes for this layout, their individual sizes contribute to the commitment rate. The aggregate use grows with the provisioning and object use within the FlexVol volumes. Provisioning a new instance in this layout through a file/LUN FlexClone operation has no effect on the overcommitment rate; it has an effect on the deduplication value of the volumes themselves. Thus, NetApp recommends using the zero fat configuration for the volumes so that autogrow is enabled.

Best Practice

This layout is very attractive for applications using multiple but similar storage objects among service instances (for example, virtual disks in virtual machine hypervisors). They usually use similar operating systems and applications in dedicated virtual disks. Thus, grouping these storage objects leads to a very high degree of consolidation due to deduplication.

Quickly changing data, such as page and swap files, should not be considered for inclusion in deduplicated volumes on primary storage. Deduplication savings are limited due to their high change rate and do not justify running the deduplication process. NetApp recommends that this type of data is not placed in the same volume as data that dedupes well.

We further recommend not performing client data realignments, such as Windows disk defragmentation or database tablespace reorganizations. Because of the way that NetApp storage controllers work, fragmented client data is served with no performance penalty.


3.3 SETTLED/NOMAD PROVISIONING FOR NETAPP DATA MOTION

Settled/nomad provisioning is a provisioning pattern that helps increase the utilization of NetApp shared storage. When the online migratability features of storage are exploited, response times to mitigate data growth scenarios are independent of application-specific planned downtime windows. Storage is classified into potential migration candidates and can be migrated away from a tight aggregate at one storage controller to another one while assuring its accessibility. Thus, it is an elegant technology to relax the use of an aggregate outside the planned downtime windows of the served applications. NetApp MultiStore technology implements this feature using the vFiler abstraction, which NetApp recommends you consider in the provisioning process. Secure multi-tenancy environments implemented using MultiStore technology harmonize well with this provisioning approach.

The settled/nomad provisioning pattern is a fitting metaphor for reacting to data growth in an aggregate. Figure 14 illustrates the concept of settled/nomad provisioning in the aggregates of the storage controllers and the migration of a nomad outside its aggregate. The settled part describes data that does not move during its lifetime. It might use vFiler units to simplify operation and hardware maintenance of the storage controller, but there is no direct need. The nomad parts are considered moving parts and thus must make use of vFiler units. The ratio between the sizes of the settled and nomad parts depends on the growth rate and lifetime of the data in the settled part. Assuming that the aggregate size is constant over this period, the aggregate is filled with settled and nomad data. Over the data lifetime, more and more nomads are migrated away. At the end of the lifetime, only the settled data is left.

It is irrelevant whether the data growth happens in the settled or the nomad part: when a nomad is migrated away, the resource situation on the aggregate is relaxed. It is preferable to provision several nomads of different sizes. This allows you to:

• React to different growth scenarios of the data
• Quickly migrate smaller nomads when time or the inter-storage-controller network is considered to be a limited resource
• Operate the aggregate in its operational sweet spot corridor over a long time frame; by slicing the migratable entities in the right way, you can be sure that the aggregate operates in a predefined use interval

Figure 14) Settled/nomad provisioning into an aggregate. In case of aggregate tightness, a nomad is migrated to a separate aggregate.


To summarize, the settled/nomad provisioning pattern is an elegant method to adjust the block use of an aggregate. The use of an aggregate can be controlled and kept in a desired corridor.


SLA-BASED ASSESSMENT FOR SETTLED/NOMAD

The goals of an SLA-based assessment are to optimize SLA fulfillment and to avoid or minimize penalty costs. The accessibility of the applications, described by their individual service levels, is used to assess instances as settled or nomad.

We use the introduced SLA metric of service disruption and map it to the stickiness of the settled/nomad instances. The vFiler entities allow online migration of NFS- and iSCSI-attached nomad instances without any changes on the client side. Fibre Channel-attached storage cannot be migrated online at the time of writing. Refer to TR-3881 for an understanding of Data Motion in an Oracle database and Microsoft Exchange environment.

Alignment by technical impact: For data belonging to applications with SLAs that fit perfectly into what is provided, a direct assignment can be made. For example, application instances with the lowest acceptable service disruption should be the last candidates to be migrated (settled); applications with the highest acceptable service disruption should be considered as nomads. However, there might be data of application instances that will likely be migrated during the application lifetime. You must take into account the business impact of migrating these instances.

Figure 15) Alignment by technical impact (sorted by negative impact in descending order)

[Figure: instances Inst1 through InstN are ordered by negative impact, from high (outside SLA; settled, for example all FC-attached storage) through medium to low (inside SLA; nomad).]

Alignment by business impact: An assessment of penalty costs is made for the data of the remaining applications. For vFiler migration, a very short negative impact on the performance of the service level must be taken into account during the migration. Thus, application data with the highest negative impact is considered to be the stickiest.

Figure 16) Alignment by business impact (sorted by negative impact in descending order)

[Figure: instances are ordered by penalty cost; those with the highest negative impact ($$) are settled, followed by semi-settled instances, and those with the lowest impact ($) are nomads.]

PERFORMANCE AND THROUGHPUT IMPACT OF MIGRATION

Migration of a nomad might be triggered by heavy storage consumption in an aggregate. It might also be triggered by performance limitations of the corresponding storage controller. Because the migration consumes additional resources on the network and the participating storage controllers, this consumption must be taken into account to avoid further intensifying the situation. Refer to TR-3881 for a quantitative evaluation of Data Motion.

NetApp recommends that the use of storage controllers in a high-availability configuration be adjusted in such a way that the remaining controller can master the load in case of a failover. Doing so should leave enough resources to perform migrations.

Migrating vFiler entities relies mainly on SnapMirror® and MultiStore technology; thus, it requires these licenses on all participating storage controllers. TR-3814, NetApp Data Motion, provides a thorough presentation of migration using the NetApp Data Motion solution. It focuses on implementing and triggering the migration of vFiler entities using NetApp Provisioning Manager. Furthermore, vFiler units can be managed manually to allow for handy offline migration with a very short interruption of storage accessibility.

ENABLE SETTLED/NOMAD FOR ALREADY-PROVISIONED STORAGE

While NetApp recommends that you consider the settled/nomad setting initially and take the sizing and lifetime of storage into account, it is possible to implement it later in a planned downtime window. If NFS-attached storage should be migrated, existing volumes can be adopted by a vFiler entity. Because the vFiler entity has its own IP address, the clients attaching the storage need to remount it.

SETTLED/NOMAD-LIKE SETTING WITH SHORT/LONG-TERM DATA PAIRING

In the previous section, the settled/nomad pattern was described as a way to mitigate organic data growth. The same effect can be achieved when storage is identified as belonging to instances that are going to be deprovisioned at their end of life. Taking into account the expected lifetime of provisioned storage allows you to plan deprovisioning situations in advance. This relaxes the dependence on aggregates outside planned downtime windows and without the technical requirements of a settled/nomad setting.

ONLINE MIGRATION IN VIRTUALIZED ENVIRONMENTS

Online migration features in a virtualization hypervisor provide a further alternative for implementing a responsive scheme to react to data growth scenarios. For example, VMware® Storage VMotion™ is capable of transferring a virtual machine, including its storage, when it is attached using a datastore. Storage of virtual machines served by a NetApp datastore can be migrated to another NetApp-served datastore by migrating each virtual machine. In such cases, a nomad can simply be implemented by a NAS/SAN-attached datastore.

In contrast to a data transfer based on SnapMirror directly between NetApp storage controllers, the migration traffic flows through the hypervisor. This might have consequences for the execution of the virtual machines. Also, the NetApp storage efficiency savings cannot be exploited during the transfer. Deduplication savings are gained back by executing the deduplication process on the destination storage controller.


4 OPERATION

This section focuses on the operation and management of overcommitted storage. The goal of management is to fulfill the SLAs of the stored data while achieving a high level of efficiency. It addresses questions of how to detect situations that need manual assistance, how to raise the awareness of the operational staff, and how to resolve situations that arise.

We first consider situations that put the SLA fulfillment at risk. Then we focus on actions that can be taken to avoid further aggravation by making the situation evident and presenting mitigation alternatives.

Consider the operational process as a loop that monitors and evaluates the current situation and triggers the transition of a storage resource among the following phases:

• Provision storage.
• Leave room for organic growth; it might be desirable to still allow for extending the storage of previously provisioned applications.
• Reduce storage use with mitigation alternatives such as deletion, data motion, and so on.

These transitions must occur within a specified time frame to optimize operational flexibility and to prevent endangering the SLAs. The point is to detect situations that will violate the SLAs in the future.

SITUATIONS PUTTING SLA FULFILLMENT AT RISK

Over time, more and more data is stored and processed by the provided applications, and NetApp storage efficiency technologies compensate for this growth. To prevent running out of physical resources, usage must be managed within safe boundaries. This makes sure the operations team has enough time to react with the appropriate mitigation strategy.

The following list summarizes situations that are critical for service delivery:

• Running out of time. Some mitigation alternatives must be triggered in advance, and a passage of time might be needed for their effect to become evident. This time determines the number of mitigation alternatives that can still be considered at a certain point.

• Running out of mitigation alternatives. Several mitigation alternatives exist to control the usage. However, some alternatives are one-time activities and some must be performed within a certain time frame. Depending on the situation, not all alternatives might be available for use.

• Running too tight on storage. Over time, applications use more and more of the blocks of storage that were committed to them. This forces Data ONTAP to allocate from a pool of free blocks. Assuming data growth, the size of the free block pool directly translates into available time to react.

• Running out of storage completely. This must be prevented because it has a high impact on the availability of the service. Furthermore, data integrity can be at risk. Consider the following scenarios:
− An application wants to write to committed storage but fails (NAS/SAN). For the application, this looks like a storage failure and implies service disruption. Data integrity can be at risk.
− An application wants to allocate new storage but fails (NAS). The application is confronted with a "No space left on device" exception. Verify the application behavior on this exception; most applications can deal with this situation, and data integrity is not at risk.

Two cases need to be differentiated when mitigation is necessary to resolve a situation of tight storage. Storage for an object such as a LUN or a share can be tight because of:
− insufficient space within the volume in which the storage object is contained
− insufficient free space within the aggregate in which the storage object and its volume are contained
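The relation between the free block pool and the available time to react can be sketched as simple arithmetic, assuming a constant daily growth rate:

```python
def days_to_react(free_gb, daily_growth_gb):
    """Translate the free block pool of an aggregate into the remaining
    time to trigger a mitigation alternative, assuming a constant daily
    growth rate (sketch)."""
    if daily_growth_gb <= 0:
        # No growth means no urgency.
        return float("inf")
    return free_gb / float(daily_growth_gb)

# 600 GB free and 20 GB/day growth leave roughly 30 days to react.
remaining_days = days_to_react(600, 20)
```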


The following sections focus on how to detect that a change is necessary and that a storage resource should be transitioned to another phase. We describe the different phases, how monitoring can support their detection, and how this information can be made known to operational groups.

4.1 PHASES AND TRANSITIONS

This section outlines the phases of a storage resource. Starting with an empty aggregate, storage is provisioned to certain thresholds. After that, storage is left for organic growth. After certain thresholds are exceeded, further inspection or activities must be performed to mitigate storage tightness.

• Provisioning storage. When certain thresholds are within a defined range, storage is provisioned to the aggregates. Monitoring should support making a decision to transition to the next phase.

• Leave storage for organic growth. When certain thresholds are exceeded, provisioned storage is left for organic growth. Depending on the environment, storage of existing applications might still be extended, and a second threshold might signal that extensions are not possible anymore. Monitoring should support making a decision to transition to the next or prior phase.

• Mitigate storage use. When certain thresholds are exceeded, this phase must make sure that committed storage can be delivered to store application data. The effect of a mitigation activity should be to put the storage resource back into the preferred operational corridor. Monitoring should support making a decision to transition back to the organic growth phase.

4.2 MONITORING

NetApp Operations Manager delivers comprehensive monitoring and management for NetApp shared storage. It provides alerts, reports, performance monitoring, and configuration tools to keep the NetApp storage infrastructure in line with business requirements for maximum availability and efficiency. NetApp Operations Manager provides a single human interface and an application programming interface (API) for integration with third-party management and orchestration software vendors.

Operations Manager monitors the NetApp shared storage infrastructure and is able to raise awareness of certain situations. Events can be set to trigger an action when operational parameters are within a certain range and indicate a relevant situation. When the event triggers, an alarm notification can be sent by e-mail, pager, Simple Network Management Protocol (SNMP), or customized scripts. To raise awareness about a certain situation, the event must be characterized using the metrics provided by Operations Manager. To communicate the event, an alarm must be set.

THRESHOLDS

Operations Manager monitors relevant parameters that indicate the presence of specified situations. Thresholds can be set to trigger actions, for example, to notify the operational team that an alarm situation exists. The thresholds can be set to notify in advance. Operations Manager also performs trending on operational parameters to express the urgency of a certain situation. This supports decision making on how to react to a certain situation.

Within your Operations Manager instance, the thresholds can be verified and set by navigating to the Default Thresholds page (Setup → Options → Default Thresholds) or via the link http://opsmgrserver:port/dfm/edit/options. Figure 17 shows a sample configuration page.


Figure 17) Operations Manager screen to configure thresholds on operational metrics

For aggregates, Operations Manager provides the set of thresholds described in the following list. They represent absolute limits. Operations Manager alarms can be used to notify operational staff and raise awareness of a specific situation.

Monitoring the aggregates is very important. They are the physical containers of preallocated and growable storage objects that host application data. If an aggregate of a storage controller runs at uncontrolled usage, it could have direct consequences for the applications for which it is providing data.

The concrete settings for these thresholds depend on the time necessary to relax aggregate block usage. When no mitigation alternatives should or can be taken over the lifetime of the data, mitigation actions must be performed in scheduled downtime windows. Thus, threshold settings and actions tend to be more conservative to avoid SLA-endangering situations.

• Aggregate full threshold. This threshold on the metric of aggregate block use allows triggering an alarm that notifies a person in charge.

• Aggregate nearly full threshold. This threshold is the counterpart of the aggregate full threshold but provides an earlier notification.

• Aggregate overcommitted threshold. This threshold on the metric of committed storage allows triggering an alarm that notifies a person in charge. This metric refers to the amount of storage that is assigned to applications. It represents the level of consolidation as well as the width and increase of the block use corridor.

• Aggregate nearly overcommitted threshold. This threshold is the counterpart of the aggregate overcommitted threshold but provides an earlier notification.

Operations Manager also provides thresholds and events that can be used to alert operational staff when volumes are in a certain state:

• Volume full threshold. This event notifies a person in charge that the preset threshold on the volume metric has been reached.

• Volume almost full threshold. This event is the counterpart of the volume full threshold but provides an earlier notification.

• Volume autosized. This event notifies a person in charge when a volume was extended using the autogrow functionality.


TRENDING

Operations Manager 4.0 supports a variety of trending features for certain storage objects. This is an important feature for all storage objects with a fixed size. It allows you to estimate the time frame within which a certain situation needs to be mitigated. The trend is calculated as a linear regression over up to 90 days in the past. For aggregates, Operations Manager calculates a trend on the daily growth rate. In your Operations Manager instance, use the link http://opsmgrserver:port/dfm/report/view/aggregates-growth-rates for trending of aggregate growth rates and the estimated remaining time until the storage object is full. Each aggregate can be drilled down, and you can select trending based on an interval of one day, one week, one month, three months, or one year. To see the effect of recent data activities, set the interval of a trend calculation to enclose this activity. Investigate whether growth rates calculated over different intervals deviate significantly.
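Conceptually, the days-to-full estimate can be reproduced with a least-squares fit over daily usage samples. The sketch below illustrates the idea only (it is not Operations Manager code) and, like the product, projects against 100% of usable capacity:

```python
def days_to_full(daily_used_gb, usable_capacity_gb):
    """Fit a linear trend (least squares) to daily usage samples and
    project the days remaining until usable capacity is exhausted.
    Returns None for flat or shrinking usage."""
    n = len(daily_used_gb)
    xs = range(n)
    mean_x = sum(xs) / n
    mean_y = sum(daily_used_gb) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, daily_used_gb))
    var = sum((x - mean_x) ** 2 for x in xs)
    slope = cov / var                    # growth in GB per day
    intercept = mean_y - slope * mean_x
    if slope <= 0:
        return None                      # no meaningful days-to-full estimate
    current = intercept + slope * (n - 1)
    return (usable_capacity_gb - current) / slope

# 10 days of samples growing 50 GB/day toward a 10,000 GB aggregate
samples = [8000 + 50 * d for d in range(10)]
print(round(days_to_full(samples, 10000)))  # → 31
```

Comparing such estimates over different sample windows mirrors the advice above: if the fits deviate significantly, recent activity has changed the growth behavior.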

Figure 18) Trending of data growth and days-to-full prediction in Operations Manager

Note: The calculation basis of time to full is the usable aggregate capacity. This value is not calculated based on the aggregate full threshold setting.

The trending on the volume level is analogous to the trending on the aggregate level. In your Operations Manager instance, access the link http://opsmgrserver:port/dfm/report/view/volumes-growth-rates for trending of volume growth rates. NetApp recommends that you sort the view by growth rate (descending) or time to full (ascending) in order to focus on the relevant candidates.

On the volume level, you can set an alarm to fire when volume growth is outside the usual boundary:

• Abnormal volume growth. This event fires when the growth rate of a volume exceeds a preset limit. It is helpful for signaling unusual behavior concerning storage consumption and for pointing the operational staff to the right storage object.
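A simplified version of such a check can be sketched as follows: compare the latest daily growth with the historical average and flag the volume when a preset factor is exceeded. This is illustrative logic, not the Operations Manager implementation:

```python
def abnormal_growth(daily_used_gb, limit_factor=2.0):
    """Flag a volume whose latest daily growth exceeds limit_factor times
    the average of its earlier daily growth. Purely illustrative."""
    deltas = [b - a for a, b in zip(daily_used_gb, daily_used_gb[1:])]
    history, latest = deltas[:-1], deltas[-1]
    avg = sum(history) / len(history)
    return avg > 0 and latest > limit_factor * avg

print(abnormal_growth([100, 110, 120, 130, 170]))  # steady +10/day, then +40 → True
print(abnormal_growth([100, 110, 120, 130, 140]))  # steady growth → False
```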


INDIVIDUAL THRESHOLDS PER AGGREGATE OR VOLUME

For each aggregate or volume, the general default settings can be overwritten and made more specific. To do so, select the aggregate or volume of choice, for example, by using the links already provided in this technical report. When selecting a concrete aggregate, it can be configured using the Edit Settings link and dialog. When selecting a concrete volume, its configuration can be adapted using the Edit Quota Settings link and dialog.

MONITORING STORAGE EFFICIENCY RETURNS

NetApp Operations Manager provides a dashboard to visualize storage efficiency returns in the NetApp shared storage infrastructure. This report lists important parameters, drilled down by utilization capacity, unused reserve capacity, storage efficiency, and efficiency return breakdown. It allows you to judge the effectiveness of the NetApp storage efficiency technologies. Figure 19 provides a sample screenshot of the storage efficiency dashboard in NetApp Operations Manager. Consult the NetApp Operations Manager Efficiency Dashboard Installation and User Guide for further details on the information provided by this dashboard.

Figure 19) Storage efficiency dashboard in Operations Manager


4.3 NOTIFICATION

Operational staff must be notified when situations occur that require a transition of phases, especially situations with negative consequences for SLA fulfillment. Operations Manager provides alarms for notification. Alarms are bound to the metrics and thresholds explained in section 4.2 and notify operational staff, storage administrators, or storage capacity planners. Alarms are the instrument used to keep the management effort for the NetApp storage infrastructure low.

After being notified, the responsible person can evaluate the situation and decide which actions to take. Furthermore, the trends on operational parameters provided by Operations Manager simplify the decision-making process.

Depending on the organizational structure, the responsibilities to operate, plan, and administer the storage infrastructure can be separated into different groups, persons, or roles. Thus, we characterize the mitigation activities by required skill set and time to act, which allows easy alignment to a given organizational structure.

Operations Manager supports different methods of sending a notification. The notification methods can be used in combination; for example, a notification can be sent by both e-mail and SNMP.

NOTIFY BY E-MAIL

An alarm can be sent to multiple destinations by e-mail. Repeated notifications can be sent while the situation is not resolved. To set an alarm, access the alarm configuration page by following Setup→Alarms from the default Operations Manager dashboard. Clicking Advanced Version accesses an advanced version of this page; the direct link for the advanced version is http://opsmgrserver:port/dfm/edit/alarms-advanced. Figure 20 shows how to configure an alarm. Adjust the threshold as described in section 4.2.

NetApp recommends using distribution lists or aliases with meaningful names rather than the addresses of individual persons. If you follow this recommendation, changing responsibilities and roles does not require corresponding changes in Operations Manager.

NOTIFY BY SNMP

Operations Manager supports the notification of alarms using SNMP, a widely used standard that is supported by most orchestration frameworks and ticketing systems. Using SNMP, Operations Manager can be integrated into existing ticketing systems. Figure 20 shows setting up an alarm that fires based on the aggregate almost full threshold. The SNMP trap host is configured using a hostname or IP address and the port on which the SNMP agent is listening. The alarm can be saved and tested.


Figure 20) Configuring an alarm based on the threshold aggregate almost full

Note: The SNMP event must be routed to the responsible groups or persons in the ticketing system. Thus, the mapping between the detected situation and the responsible operational group must be implemented there.

NOTIFY BY SCRIPT

Operations Manager supports notifications in highly customized integration scenarios. A user-defined adapter can be executed, which delivers the information to the infrastructure or system of choice. A script can be used to implement such an adapter and act as the glue between Operations Manager and the customer infrastructure. To set an alarm on the event aggregate almost full that starts a script, instrument Operations Manager on the command line:

dfm alarm create -s script_to_execute -h aggregate-almost-full
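A minimal example of such an adapter script is sketched below. The convention assumed here, that the event name and the affected object are handed over as command-line arguments, is hypothetical; the actual interface of a dfm alarm script should be checked in the Operations Manager documentation:

```python
#!/usr/bin/env python
# Hypothetical notification adapter sketch: formats an Operations Manager
# event as a one-line ticket entry for a downstream ticketing system.
# The argument convention (event name, source object) is an assumption,
# not the documented dfm script interface.
import sys
import time

def format_ticket(event, source):
    """Build a one-line ticket summary for the ticketing system."""
    stamp = time.strftime("%Y-%m-%d %H:%M:%S")
    return "[%s] storage-event=%s source=%s action=check-mitigation" % (
        stamp, event, source)

if __name__ == "__main__":
    event = sys.argv[1] if len(sys.argv) > 1 else "unknown-event"
    source = sys.argv[2] if len(sys.argv) > 2 else "unknown-object"
    # A real adapter would deliver this line to the ticketing system
    # instead of printing it.
    print(format_ticket(event, source))
```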


4.4 MITIGATE STORAGE USE

Uncontrolled storage use can limit operational flexibility and might put SLA fulfillment at risk. This section focuses on mitigation activities that preserve flexibility by controlling use within its defined corridor. A mitigation activity should return the usage to its defined corridor.

Storage tightness might occur in aggregates or volumes, depending on their configuration. When all volumes in an aggregate are thin provisioned with the zero fat configuration, they use the shared pool of free blocks of the aggregate to deal with data growth. To resolve this situation, a mitigation activity on the aggregate level is necessary.

When storage objects in a fixed-size volume cannot grow to the committed space, a mitigation activity on the volume level is necessary to resolve upcoming volume tightness.

MITIGATION ACTIVITIES FOR AGGREGATES

Aggregates are the coarsest storage objects within a NetApp storage controller. Finer-grained storage objects, such as FlexVol volumes and their content, are usually thin provisioned using the zero fat configuration. They might grow on demand; however, because they live within an aggregate of physically limited size, the growth of the storage object itself is also limited. As described in the following list, providing usable space in the aggregate automatically allows the contained storage objects to grow.

1. Increase the aggregate. You can add drives to aggregates during operation, and you can repeat this mitigation activity. The maximum aggregate size depends on the Data ONTAP version, the type of aggregate, and the type of storage controller. 64-bit aggregates, supported with Data ONTAP 8, have very high limits. Additional drives can be used immediately; however, their procurement needs to be taken into account. Rebalancing data between existing and new drives results in uniformly distributed use of the drives.

2. Decrease the aggregate Snapshot copy reserve. This reserve is needed in MetroCluster and SyncMirror® configurations. In other configurations, you can decrease this reserve or set it to zero.

3. Shrink preallocated volumes. Volumes with preallocated space reserve available aggregate free space. When possible, these volumes can be shrunk, returning the freed space to the aggregate so that others can make use of the previously preallocated space.

4. Enable deduplication and shrink the volume.

5. If available, migrate a nomad online to a different storage controller. Doing this on the NetApp storage controller level requires storage provisioning based on vFiler units and MultiStore and SnapMirror licenses. Adequate free space on the aggregates of the target storage controller is required. This mitigation activity is not limited in its applicability.

6. A volume can be migrated from one aggregate to another within the same or another storage controller. SnapMirror replicates the data while it is still being served. To switch over to the replicated data, the client needs to detach from the source and reattach to the replica. After completion, the replica is considered the new source. This operation has an impact on client downtime. Typically, inter–data center bandwidth allows you to synchronize the source and the replica within a few minutes.

7. If none of the listed activities can be used, the application must be stopped to achieve a consistent state.

The mitigation activities for aggregate tightness are summarized in Table 8. Note that Provisioning Manager can perform mitigation alternatives 3 to 6 for secondary storage online.


Table 8) Mitigation alternatives to control use within aggregates.

No | Mitigation Activity | Repeatability | SLA Impact | Preparation Time | Time to Show Effect
1 | Increase aggregate capacity by adding disks | Data ONTAP 7.x: low limits; Data ONTAP 8: high limits | None | HW procurement | Immediate (+ rebalancing)
2 | Decrease the aggregate's Snapshot copy reserve area, if possible | One time | None | None | Immediate
3 | Shrink other volumes in the aggregate if they have enough free space | One time | Low | None | Immediate
4 | Run deduplication and shrink volumes | Repeatable | Low | Time to execute dedupe | Immediate
5 | Migrate nomads (online) | Repeatable | Low | None | Minutes; vFiler migration time
6 | Migrate volumes to a different aggregate (offline) | Repeatable | Med–high | Next planned downtime window | Minutes; volume switch-over time
7 | Prevent application data loss and stop the application, then migrate (offline) | Repeatable | Low–high | Coordinate with app owner | Minutes; migration time

MITIGATION ACTIVITIES FOR VOLUME TIGHTNESS

Mitigation activities for volume tightness are relevant for volumes that are at risk because their storage objects cannot grow to the committed size. When it is not possible to enable growth for storage objects contained in volumes, you need to perform an aggregate mitigation activity.

Note: Some of these mitigation alternatives depend on and affect the used capacity in the aggregate.

Table 9) Mitigation activities for resource tightness within volumes.

No | Mitigation Activity | Repeatability | SLA Impact | Preparation Time | Time to Show Effect
1 | Reduce the volume's Snapshot copy reserve (if configured and not used) | One time | Low | None | Immediate
2 | Increase the volume if there is free space in the aggregate (see Table 8) | One time | Low | None | Immediate
3 | Delete Snapshot copies that are not needed or those skipped by the autodelete function | Limited | Low | None | Immediate
4 | Activate FAS deduplication for the volume (requires proper space guarantees) | One time | Low; possible performance impact | Wait for schedule | Hours
5 | If the volume contains more than a single LUN, migrate those objects to another volume or aggregate | Repeatable | High | Next planned downtime window | Minutes; volume migration time
6 | Stop the application and migrate the data | Repeatable | High | Coordinate with app owner | Minutes; migration time


5 REAL-LIFE SETTINGS

This section summarizes two different operational settings. The first does not make use of online data migration and the settled/nomad provisioning pattern; the second implements a settled/nomad provisioning pattern to maintain the flexibility for online data migrations.

The concrete threshold settings and approaches might be very customer and application specific. To exploit NetApp storage efficiency features in your own data center, NetApp recommends that you start conservatively. After you are familiar with the process, work toward the customer-specific optimum.

5.1 SAMPLE SETTING 1: REAL-LIFE SETTING

This section describes a real-life setting that a customer started with. It makes use of a limited set of mitigation alternatives. This is especially beneficial when the installed storage capacity should remain constant over a long time frame or when physical systems are already fully equipped. A settled/nomad setting is not considered. Thus, the thresholds that signal a transition of phases are set lower and more conservatively for this customer. Because online data migration and aggregate extension are not available as mitigation alternatives, sufficient available space is required to safely reach the next planned downtime window, as shown in Figure 21. In practice, refer to the aggregate days-to-full trend value to get an idea of the available days to full based on past data growth.

• All storage is provisioned using the zero fat option with growable FlexVol volumes. Only aggregate monitoring is used.
• Aggregate extension is not a mitigation alternative.
• Online migration is not a mitigation alternative.

Figure 21) Storage to enable organic data growth between planned downtime windows


Note: Several months might pass between planned downtime windows in which to perform major mitigation alternatives.

The primary concern is preventing the critical situation in which aggregates reach a utilization level that is too high to enable organic growth during the period between agreed planned downtime windows. To prevent this situation, sufficient space must be reserved to enable data growth. Second, the level of data consolidation is monitored to manage accumulated growth rates safely.

Provisioning of new data is stopped when one or both thresholds on the first and second metrics are reached. The operational teams are notified using an alarm on the Operations Manager event aggregate nearly full threshold (event configured when the metric exceeds 50%) and the event aggregate nearly overcommitted threshold (event configured when the metric exceeds 110%). These alarms stop the responsible entities from provisioning new storage; the aggregate is left for organic growth.

An assessment of the storage situation might then be performed. Depending on experience and knowledge of the application growth rates seen in the past, the thresholds may be adapted. After the upper threshold of the operational sweet spot corridor is crossed, an alarm based on the aggregate full threshold (set initially to 65%) is sent to the storage administrators to make the decision to migrate data in the next planned downtime window. In the meantime, organic growth can take place in the yellow-marked area shown in Figure 22. The metrics used are:

• First metric: aggregate capacity used
• Second metric: aggregate space committed

Because all storage is provisioned using the zero fat option, no artificially limited storage container exists. Thus, there is no need to consider a volume-based metric. Figure 22 shows the behavior depending on the metrics aggregate capacity used and aggregate committed space.
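The decision logic of this setting can be summarized in a few lines. The sketch below is an illustration of this customer's policy using the thresholds from the text (50%, 65%, 110%), not a general recommendation:

```python
def setting1_actions(used_pct, committed_pct):
    """Map the two metrics of sample setting 1 to actions.
    Thresholds from the text: aggregate nearly full 50%,
    aggregate full 65%, aggregate nearly overcommitted 110%."""
    return {
        # green zone: both metrics inside the corridor
        "provision_new_storage": used_pct <= 50 and committed_pct <= 110,
        # corridor left: assess the situation, possibly adapt thresholds
        "assess_capacity": used_pct > 50 or committed_pct > 110,
        # above the full threshold: plan migration for the next downtime
        "mitigate_in_next_downtime": used_pct > 65,
    }

print(setting1_actions(40, 90))   # green zone: provisioning allowed
print(setting1_actions(55, 115))  # corridor left: assess and adapt
print(setting1_actions(70, 115))  # above 65%: mitigate at next downtime
```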

Figure 22) Transition of phases depending on the metrics aggregate capacity used and aggregate committed space.

[Figure: decision matrix over aggregate capacity used (0–50% / >65%) and aggregate space committed (0–110% / >120%). In the lower bands, provisioning of new storage is allowed; when the operational sweet spot corridor is left, capacity is assessed and thresholds are adapted; above the upper thresholds, mitigation takes place.]


5.2 SAMPLE SETTING 2: SETTLED/NOMAD

This section describes a setting that takes the settled/nomad provisioning pattern into account and allows migrating nomad data flexibly and in a timely manner, thanks to vFiler technology. This setting requires storage space at alternative locations to which nomads might be migrated. It is seen more often in larger environments with an emphasis on NFS-attached storage. It allows operating the NetApp storage infrastructure at very high use and in narrower operational sweet spot corridors. Figure 23 visualizes the effect of a mitigation alternative that can be performed online.

Figure 23) Narrower corridors due to the ability to perform mitigation alternatives in hours instead of months


In this sample setting, as in sample setting 1, the critical situation to prevent is aggregates becoming too full. However, with the flexibility gained through online data migration, there is no need to take a further metric into account, for example, storage overcommitment.

• All storage is provisioned using the zero fat option with growable FlexVol volumes. Only aggregate monitoring is used.
• Storage is provisioned using the settled/nomad pattern, with the ability to perform online migration.
• Days-to-full aggregate trending was more than 200 days on average. Note that this value depends on the individual situation and is calculated against 100% capacity used.

The sole metric in this setting is aggregate capacity used. Table 10 contains the thresholds describing the transition of phases.

Table 10) Phase transitions with the settled/nomad provisioning pattern and the online migration mitigation alternative.

Detection Threshold | Notify | Mitigation
> 70% | Storage operations | Stop provisioning of new storage
> 85% | Storage operations | Stop extending provisioned storage
> 90% | Storage operations | Relax the resource situation and migrate a nomad
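The phase transitions of Table 10 reduce to a single mapping from the metric to a mitigation, which can be sketched as:

```python
def setting2_mitigation(used_pct):
    """Map aggregate capacity used to the mitigation of Table 10.
    In all cases, storage operations is notified."""
    if used_pct > 90:
        return "relax resource situation and migrate a nomad"
    if used_pct > 85:
        return "stop extending provisioned storage"
    if used_pct > 70:
        return "stop provisioning of new storage"
    return "none"

print(setting2_mitigation(72))  # → stop provisioning of new storage
print(setting2_mitigation(92))  # → relax resource situation and migrate a nomad
```

Because the only mitigation requiring action beyond a provisioning stop (nomad migration) is performed online within hours, the corridor between 70% and 90% can be kept much narrower than in sample setting 1.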


Figure 24) Visualization of phase transitions depending on the metric aggregate capacity used.

[Figure: decision matrix over aggregate capacity used (0–70% / 70–85% / >90%): provisioning of new storage, extending already provisioned storage, and relaxing utilization by migrating a nomad with NetApp Data Motion.]

You can achieve very high data consolidation in this setting by using NetApp storage controllers. The served amount of logical data exceeds the physically usable capacity by a significant factor.


6 STORAGE EFFICIENCY COOKBOOK

To increase consolidation, we propose the following steps to exploit the advantages of NetApp storage efficiency technologies.

Figure 25) Typical picture of aggregate capacity metrics after changing to zero fat configurations and dedupe.

[Figure: committed capacity and capacity used over elapsed time (one month, then three months), with the overall trend and the last-3-month trend; marks 1–3 correspond to the steps below.]

As a general rule, we don't introduce artificially limited container types. They increase the monitoring effort and might prevent pooling of unused space. For an existing landscape, proceed as follows:

1. Install and configure Operations Manager; the earlier, the better. From day one, Operations Manager collects data, and the more information it collects, the better the predictions and trending. The diagrams provided by Operations Manager give a good idea of growth rates and their steadiness. Make sure all NetApp storage controllers are monitored. Wait for one month. Define which mitigation alternatives your operational team is comfortable with. Check the boxes accompanying the provided list and identify the time your team needs to perform the actions. If you can perform online migrations for nomads, define the time to negotiate and approve the migration. For all other data, define the time to the next planned downtime window.

2. Change all volumes to the zero fat configuration with the autogrow feature set to on. Because there is no artificial space limitation for an autogrow volume, monitoring is restricted to aggregate monitoring. When using deduplication, set the volume to autogrow. During this period, the capacity used diminishes, as shown in Figure 25. Usually, each change in the volume configuration can be detected. So far, only metadata has changed, and unused space in the volumes is now available from a common shared pool. The aggregated free space is available to the same applications storing the data. We recommend monitoring for three months to understand the growth rate of your environment.

3. Derive the growth trend of the aggregates. Note that the overall trend might still be negative. Use Operations Manager to help determine the trend. Make sure that it excludes the time frame when changing the volume configuration to zero fat and that it includes relevant operations of your applications, such as month- and year-end closing of business applications or regular software maintenance updates (for example, in virtualized environments).


Work backward to determine the thresholds of the phases:

a. Define the aggregate use at a level where your operational team is comfortable. At first, do not exceed 80%. Add an attention area (yellow) depending on the mitigation alternatives and their time to show effect.

b. Determine the maximum distance between the planned downtimes or the time to perform the intended mitigation alternatives.

c. Determine the growth rate. Operations Manager provides help in determining the trend of data growth.

d. Determine the minimum space required to comfortably allow organic growth in the period between agreed planned downtimes of the services provided. Operations Manager helps you understand the growth rate of the past.

To provision storage, follow these steps:

1. Create big aggregates to enable shared storage in your data center. We recommend sizing them in such a way that the aggregate can be extended once for eventual aggregate mitigation. Very few situations exist in which a silo-centric approach with dedicated aggregates for applications makes sense. Free space and performance in an aggregate can be shared. A few big aggregates reduce the monitoring effort. Also, build aggregates in a limited number of standardized configurations and sizes.

2. Create volumes in the zero fat configuration with the autogrow feature set to on. Because there is no artificial space limitation for an autogrow volume, monitoring is restricted to aggregate monitoring. When using deduplication, set the volume to autogrow. Whenever possible, use Provisioning Manager for convenience and for repeating configurations.

a. Classify your data and provision for flexibility. Give NFS a preference and make use of vFiler entities.

b. Turn on deduplication. Even in situations where deduplication rates are expected to be low, there is sometimes a big surprise. If you prefer to try deduplication on the storage controller, then create a clone of the intended volume and deduplicate it to estimate the effect. Use Performance Advisor to identify a repeating time frame of low activity to schedule the deduplication job. Alternatively, use deduplication scheduled by change rate. Mind the maximum volume sizes, which depend on the storage controller.

c. Initially, size volumes to the expected size of the data you are going to store. Thus, the aggregate overcommitment metric in Operations Manager represents the data consolidation more precisely.

d. Trim existing volumes provisioned in full fat or low fat to the zero fat configuration. Use the following commands on the console of the storage controller to configure zero fat without Snapshot autodelete for NAS environments:

vol options <volume> guarantee none
vol options <volume> try_first volume_grow
vol autosize <volume> -m <maximum size> -i <increment size> on
snap autodelete <volume> off

Use the following command sequence to configure zero fat with Snapshot autodelete for NAS environments:

vol options <volume> guarantee none
vol options <volume> try_first volume_grow
vol autosize <volume> -m <maximum size> -i <increment size> on
snap autodelete <volume> trigger volume
snap autodelete <volume> delete_order oldest_first
snap autodelete <volume> on


Use the following commands to configure zero fat without Snapshot autodelete for SAN environments:

vol options <volume> guarantee none
vol options <volume> try_first volume_grow
vol autosize <volume> -m <maximum size> -i <increment size> on
snap reserve -V <volume> 0
snap autodelete <volume> off
lun set reservation <lun> disable

Use the following command sequence to configure zero fat for SAN environments with autodelete set to on:

vol options <volume> guarantee none
vol options <volume> try_first volume_grow
vol autosize <volume> -m <maximum size> -i <increment size> on
snap reserve -V <volume> 0
snap autodelete <volume> trigger volume
snap autodelete <volume> delete_order oldest_first
snap autodelete <volume> on
lun set reservation <lun> disable

e. Identify storage holding inactive data. Storage keeping inactive data is most often perfectly suited to act as a nomad candidate that can be migrated.

f. Identify storage that is close to deprovisioning. Deprovisioning of storage relaxes use and can act as a mitigation alternative.

g. Turn already provisioned volumes into the zero fat configuration.

3. Let Operations Manager monitor the landscape. Use the aggregate daily growth rates and days-to-full trending reported by Operations Manager to adapt the thresholds. Remember that days-to-full trending reports against 100% capacity used of the aggregate.


7 REFERENCES

• TR-3505: "NetApp Deduplication for FAS and V-Series Deployment and Implementation Guide," www.netapp.com/us/library/technical-reports/tr-3505.html
• TR-3563: "NetApp Thin Provisioning: Improving Storage Utilization and Reducing TCO," www.netapp.com/us/library/technical-reports/tr-3563.html
• TR-3710: "Operations Manager, Provisioning Manager, and Protection Manager Best Practices Guide," www.netapp.com/us/library/technical-reports/tr-3710.html
• TR-3786: "A Thorough Introduction to 64-Bit Aggregates," www.netapp.com/us/library/technical-reports/tr-3786.html
• TR-3814: "NetApp Data Motion," www.netapp.com/us/library/technical-reports/tr-3814.html
• TR-3827: "If You're Doing This, Then Your Storage Could Be Underutilized," www.netapp.com/us/library/technical-reports/tr-3827.html
• TR-3881: "DataMotion for Volumes for Enterprise Applications," www.netapp.com/us/library/technical-reports/tr-3881.html
• NetApp Operations Manager Efficiency Dashboard Installation and User Guide, http://now.netapp.com/NOW/download/tools/omsed_plugin/InstallUserGuide.pdf


8 ACKNOWLEDGMENTS

This report was developed in concert with the Field Centers for Innovation and covers field best practices and product group expertise. It would not have been possible without the input of many experts. Significant contributions were made by Matthew Agoni, Carlos Alvarez, Jeff Berks, Manfred Buchmann, Hans Deuerlein, Erik Dybwad, Niels Reker, Oliver Dziuba, Larry Freeman, Gary Garcia, Pretoom Goswami, Naveen Harsani, George John, Nigel Maddock, Andreas Martinovsky, Holger Niermann, Cesar Orosco, Christian Ott, Shiva Raja, Michael Reusch, Maurice Skubski, John Tyrrell, Oliver Walsdorf, and Allen Wang.

NetApp provides no representations or warranties regarding the accuracy, reliability, or serviceability of any information or recommendations provided in this publication, or with respect to any results that may be obtained by the use of the information or observance of any recommendations provided herein. The information in this document is distributed AS IS, and the use of this information or the implementation of any recommendations or techniques herein is a customer's responsibility and depends on the customer's ability to evaluate and integrate them into the customer's operational environment. This document and the information contained herein may be used solely in connection with the NetApp products discussed in this document.

© Copyright 2010 NetApp, Inc. All rights reserved. No portions of this document may be reproduced without prior written consent of NetApp, Inc. Specifications are subject to change without notice. NetApp, the NetApp logo, Go further, faster, Data ONTAP, FlexClone, FlexVol, MultiStore, RAID-DP, SnapDrive, SnapMirror, Snapshot, SyncMirror, and vFiler are trademarks or registered trademarks of NetApp, Inc. in the United States and/or other countries. Windows is a registered trademark of Microsoft Corporation. Oracle is a registered trademark of Oracle Corporation. VMware is a registered trademark and VMotion is a trademark of VMware, Inc. All other brands or products are trademarks or registered trademarks of their respective holders and should be treated as such. RA-0007-1010


Option | Recommended Value | Notes
autosize | on; -m X -i Y | The business model drives the maximum value (-m) for the autosize configuration, because it offers additional disk space to the consumer under specific conditions. A reasonable resizing increment (-i) depends on various factors, such as the data growth rate in the particular volume, the volume size itself, and so on.
try_first | volume_grow |

Volume Snapshot Options

reserve | 0 | For NAS volumes, setting a Snapshot copy reserve area and configuring Snapshot copy schedules is a common setup. For SAN volumes, this needs to be switched off according to NetApp best practices (see the Fibre Channel and iSCSI Configuration Guide).
schedule | switched off |
autodelete | off | Deleting Snapshot copies might be an option when the volume can no longer be resized because the maximum configured size has been reached, or when the aggregate's free space becomes low.

LUN Options

reservation | disable | No preallocation of blocks for the LUN.
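As a sketch, the recommendations above map to the following Data ONTAP 7-Mode commands. The volume name, LUN path, and size values are placeholder assumptions; verify the exact syntax against your Data ONTAP release.

```shell
# Zero fat settings for an existing volume "vol_app1" and its LUN
# (Data ONTAP 7-Mode CLI; names and sizes are placeholders)

# No space guarantee for the volume (zero fat)
vol options vol_app1 guarantee none

# Autosize: grow on demand up to 2 TB in 100 GB increments
vol autosize vol_app1 -m 2t -i 100g on

# Prefer growing the volume before deleting Snapshot copies
vol options vol_app1 try_first volume_grow

# SAN volume: no Snapshot copy reserve, no Snapshot copy schedule
snap reserve vol_app1 0
snap sched vol_app1 0 0 0

# Do not autodelete Snapshot copies
snap autodelete vol_app1 off

# No preallocation of blocks for the LUN
lun set reservation /vol/vol_app1/lun0 disable
```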

SUMMARY OF PROVISIONING METHODS

There are good reasons for using any of the provisioning methods described above. However, full fat provisioning for SAN environments should be avoided wherever possible because of its poor storage efficiency ratio. Even with a 100% block usage ratio on primary data, zero fat provisioning has many advantages and is the preferred method:

• The aggregate's free space is a global pool that can serve space to volumes. This gives more flexibility than volumes with their own dedicated free space.
• For SAN volumes, block consumption can be monitored easily.
• Deduplication savings go directly into the global pool of free space, which is the aggregate or the resource pool to which it belongs.
• Monitoring is needed only on the aggregate level. Volumes grow on demand.

Table 7) Comparison of provisioning methods.

Characteristics | Full Fat | Low Fat | Zero Fat
Space consumption | 2X + Δ | X + Δ | X – N + Δ ²
Space efficient | No | Partially, for Snapshot copies | Yes
Monitoring | Optional | Required on volume and aggregate level | Required on aggregate level
Notification/mitigation process required | No | Optional in most cases | Yes

² N is the traditional thin provisioning impact: the amount of blocks logically allocated but not used.
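The space consumption column of Table 7 can be made concrete with a small sketch, where X is the primary data, Δ the Snapshot copy delta, and N the blocks logically allocated but not used. The numbers are illustrative only.

```python
def full_fat(x, delta):
    """Full fat: 100% fractional reserve duplicates X, plus the Snapshot delta."""
    return 2 * x + delta

def low_fat(x, delta):
    """Low fat: fractional reserve removed, but the volume is still fully reserved."""
    return x + delta

def zero_fat(x, delta, n):
    """Zero fat: only used blocks consume space; N blocks are allocated but unused."""
    return x - n + delta

# Example: 1000 GB primary data, 50 GB Snapshot delta, 200 GB allocated but unused.
x, delta, n = 1000, 50, 200
print(full_fat(x, delta))     # 2050
print(low_fat(x, delta))      # 1050
print(zero_fat(x, delta, n))  # 850
```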


Table 7) Comparison of provisioning methods (continued).

Characteristics | Full Fat | Low Fat | Zero Fat
Pool benefiting from dedupe savings | Volume fractional reserve area | Volume free space area | Aggregate free space area
Risk of an out-of-space condition on primary data | No | No, as long as autodelete is able to delete any Snapshot copies | Yes, when monitoring and notification processes are missing
Typical use cases | Small installations; no or few storage management skills (no monitoring infrastructure) | Large database environments | Shared storage infrastructure; test/dev environments; storage pools for virtualized servers

FULL/LOW/ZERO FAT PROVISIONING WITH PROVISIONING MANAGER

NetApp Provisioning Manager focuses on fast and efficient provisioning of storage resources in the NetApp storage infrastructure. It speeds up provisioning workflows and boosts capacity usage by using policy-based automation for the entire NetApp NAS and SAN infrastructure. These processes are:

• Faster than manually provisioning storage
• Easier to maintain than scripts
• Instrumental in minimizing the risk of data loss resulting from misconfigured storage

Provisioning Manager applies user-defined policies to consistently select the appropriate resources for each provisioning activity. This frees administrators from the headache of searching for available space to provision and allows more time for strategic issues. The use of a centralized management console allows administrators to monitor the status of their provisioned storage resources. Provisioning Manager can help improve business agility and capacity use, shrink provisioning time, and improve administrator productivity. Provisioning Manager's thin provisioning and deduplication capabilities provide a high level of storage efficiency from your NetApp storage investment.

A GUI allows you to implement the full/low/zero fat provisioning models within Provisioning Manager. See Figure 9 and Figure 10 for configuring storage efficiency in a provisioning policy for NAS and SAN. Whenever storage is provisioned using such a provisioning policy, the settings apply automatically. For more information, refer to TR-3710, Operations Manager, Provisioning Manager, and Protection Manager Best Practices Guide.

Provisioning Manager encapsulates technical details when provisioning storage and supports easy integration with existing management tools and orchestration frameworks. Policies and their use in so-called datasets and storage services allow you to exploit NetApp storage efficiency technologies without exposing a high level of technical detail to a higher level of management software.

Note: Provisioning Manager up to version 4.0 does not allow you to specify autosize and autodelete individually; the policy template determines whether these features are used and which options are selected. To implement the provisioning methods outlined, a customized provisioning script needs to be provided to set the autosize and autodelete parameters according to the recommendations for the full/low/zero fat methods. Post-provisioning scripts are standard with Provisioning Manager. Use caution: when Provisioning Manager runs conformance checks, it reverts individual settings.
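A minimal sketch of such a post-provisioning hook is shown below. The argument order and the use of ssh are assumptions for illustration; the exact parameters Provisioning Manager passes to a script depend on its version and your configuration.

```shell
#!/bin/sh
# Hypothetical post-provisioning script: applies the zero fat autosize and
# autodelete recommendations that Provisioning Manager (up to 4.0) cannot
# set individually. Controller and volume names are assumed to be passed
# in as arguments; adjust to what your Provisioning Manager setup provides.
CONTROLLER="$1"
VOLUME="$2"

# Grow on demand, prefer volume growth over Snapshot copy deletion
ssh "$CONTROLLER" "vol autosize $VOLUME -m 2t -i 100g on"
ssh "$CONTROLLER" "vol options $VOLUME try_first volume_grow"

# Keep Snapshot copies; do not autodelete
ssh "$CONTROLLER" "snap autodelete $VOLUME off"
```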


Figure 9) Configuring a full/zero fat provisioning policy using Provisioning Manager for NAS. Select checkboxes as outlined. Provisioning Manager deviates from zero/full fat by first growing volumes with autosize and then allowing Snapshot copy autodelete.


Figure 10) Configuring a full/low/zero fat provisioning policy using Provisioning Manager for SAN storage. Select checkboxes as outlined. Provisioning Manager deviates by not turning on autosize for zero fat.

FULL/LOW/ZERO FAT PROVISIONING FOR STORAGE SERVICES

Storage services are an easy abstraction for provisioning storage in a utility-like fashion. A storage service describes all characteristic attributes of the storage needed in a certain scenario. A storage service catalog lists the available templates and allows you to provision storage with these attributes on demand. Technically, storage services or datasets consist of one or more resource pools, a protection policy, and a provisioning policy. Full/low/zero fat provisioning for storage services is configured in the configuration wizard of the provisioning policy. Figure 11 shows the provisioning policies closest to full/low/zero fat configurations.

Figure 11) Full/low/zero fat provisioning policies for datasets and storage services.

Because this wizard is able to configure the deduplication feature, two policies are configured for the zero fat configuration: one with deduplication and one without.


HOW SHOULD A VOLUME BE SIZED

Because physical allocation of data within a zero fat provisioned volume is done on demand, theoretically the volume size can be set to a very high value that can easily keep all application data and Snapshot copies. Because the unallocated space in the volume is not exclusively reserved for the volume itself, all other applications can benefit from the shared pool of unallocated storage. However, NetApp recommends that you size the volume to the expected size of its containing objects and use the autogrow option to let it grow on demand. The important advantage is that the commitment rate acts as a metric for data consolidation.

Note: The commitment rate reflects the amount of logical data consolidation. This metric is suitable for deciding when data should be left for organic growth.

Additionally, the volume size limits when using deduplication should be taken into account, because the maximum sizes depend on the storage controllers.
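A sketch of the commitment rate metric mentioned above: the sum of provisioned volume sizes relative to aggregate capacity. This is an illustrative helper under simplified assumptions, not an Operations Manager API.

```python
def commitment_rate(volume_sizes_gb, aggregate_size_gb):
    """Commitment rate: sum of provisioned volume sizes relative to the
    aggregate's capacity. Values above 1.0 indicate overcommitment
    (thin provisioning); the rate reflects logical data consolidation."""
    return sum(volume_sizes_gb) / aggregate_size_gb

# Three 4 TB zero fat volumes in a 10 TB aggregate: 120% committed.
rate = commitment_rate([4000, 4000, 4000], 10000)
print(f"{rate:.0%}")  # 120%
```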

APPLICATION RECOMMENDATIONS

Thin provisioning is most effective when applications use data that is committed to them step by step. When applications preformat data, the immediate effect of thin provisioning is lost, and only deduplication may reclaim sharable blocks. Because thin provisioning has no performance penalty, the general recommendation is to provision with the zero fat configuration.

For SAN-attached storage, NetApp recommends that you use file systems supporting space reclamation technologies such as the SCSI UNMAP and SCSI WRITE SAME commands. These pass the information through the storage stack that a particular block is no longer used, allowing unused space to be reclaimed. On Windows® platforms, this can be configured in NetApp SnapDrive®.

For Oracle® database best practices, refer to WP-7084, Storage Efficiency in an Oracle Environment.

3.2 PROVISIONING FROM TEMPLATES: VOLUME AND DEDUPE-CENTRIC LAYOUTS

This section deals with provisioning storage for similar applications from a golden template. A valid use case is a hosting provider who offers and serves predefined application services in mass quantities. Instead of provisioning each application from scratch, the data of the application instance is provisioned by creating a copy of a preconfigured template or golden copy, which is then customized using a postprocessing procedure.

When applications are provisioned this way, NetApp cloning technologies generate virtual copies of the template data instantly and with efficient use of space. This achieves a high degree of data consolidation and cost savings.

The potential of NetApp cloning technologies also plays a central role in development and test environments, as well as in software maintenance scenarios. Testing and updates can be performed very easily because these cloning capabilities work instantly and with almost no overhead in performance, CPU, and memory.

There are two ways to align application data to a NetApp shared storage infrastructure:

• Volume-centric storage layout
• Dedupe-centric storage layout

Depending on the data lifetime, suitability for deduplication, consistency, and tool constraints, one way of aligning application data is more appropriate than the other.

In both variants, the storage of the application template can be provisioned as either full, low, or zero fat. The cloning procedure inherits the attributes of the parent volume. To create space-efficient clones, the space guarantee must be set to none.


CONSEQUENCES FOR MONITORING

When using one of the following layouts, very high data consolidation can be achieved. Because this effect depends on the usage characteristics of the corresponding applications, monitoring the aggregate is key. In case a low fat volume acts as a template that is cloned while preserving the original space guarantees, monitoring is necessary for the cloned volumes as well.

VOLUME-CENTRIC STORAGE LAYOUT

In a volume-centric storage layout, an application instance is organized into one or a few volumes to benefit from Data ONTAP volume-centric management and maintenance operations, such as instant cloning and volume-consistent Snapshot copies.

In addition to the convenient ways to manage volumes, volume-centric storage layouts have storage efficiency advantages in two dimensions:

• High instant storage efficiency savings. Cloning the data of an application instance with FlexClone yields high instant savings; these savings might deteriorate over time.
• Long-term storage efficiency savings. Deduplicating application data yields medium long-term savings.

A volume-centric layout makes it easy to provision storage for another instance of an application by cloning a consistent volume representing the template of the intended application and attaching it to an instance where it is processed. This approach works for both NAS and SAN.

Figure 12 shows the data alignment of an application instance and its volume. An application instance organizes its data in one or more dedicated volumes. Note that the entire construct is created within one aggregate.

Because deduplication is performed on the volume level, long-term savings depend on the block-sharing rate within one instance of an application. Volume-centric layouts are preferred in the following cases:

• Simplicity of data management using volumes
• Individual control over the SLA of each application instance
• Application instances with a short duration
• No consideration of deduplication
• Management tools that require volume-centric layouts


Figure 12) Volume-centric storage provisioning. Application instances are aligned horizontally with their volumes.

[Diagram: a template FlexVol volume containing LUNs/qtrees is cloned via FlexClone block sharing into instance volumes 1 through n; within each FlexVol volume, deduplication provides block sharing.]

Impact on commitment and storage utilization. The impact of using FlexClone to clone a volume-centric storage layout for storage template-based provisioning is visualized schematically. At clone creation, Data ONTAP creates metadata for the new instance of the data; it allocates space for storing changes to the cloned copy or new data on request. Thus, the overcommitment of the aggregate containing the cloned data increases when creating the clone. However, this does not affect the space used in the aggregate. When data in the clone is changed and new data is added by the application, aggregate use grows.
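The commitment accounting described above can be sketched as a toy model. This is a simplified assumption-driven illustration (metadata overhead is ignored), not real Data ONTAP accounting.

```python
class Aggregate:
    """Toy model of aggregate commitment vs. use when cloning volumes."""
    def __init__(self, size_gb):
        self.size_gb = size_gb
        self.committed_gb = 0.0   # sum of provisioned volume sizes
        self.used_gb = 0.0        # physically allocated blocks

    def provision_volume(self, size_gb, used_gb):
        self.committed_gb += size_gb
        self.used_gb += used_gb

    def clone_volume(self, size_gb, new_data_gb=0.0):
        # A FlexClone shares all parent blocks: commitment grows by the
        # clone's size, but only new or changed data consumes space.
        self.committed_gb += size_gb
        self.used_gb += new_data_gb

aggr = Aggregate(10000)
aggr.provision_volume(2000, used_gb=1500)   # template volume
aggr.clone_volume(2000)                     # instant clone: no extra use
print(aggr.committed_gb, aggr.used_gb)      # 4000.0 1500.0
aggr.clone_volume(2000, new_data_gb=100)    # clone that diverged by 100 GB
print(aggr.committed_gb, aggr.used_gb)      # 6000.0 1600.0
```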

Best Practice

A volume-centric layout implicitly implements a consistency group. It is preferable to align within it all application data that should be recovered at a certain point. Cloning can achieve significant savings when a FlexClone volume is created to provision data for a new service instance.

Client-side data realignment, such as disk defragmentation or database tablespace reorganization, has a counterproductive effect on the FlexClone savings. This realignment also has a temporarily counterproductive effect on the deduplication savings, and the deduplication process must be executed again to regain them. If possible, the following actions on client data should be avoided:

• Reorganizing data, for example, database reorganization of tablespaces or defragmentation of virtual disks provisioned through cloning
• Preformatting data


DEDUPE-CENTRIC STORAGE LAYOUT

In a dedupe-centric storage layout, the goal is to achieve high storage efficiency returns from the deduplication feature. In contrast to the volume-centric storage layout, data of different application instances is grouped to achieve storage efficiency returns across a set of application instances. Figure 13 shows a sample dedupe-centric storage layout. Data of application instances is organized horizontally; individual data of each application is grouped vertically in a volume to implement deduplication.

This layout makes sense in virtualization scenarios where the images of the guest machines can be grouped easily. Grouping partitions containing boot images and commonly used programs is very effective because they share much of the same data. To implement template-based provisioning with such a layout, cloning template data must be performed with the file/LUN FlexClone operation. File/LUN FlexClone allows storage objects to be cloned within a volume, providing finer granularity.

This storage layout provides the following storage efficiency advantages from a short- and long-term perspective:

• Very high long-term storage efficiency savings. Long-term savings are achieved due to the deduplication-centric storage layout and deduplication returns.
• Short-term storage efficiency savings. Instant savings are provided when cloning an application instance, for example, template application data, through a file/LUN FlexClone operation.

In contrast to the volume-centric storage layout, application instances are bundled together in a matrix style because of their participation in a volume. This implies that the applications share major operational tasks and are managed as a bundle. From an SLA perspective, a diversification of service levels within the application instances cannot be implemented as easily as with a volume-centric layout. Achieving application-consistent Snapshot copies requires the iterative application of file/LUN FlexClone functionality to all storage objects of the instance. This is slightly more difficult than cloning with a volume FlexClone operation.

TR-3505, NetApp Deduplication for FAS and V-Series Deployment and Implementation Guide, provides a deeper understanding of NetApp deduplication and its deployment.

Figure 13 illustrates dedupe-centric storage provisioning. Volumes are shared among several application instances to achieve cross-instance dedupe returns. Note that this construct is created within an aggregate; volumes can be assigned to different aggregates.


Figure 13) Dedupe-centric storage provisioning. Application instances are aligned horizontally; volumes are aligned vertically.

[Diagram: the template and instances 1 through n each span several FlexVol volumes; each FlexVol volume groups the corresponding LUNs/qtrees of all instances, with deduplication block sharing within each FlexVol volume.]

Impact on commitment and aggregate usage. When creating the FlexVol volumes for this layout, their individual sizes contribute to the commitment rate. The aggregate use grows with the provisioning and object use within the FlexVol volumes. Provisioning a new instance in this layout through a file/LUN FlexClone operation has no effect on the overcommitment rate; it affects the deduplication savings of the volume itself. Thus, NetApp recommends using the zero fat configuration for the volume so that autogrow is enabled.

Best Practice

This layout is very attractive for applications using multiple but similar storage objects among service instances (for example, virtual disks in virtual machine hypervisors). They usually use similar operating systems and applications in dedicated virtual disks. Thus, grouping these storage objects leads to a very high degree of consolidation due to deduplication.

Quickly changing data, such as pages and swap files, should not be considered for inclusion in deduplicated volumes on primary storage. Deduplication savings are limited due to their high change rate and do not justify running the deduplication process. NetApp recommends not placing this type of data in the same volume as data that dedupes well.

We further recommend not performing client-side data realignments such as Windows disk defragmentation or database tablespace reorganizations. Because of the way NetApp storage controllers work, fragmented client data is served with no performance penalty, so these realignments provide no benefit.


3.3 SETTLED/NOMAD PROVISIONING FOR NETAPP DATA MOTION

Settled/nomad provisioning is a provisioning pattern that helps increase the utilization of NetApp shared storage. When the online migratability features of the storage are exploited, response times for mitigating data growth scenarios are independent of application-specific planned downtime windows. Storage is classified into potential migration candidates and can be migrated away from a tight aggregate on one storage controller to another while assuring its accessibility. Thus, it is an elegant technology to relax the use of an aggregate outside the planned downtime windows of the served applications. NetApp MultiStore technology implements this feature using the vFiler abstraction, which NetApp recommends you consider in the provisioning process. Secure multi-tenancy environments implemented using MultiStore technology harmonize well with this provisioning approach.

The settled/nomad provisioning pattern is a perfect metaphor for reacting to data growth in an aggregate. Figure 14 illustrates the concept of settled/nomad provisioning in the aggregates of the storage controllers and the migration of a nomad out of its aggregate. The settled part describes data that does not move during its lifetime. It might use vFiler units to simplify operation and hardware maintenance of the storage controller, but there is no direct need. The nomad parts are considered moving parts and thus must make use of vFiler units. The ratio between the sizes of the settled and nomad parts depends on the growth rate and lifetime of the data in the settled part. Assuming that the aggregate size is constant over this period, the aggregate is filled with settled and nomad data. Over the data lifetime, more and more nomads are migrated away. At the end of the lifetime, only the settled data is left.

It is irrelevant whether the data growth happens in the settled or the nomad part: when a nomad is migrated away, the resource situation on the aggregate is relaxed. It is preferable to provision several nomads of different sizes. This allows you to:

• React to different growth scenarios of the data
• Quickly migrate smaller nomads when time or the inter-storage-controller network is considered a limited resource
• Operate the aggregate in its operational sweet spot corridor over a long time frame; by slicing the migratable entities in the right way, you can be sure that the aggregate operates in a predefined use interval
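As a planning sketch, the sizing rationale behind the settled/nomad split (nomads must absorb the growth that the aggregate's initial free space cannot) might look like the following. The formula and names are illustrative assumptions, not a NetApp sizing rule.

```python
def nomad_capacity_needed(growth_gb_per_month, lifetime_months,
                          settled_gb, aggregate_gb):
    """How much aggregate capacity should be provisioned as migratable
    nomads so that migrating them away can absorb the settled data's
    growth over its lifetime (simplified planning sketch)."""
    total_growth = growth_gb_per_month * lifetime_months
    free_after_settled = aggregate_gb - settled_gb
    # Nomads must cover whatever growth the initial free space cannot absorb.
    return max(0, total_growth - free_after_settled)

# 6 TB settled data growing 50 GB/month over 24 months in a 6.8 TB aggregate:
print(nomad_capacity_needed(50, 24, 6000, 6800))  # 400
```

Slicing this capacity into several nomads of different sizes then gives the flexibility described in the bullets above.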

Figure 14) Settled/nomad provisioning into an aggregate. In case of aggregate tightness, a nomad is migrated to a separate aggregate.

[Diagram: an aggregate containing a settled part and several nomads; one nomad is migrated out to another aggregate.]

To summarize, the settled/nomad provisioning pattern is an elegant method to adjust the block use of an aggregate. The use of an aggregate can be controlled and kept in a desired corridor.


SLA-BASED ASSESSMENT FOR SETTLED/NOMAD

The goals of an SLA-based assessment are to optimize SLA fulfillment and to avoid or minimize penalty costs. The accessibility of the applications, described by their individual service levels, is used to assess them into settled and nomad instances.

We use the introduced SLA metric of service disruption and map it to the stickiness of the settled/nomad instances. The vFiler entities allow online migration of NFS- and iSCSI-attached nomad instances without any changes on the client side. Fibre Channel-attached storage cannot be migrated online at the time of writing. Refer to TR-3881 for an understanding of DataMotion in an Oracle database and Microsoft Exchange environment.

Alignment by technical impact. For data belonging to applications with SLAs that fit perfectly into what is provided, a direct assignment can be made. For example, application instances with the lowest acceptable service disruption should be the last candidates to be migrated (settled); applications with the highest acceptable service disruption should be considered as nomads. However, there might be data of application instances that will likely be migrated during the application lifetime. You must take into account the business impact of migrating these instances.

Figure 15) Alignment by technical impact (sorted by negative impact in descending order).

[Diagram: instances Inst1 through InstN sorted by negative impact; for example, all FC-attached instances and those with high negative impact outside the SLA are settled, while instances with medium or low impact inside the SLA are nomads.]

Alignment by business impact. An assessment of penalty costs is made for the data of the remaining applications. For vFiler migration, a very short negative impact on the performance of the service level must be taken into account during the migration. Thus, application data with the highest negative impact is considered to be the stickiest.

Figure 16) Alignment by business impact (sorted by negative impact in descending order).

[Diagram: instances sorted by penalty cost; the highest negative impact ($$) is settled, medium impact is semi-settled, and the lowest ($) is nomad.]

PERFORMANCE AND THROUGHPUT IMPACT OF MIGRATION

Migration of a nomad might be triggered by heavy storage consumption in an aggregate. It might also be triggered by performance limitations of the corresponding storage controller. Because the migration process consumes additional resources on the network and on the participating storage controllers, this consumption must be taken into account to avoid further intensifying the situation. Refer to TR-3881 for a quantitative evaluation of DataMotion.

NetApp recommends adjusting the use of storage controllers in a high-availability configuration in such a way that the remaining controller can master the load in the case of a failover. Doing so should leave enough resources to perform migrations.

Migrating vFiler entities relies mainly on SnapMirror® and MultiStore technology and thus requires these licenses on all participating storage controllers. TR-3814, NetApp Data Motion, provides a thorough presentation of migration using the NetApp Data Motion solution. It focuses on implementing and triggering the migration of vFiler entities using NetApp Provisioning Manager. Furthermore, vFiler units can be managed manually to allow for handy offline migration with a very short interruption of storage accessibility.

ENABLE SETTLED/NOMAD FOR ALREADY-PROVISIONED STORAGE

While NetApp recommends that you consider the settled/nomad setting initially, taking the sizing and lifetime of storage into account, it is possible to implement it in a planned downtime window. If NFS-attached storage should be migrated, existing volumes can be adopted by a vFiler entity. Because the vFiler entity has its own IP address, the clients attaching the storage need to be remounted.

SETTLED/NOMAD-LIKE SETTING WITH SHORT-/LONG-TERM DATA PAIRING

In the previous section, the settled/nomad pattern was described as a means to mitigate organic data growth. The same effect can be achieved when storage is identified as belonging to instances that are going to be deprovisioned at their end of life. Taking into account the expected lifetime of provisioned storage allows you to plan deprovisioning situations in advance. This relaxes the pressure on aggregates outside planned downtime windows, without the technical requirements of a settled/nomad setting.

ONLINE MIGRATION IN VIRTUALIZED ENVIRONMENTS

Online migration features in a virtualization hypervisor provide a further alternative for implementing a responsive scheme to react to data growth scenarios. For example, VMware® Storage VMotion™ is capable of transferring a virtual machine, including its storage, when it is attached using a datastore. Storage of virtual machines served by a NetApp datastore can be migrated to another NetApp-served datastore by migrating each virtual machine. In such cases, a nomad can simply be implemented by a NAS- or SAN-attached datastore.

In contrast to a data transfer based on SnapMirror directly between NetApp storage controllers, the migration traffic flows through the hypervisor. This might have consequences for the execution of the virtual machines. Also, the NetApp storage efficiency savings cannot be exploited during the transfer; deduplication savings are regained by executing the deduplication process on the destination storage controller.


4 OPERATION

This section focuses on the operation and management of overcommitted storage. The goal of management is to fulfill the SLAs of the stored data while achieving a high level of efficiency. It addresses questions of how to detect situations that need manual assistance, how to raise the awareness of the operational staff, and how to resolve situations that arise.

We first consider situations that put SLA fulfillment at risk. Then we focus on actions that can be taken to avoid further aggravation by making the situation evident and presenting mitigation alternatives.

Consider the operational process as a loop that monitors and evaluates the current situation and triggers the transition of a storage resource among phases:

• Provision storage.
• Leave room for organic growth. It might be desirable to still allow extending the storage of previously provisioned applications.
• Reduce storage use with mitigation alternatives such as deletion, data motion, and so on.

These transitions must occur within a specified time frame to optimize operational flexibility and to prevent endangering the SLAs. The point is to detect situations that will violate the SLAs in the future.

SITUATIONS PUTTING SLA FULFILLMENT AT RISK

Over time, more and more data is stored and processed by the provided applications. NetApp storage efficiency technologies compensate for this growth. To prevent running out of physical resources, usage must be managed within safe boundaries. This makes sure the operations team has enough time to react with the appropriate mitigation strategy.

The following list summarizes situations that are critical for service delivery:

• Running out of time. Some mitigation alternatives must be triggered in advance, and a passage of time might be needed for their effect to become evident. This time determines the number of mitigation alternatives that can be considered at a certain point.
• Running out of mitigation alternatives. Several mitigation alternatives exist to control the usage. However, some alternatives are one-time activities, and some must be performed within a certain time frame. Depending on the situation, not all alternatives might be available for use.
• Running too tight on storage. Over time, applications use more and more of the blocks committed to them. This forces Data ONTAP to allocate from a pool of free blocks. Assuming data growth, the size of the free block pool directly translates into available time to react.
• Running out of storage completely. This must be prevented because it has a high impact on the availability of the service. Furthermore, data integrity can be at risk. Consider the following scenarios:
  − An application wants to write to committed storage but fails (NAS/SAN). To the application, this looks like a storage failure and implies service disruption. Data integrity can be at risk.
  − An application wants to allocate new storage but fails (NAS). The application is confronted with a "No space left on device" exception. Verify the application behavior on this exception; most applications can deal with this situation, and data integrity is not at risk.

Two cases need to be differentiated when mitigation is necessary to resolve a situation of tight storage. Storage for an object such as a LUN or a share can be tight because of:
  − Insufficient space within the volume in which the storage object is contained
  − Insufficient free space within the aggregate in which the storage object and its volume are contained
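The observation above that the free block pool translates into available reaction time can be sketched as a simple calculation. This is illustrative only; real trending, as performed by Operations Manager, fits observed growth rates over time.

```python
def days_until_full(free_gb, growth_gb_per_day):
    """Translate an aggregate's free block pool into reaction time,
    assuming a steady observed growth rate (simplified trending)."""
    if growth_gb_per_day <= 0:
        return float("inf")  # no growth: no deadline
    return free_gb / growth_gb_per_day

# 1.5 TB free, growing 30 GB per day: roughly 51 days to react.
print(days_until_full(1536, 30))
```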


The following sections focus on how to detect that a change is necessary and that a storage resource should be transitioned to another phase. We describe the different phases, how monitoring can support their detection, and how this information can be made known to operational groups.

4.1 PHASES AND TRANSITIONS

This section outlines the phases of a storage resource. Starting with an empty aggregate, storage is provisioned up to certain thresholds. After that, storage is left for organic growth. After certain thresholds are exceeded, further inspection or activities must be performed to mitigate storage tightness.

bull Provisioning storage When certain thresholds are within a defined range storage is provisioned to the aggregates Monitoring should support making a decision to transition to the next phase

bull Leave storage for organic growth When certain thresholds are exceeded provisioned storage is left for organic growth Depending on the environment storage of existing applications might still be extended and a second threshold might signal that extensions are not possible anymore Monitoring should support making a decision to transition to the next or prior phase

• Mitigate storage use. When certain thresholds are exceeded, this phase must make sure that committed storage can be delivered to store application data. The effect of a mitigation activity should be to put the storage resource back into the preferred operational corridor. Monitoring should support making the decision to transition back to the organic growth phase.
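As a sketch, the phase logic above can be expressed as a small decision function driven by an aggregate-usage metric. The threshold values here are illustrative assumptions, not recommendations from this report (section 5 describes concrete customer settings):

```python
def storage_phase(used_pct: float,
                  provision_limit: float = 50.0,
                  mitigate_limit: float = 65.0) -> str:
    """Classify an aggregate into one of the three phases from its
    capacity-used metric. Limits are illustrative, not prescriptive."""
    if used_pct < provision_limit:
        return "provision"       # storage may still be provisioned
    if used_pct < mitigate_limit:
        return "organic-growth"  # stop provisioning; let data grow
    return "mitigate"            # a mitigation activity is needed
```

In practice the limits would be derived per environment, as described in the following sections.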

4.2 MONITORING

NetApp Operations Manager delivers comprehensive monitoring and management for NetApp shared storage. It provides alerts, reports, performance monitoring, and configuration tools to keep the NetApp storage infrastructure in line with business requirements for maximum availability and efficiency. NetApp Operations Manager provides a single human interface and an application programming interface (API) for integration with third-party management and orchestration software vendors.

Operations Manager monitors the NetApp shared storage infrastructure and is able to raise awareness of certain situations. Events can be set to trigger an action when operational parameters are within a certain range and indicate a relevant situation. When the event triggers, an alarm notification can be sent by e-mail, pager, Simple Network Management Protocol (SNMP), or customized scripts. To raise awareness of a certain situation, the event must be characterized using the metrics provided by Operations Manager. To communicate the event, an alarm must be set.

THRESHOLDS

Operations Manager monitors relevant parameters that indicate the presence of specified situations. Thresholds can be set to trigger actions, for example, to notify the operational team that an alarm situation exists. The thresholds can be set to notify in advance. Operations Manager also performs trending on operational parameters to express the urgency of a certain situation. This supports the decision making on how to react to a certain situation.

Within your Operations Manager instance, the thresholds can be verified and set by navigating to the Default Thresholds page, following Setup→Options→Default Thresholds, or using the link http://opsmgrserver:port/dfm/edit/options. Figure 17 shows a sample configuration page.


Figure 17) Operations Manager screen to configure thresholds on operational metrics.

For aggregates, Operations Manager provides the set of thresholds described in the following list. They represent absolute limits. Operations Manager alarms can be used to notify operational staff and raise awareness of a specific situation.

Monitoring the aggregates is very important. They are the physical containers of preallocated and growable storage objects that host application data. If an aggregate of a storage controller runs at uncontrolled usage, there could be direct consequences for the applications for which it provides data.

The concrete settings for these thresholds depend on the time necessary to relax aggregate block usage. When no mitigation alternatives can or should be taken during the lifetime of the data, mitigation actions must be performed in scheduled downtime windows. Thus, threshold settings and actions tend to be more conservative to avoid SLA-endangering situations.

• Aggregate full threshold. This threshold on the metric aggregate block use allows triggering an alarm that notifies a person in charge.

• Aggregate nearly full threshold. This threshold is the counterpart of the aggregate full threshold but provides an earlier notification.

• Aggregate overcommitted threshold. This threshold on the metric committed storage allows triggering an alarm that notifies a person in charge. This metric refers to the amount of storage that is assigned to applications. It represents the level of consolidation and also the width and increase of the block-use corridor.

• Aggregate nearly overcommitted threshold. This threshold is the counterpart of the aggregate overcommitted threshold but provides an earlier notification.
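The commitment metric can be read as the logical space promised to applications relative to the aggregate's usable capacity; above 100%, the aggregate is overcommitted. A minimal sketch of that reading (the exact formula Operations Manager uses is not spelled out here, so treat this as an assumption):

```python
def committed_pct(committed_gb: float, usable_gb: float) -> float:
    """Committed storage as a percentage of usable aggregate capacity.
    A result above 100 indicates thin-provisioned overcommitment."""
    return 100.0 * committed_gb / usable_gb
```

For example, 110 GB committed against 100 GB of usable capacity yields 110%, the kind of value the overcommitted thresholds above act on.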

Operations Manager also provides thresholds that can be used to alert operational staff when volumes are in a certain state:

• Volume full threshold. This event notifies a person in charge that the preset threshold on the volume use metric has been reached.

• Volume almost full threshold. This event is the counterpart of the volume full threshold but provides an earlier notification.

• Volume autosized. This event notifies a person in charge that a volume was extended using the autogrow functionality.


TRENDING

Operations Manager 4.0 supports a variety of trending features for certain storage objects. This is an important feature for all storage objects with a fixed size. It allows you to estimate the time frame within which a certain situation needs to be mitigated. The trend is calculated as a linear regression over up to 90 days in the past. For aggregates, Operations Manager calculates a trend on the daily growth rate. In your Operations Manager instance, use the link http://opsmgrserver:port/dfm/report/view/aggregates-growth-rates for trending of aggregate growth rates and the estimated remaining time until the storage object is full. Each aggregate can be drilled down, and you can select trending based on an interval of one day, one week, one month, three months, or one year. To see the effect of recent data activities, set the interval of a trend calculation to enclose this activity. Investigate whether growth rates calculated over different intervals deviate significantly.
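The days-to-full idea can be sketched as an ordinary least-squares fit of used capacity over time, extrapolated to the point where the trend line crosses the usable capacity. This is an illustrative reconstruction, not Operations Manager's actual algorithm:

```python
def days_to_full(daily_used_gb: list, capacity_gb: float) -> float:
    """Fit a line through daily 'used capacity' samples and extrapolate
    the number of days, counted from the last sample, until the trend
    reaches 100% of usable capacity (the same basis Operations Manager
    reports against, per the note below)."""
    n = len(daily_used_gb)
    xs = range(n)
    x_mean = sum(xs) / n
    y_mean = sum(daily_used_gb) / n
    slope = (sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, daily_used_gb))
             / sum((x - x_mean) ** 2 for x in xs))
    intercept = y_mean - slope * x_mean
    # Day index at which the trend line crosses capacity, minus today.
    return (capacity_gb - intercept) / slope - (n - 1)
```

With 10 daily samples growing linearly from 100 GB by 2 GB/day toward a 200 GB aggregate, the sketch predicts 41 more days until full.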

Figure 18) Trending of data growth and days-to-full prediction in Operations Manager.

Note: The calculation basis of time to full is the usable aggregate capacity. This value is not calculated based on the aggregate full threshold setting.

The trending on the volume level is analogous to the trending on the aggregate level. In your Operations Manager instance, access the link http://opsmgrserver:port/dfm/report/view/volumes-growth-rates for trending of volume growth rates. NetApp recommends that you order the view by growth rate (descending) or time to full (increasing) in order to focus on the relevant candidates.

On the volume level, you can set an alarm to fire when the volume growth is outside the usual boundary:

• Abnormal volume growth. This event notifies when the growth rate of a volume exceeds a preset limit. It is helpful for signaling unusual behavior concerning storage consumption and pointing the operational staff to the right storage object.


INDIVIDUAL THRESHOLDS PER AGGREGATE OR VOLUME

For each aggregate or volume, the general default settings can be overwritten and made more specific. To do so, select your aggregate or volume of choice; for example, you can use the links already provided in this technical report. When selecting a concrete aggregate, it can be configured using the Edit Settings link and dialog. When selecting a concrete volume, its configuration can be adapted using the Edit Quota Settings link and dialog.

MONITORING STORAGE EFFICIENCY RETURNS

NetApp Operations Manager provides a dashboard to visualize storage efficiency returns in the NetApp shared storage infrastructure. This report lists important parameters, drilled down by utilization capacity, unused reserve capacity, storage efficiency, and efficiency return breakdown. It allows you to judge the effectiveness of the NetApp storage efficiency technologies. Figure 19 provides a sample screenshot of the storage efficiency dashboard in NetApp Operations Manager. Consult the NetApp Operations Manager Efficiency Dashboard Installation and User Guide for further details on this dashboard.

Figure 19) Storage efficiency dashboard in Operations Manager.


4.3 NOTIFICATION

Operational staff must be notified when situations occur that require a transition of phases, especially situations with negative consequences for SLA fulfillment. Operations Manager provides alarms for notification. Alarms are bound to the metrics and thresholds explained in section 4.2 and notify operational staff, storage administrators, or storage capacity planners. Alarms are the instrument used to keep the management effort of the NetApp storage infrastructure low.

After being notified, the responsible person can evaluate the situation and decide which actions to take. Furthermore, the trends on operational parameters provided by Operations Manager simplify the decision-making process.

Depending on the organizational structure, the responsibilities to operate, plan, and administer the storage infrastructure can be separated into different groups, persons, or roles. Thus, we characterize the mitigation activities by required skill set and time to act. This allows easy alignment to a given organizational structure.

Operations Manager supports different methods of sending a notification. The notification methods can be used in combination; for example, a notification can be sent by both e-mail and SNMP.

NOTIFY BY E-MAIL

An alarm can be sent to multiple destinations by e-mail. Repeated notifications can be sent while the situation is not resolved. To set an alarm, access the alarm configuration page by following Setup→Alarms from the default Operations Manager dashboard. Clicking Advanced Version accesses an advanced version of this page; the direct link for the advanced version is http://opsmgrserver:port/dfm/edit/alarms-advanced. Figure 20 shows how to configure an alarm. Adjust the threshold as described in section 4.2.

NetApp recommends using distribution lists or aliases with meaningful names rather than the addresses of individual persons. If you follow this recommendation, changing responsibilities and roles does not require you to make corresponding changes to Operations Manager.

NOTIFY BY SNMP

Operations Manager supports the notification of alarms using SNMP, a widely used standard that is supported by most orchestration frameworks and ticketing systems. Using SNMP, Operations Manager can be integrated into existing ticketing systems. Figure 20 shows setting up an alarm that fires based on the aggregate almost full threshold. The SNMP trap host is configured using a hostname or IP address and the port on which the SNMP agent is listening. The alarm can be saved and tested.


Figure 20) Configuring an alarm based on the threshold aggregate almost full.

Note: The SNMP event must be routed to the responsible groups or persons in the ticketing system. Thus, the mapping between the detected situation and the responsible operational group must be implemented there.

NOTIFY BY SCRIPT

Operations Manager supports notifications in highly customized integration scenarios. A user-defined adapter can be executed that delivers the information to the infrastructure or system of choice. A script can be used to implement such an adapter and act as the glue between Operations Manager and the customer infrastructure. To set an alarm on the event aggregate almost full that starts a script, instrument Operations Manager on the command line:

dfm alarm create -s script_to_execute -h aggregate-almost-full
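A minimal adapter script might look like the following. The environment variable names are assumptions for illustration only; consult the Operations Manager documentation for the variables actually passed to alarm scripts:

```python
#!/usr/bin/env python
"""Hypothetical alarm adapter: formats an Operations Manager event as a
one-line message that a ticketing system could ingest."""
import os


def format_ticket(env: dict) -> str:
    # The variable names below are illustrative assumptions, not the
    # documented Operations Manager interface.
    event = env.get("DFM_EVENT_NAME", "unknown-event")
    source = env.get("DFM_SOURCE_NAME", "unknown-object")
    return f"[storage] {event} on {source}: please assess capacity"


if __name__ == "__main__":
    print(format_ticket(dict(os.environ)))
```

The script itself is what `-s script_to_execute` would point at; the routing to the responsible group then happens in the ticketing system, as noted above for SNMP.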


4.4 MITIGATE STORAGE USE

Uncontrolled storage use can limit operational flexibility and might put SLA fulfillment at risk. This section focuses on mitigation activities that preserve flexibility by controlling use within its defined corridor. The effect of a mitigation activity should be to return the usage to that corridor.

Storage tightness might occur in aggregates or volumes, depending on their configuration. When all volumes in an aggregate are thin provisioned with the zero fat configuration, they use the shared pool of free blocks of the aggregate to deal with data growth. To solve this situation, a mitigation activity on the aggregate level is necessary.

When storage objects in a fixed-size volume cannot grow to the committed space, a mitigation activity on the volume level is necessary to solve upcoming volume tightness.

MITIGATION ACTIVITIES FOR AGGREGATES

Aggregates are the coarsest storage objects within a NetApp storage controller. Finer-grained storage objects such as FlexVol volumes and their content are usually thin provisioned using the zero fat configuration. They might grow on demand; however, because they live within an aggregate of physically limited size, the growth of the storage object itself is also limited. As described in the following list, providing usable space in the aggregate automatically allows contained storage objects to grow.

1. Increase the aggregate. You can add drives to aggregates during operation, and you can repeat this mitigation activity. The maximum aggregate size depends on the Data ONTAP version, the type of aggregate, and the type of storage controller. 64-bit aggregates, supported with Data ONTAP 8, have very high limits. Additional drives can be used immediately; however, their procurement time needs to be taken into account. Rebalancing data between existing and new drives results in a uniformly distributed use of the drives.

2. Decrease the aggregate Snapshot copy reserve. This reserve is needed in MetroCluster and SyncMirror® configurations. In other configurations, you can decrease this reserve or set it to zero.

3. Shrink preallocated volumes. Volumes with preallocated space reserve available aggregate free space. When possible, these volumes can be shrunk, returning the freed space to the aggregate so that other objects can make use of the previously preallocated space.

4. Enable deduplication and shrink the volume.

5. If available, migrate a nomad online to a different storage controller. Doing this on the NetApp storage controller level requires storage provisioning based on vFiler entities and a MultiStore and SnapMirror license. Adequate free space on the aggregates of the target storage controller is required. This mitigation activity is not limited in its applicability.

6. A volume can be migrated from one aggregate to another within the same or another storage controller. SnapMirror replicates the data while it is still being served. To switch over to the replicated data, the client needs to detach from the source and reattach to the replica. After completion, the replica is considered the new source. This operation has an impact on client downtime. Typically, inter-data-center bandwidth allows you to synchronize the source and the replica within the range of a few minutes.

7. If none of the listed activities can be used, the application must be stopped to achieve a consistent state, and the data must then be migrated offline.

The mitigation activities for aggregate tightness are summarized in Table 8. Note that Provisioning Manager performs mitigation alternatives 3 to 6 for secondary storage online.


Table 8) Mitigation alternatives to control use within aggregates.

No. | Mitigation Activity | Repeatability | SLA Impact | Preparation Time | Time to Show Effect
1 | Increase aggregate capacity by adding disks | Repeatable (Data ONTAP 7.x: low limits; Data ONTAP 8: high limits) | None | HW procurement | Immediate (+ rebalancing)
2 | Decrease the aggregate's Snapshot copy reserve area, if possible | One time | None | None | Immediate
3 | Shrink other volumes in the aggregate if they have enough free space | One time | Low | None | Immediate
4 | Run deduplication and shrink volumes | Repeatable | Low | Time to execute dedupe | Immediate
5 | Migrate nomads (online) | Repeatable | Low | None | Minutes (vFiler migration time)
6 | Migrate volumes to a different aggregate (offline) | Repeatable | Med-high | Next planned downtime window | Minutes (volume switchover time)
7 | Prevent application data loss and stop the application, then migrate (offline) | Repeatable | Low-high | Coordinate with app owner | Minutes (migration time)

MITIGATION ACTIVITIES FOR VOLUME TIGHTNESS

Mitigation activities for volume tightness are relevant for volumes that are at risk because their storage objects cannot grow to the committed size. When it is not possible to enable growth for storage objects contained in volumes, you need to perform an aggregate mitigation activity.

Note: Some of these mitigation alternatives depend on and affect used capacity in the aggregate.

Table 9) Mitigation activities for resource tightness within volumes.

No. | Mitigation Activity | Repeatability | SLA Impact | Preparation Time | Time to Show Effect
1 | Reduce the volume's Snapshot copy reserve (if configured and not used) | One time | Low | None | Immediate
2 | Increase the volume if there is free space in the aggregate (see Table 8) | One time | Low | None | Immediate
3 | Delete Snapshot copies not needed or those skipped by the autodelete function | Limited | Low | None | Immediate
4 | Activate FAS deduplication for the volume (requires proper space guarantees) | One time | Low; possible performance impact | Wait for schedule | Hours
5 | If the volume contains more than a single LUN, migrate those objects to another volume or aggregate | Repeatable | High | Next planned downtime window | Minutes (volume migration time)
6 | Stop application and migrate data | Repeatable | High | Coordinate with app owner | Minutes (migration time)


5 REAL-LIFE SETTINGS

This section summarizes two different operational settings. The first one makes use of neither online data migration nor the settled/nomad provisioning pattern; the second setting implements a settled/nomad provisioning pattern to maintain the flexibility for online data migrations.

The concrete threshold settings and approaches might be very customer and application specific. To exploit NetApp storage efficiency features in your own data center, NetApp recommends that you start conservatively. After you are familiar with the process, work toward the customer-specific optimum.

5.1 SAMPLE SETTING 1: REAL-LIFE SETTING

This section describes a real-life setting that a customer started with. It makes use of a limited set of mitigation alternatives. This is especially beneficial when the installed storage capacity should be constant over a long time frame or when physical systems are already fully equipped. A settled/nomad setting is not considered. Thus, the thresholds that signal a transition of the phases are set lower and more conservatively for this customer. Because online data migration and aggregate extension are not available as mitigation alternatives, sufficient available space is required to safely reach the next planned downtime window, as shown in Figure 21. In practice, refer to the aggregate days-to-full trend value to get an idea of the available days to full based on past data growth.

• All storage is provisioned using the zero fat option with growable FlexVol volumes. Only aggregate monitoring is used.

• Aggregate extension is not a mitigation alternative.

• Online migration is not a mitigation alternative.

Figure 21) Storage to enable organic data growth between planned downtime windows. (The figure plots data growth over time, in months, between two planned downtime windows.)

Note: Several months might pass between the planned downtime windows in which major mitigation alternatives can be performed.

The primary concern is preventing the critical situation in which aggregates reach a utilization level that is too high to enable organic growth during the period between agreed planned downtime windows. To prevent this situation, sufficient space must be reserved to enable data growth. Second, the level of data consolidation is monitored to manage accumulated growth rates safely.

Provisioning new data is stopped when one or both thresholds on the first and second metrics are reached. The operational teams are notified using an alarm on the Operations Manager event aggregate nearly full threshold (event configured when the metric exceeds 50%) and the event aggregate nearly overcommitted threshold (event configured when the metric exceeds 110%). These alarms stop the responsible entities from provisioning new storage; the aggregate is left for organic growth.

An assessment of the storage situation might then be performed. Depending on experience and knowledge of the application growth rates seen in the past, the thresholds may be adapted. After the upper threshold of the operational sweet spot corridor is exceeded, an alarm based on the aggregate full threshold (set initially to 65%) is sent to the storage administrators to make the decision to migrate data in the next planned downtime window. In the meantime, organic growth can take place in the yellow-marked area shown in Figure 22. The metrics used are:

• First metric: aggregate capacity used
• Second metric: aggregate space committed

Because all storage is provisioned using the zero fat option, no artificially limited storage container exists. Thus, there is no need to consider a volume-based metric. Figure 22 shows the behavior depending on the metrics aggregate capacity used and aggregate committed space.
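The corridor logic of this setting can be sketched as follows. The percentages are the initial values quoted above; mapping the in-between band to "assess capacity" is one plausible reading of Figure 22, not a definitive reproduction of it:

```python
def setting1_action(used_pct: float, committed_pct: float) -> str:
    """Sample setting 1: derive the current action from the metrics
    aggregate capacity used (50%/65% thresholds) and aggregate space
    committed (110%/120% thresholds)."""
    if used_pct > 65 or committed_pct > 120:
        return "mitigate in next planned downtime window"
    if used_pct > 50 or committed_pct > 110:
        return "assess capacity and adapt thresholds"
    return "provision new storage"
```

Either metric alone can push the aggregate out of the sweet spot corridor, which is why both alarms are configured.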

Figure 22) Transition of phases depending on the metrics aggregate capacity used and aggregate committed space. (In the figure, new storage is provisioned while capacity used is in the 0-50% range and space committed is in the 0-110% range; beyond those ranges, capacity is assessed and thresholds are adapted; above 65% used or 120% committed, mitigation takes place.)


5.2 SAMPLE SETTING 2: SETTLED/NOMAD

This section describes a setting that takes the settled/nomad provisioning pattern into account and allows migrating nomad data flexibly and in a timely manner thanks to vFiler technology. This setting requires storage space at alternative locations to which nomads might be migrated. It is seen more often in larger environments with an emphasis on NFS-attached storage. It allows operating the NetApp storage infrastructure at very high use and in narrower operational sweet spot corridors. Figure 23 visualizes the effect of a mitigation alternative that can be performed online.

Figure 23) Narrower corridors due to the ability to perform mitigation alternatives in hours instead of months. (The figure shows settled and nomad data in an aggregate over time, in hours, from detecting the need to act to the effect of a mitigation such as a migration.)

In this sample setting, as in sample setting 1, the critical situation to prevent is aggregates becoming too full. However, the flexibility gained with online data migration makes it unnecessary to take a further metric into account, for example, storage overcommitment.

• All storage is provisioned using the zero fat option with growable FlexVol volumes. Only aggregate monitoring is used.

• Storage is provisioned using the settled/nomad pattern, with the ability to perform online migration.

• Days-to-full aggregate trending was more than 200 days on average. Note that this value depends on the individual situation and is calculated against 100% capacity used.

The sole metric in this setting is aggregate capacity used. Table 10 contains the thresholds describing the transition of phases.

Table 10) Phase transitions with the settled/nomad provisioning pattern and the online migration mitigation alternative.

Detection Threshold | Notify | Mitigation
> 70% | Storage operations | Stop provisioning of storage
> 85% | Storage operations | Stop extending provisioned storage
> 90% | Storage operations | Relax resource situation and migrate a nomad
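Because this setting uses a single metric, Table 10 reduces to a simple threshold lookup. A sketch, using the percentages from the table:

```python
from typing import Optional

# (detection threshold in %, mitigation) pairs from Table 10,
# checked from the highest threshold down.
THRESHOLDS = [
    (90, "relax resource situation and migrate a nomad"),
    (85, "stop extending provisioned storage"),
    (70, "stop provisioning of storage"),
]


def mitigation_for(used_pct: float) -> Optional[str]:
    """Return the Table 10 mitigation for the current aggregate capacity
    used, or None while usage is inside the operational corridor."""
    for threshold, action in THRESHOLDS:
        if used_pct > threshold:
            return action
    return None
```

In each case, the notification target is storage operations, so a single alarm recipient suffices.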


Figure 24) Visualization of phase transitions depending on the metric aggregate capacity used. (In the figure, new storage is provisioned while capacity used is in the 0-70% range; already provisioned storage may still be extended up to 85%; above 90%, utilization is relaxed by migrating a nomad with NetApp Data Motion.)

You can achieve very high data consolidation in this setting by using NetApp storage controllers. The amount of logical data served can exceed the physically usable capacity several times over.


6 STORAGE EFFICIENCY COOKBOOK

To increase consolidation, we propose the following steps to exploit the advantages of NetApp storage efficiency technologies.

Figure 25) Typical picture of aggregate capacity metrics while turning to zero fat configurations and dedupe. (The figure plots committed capacity and capacity used against elapsed time over the three steps below; capacity used drops when configurations are changed, and separate lines show the overall trend and the last-three-month trend.)

As a general rule, we don't introduce artificially limited container types. They increase the monitoring effort and might prevent pooling unused space. For an existing landscape, proceed as follows:

1. Install and configure Operations Manager; the earlier, the better. From day one, Operations Manager collects data: the more information it collects, the better the predictions and trending. The diagrams provided by Operations Manager give a good idea of growth rates and their steadiness. Make sure all NetApp storage controllers are monitored, then wait for one month. Define which mitigation alternatives your operational team is comfortable with. Check the boxes accompanying the provided list and identify the time your team needs to perform the actions. If you can perform online migrations for nomads, define the time to negotiate and approve a migration. For all other data, define the time to the next planned downtime window.

2. Change all volumes to the zero fat configuration with the autogrow feature set to on. Because there is no artificial space limitation for an autogrow volume, monitoring is restricted to aggregate monitoring. When using deduplication, set the volume to autogrow. During this period, the capacity used diminishes, as shown in Figure 25. Usually, each change in the volume configuration can be detected. So far, only metadata has changed, and unused space in the volumes is now available from a common shared pool. The aggregated free space is available to the same applications storing the data. We recommend monitoring for three months to understand the growth rate of your environment.

3. Derive the growth trend of the aggregates. Note that the overall trend might still be negative. Use Operations Manager to help determine the trend. Make sure that it excludes the time frame of changing the volume configuration to zero fat and that it includes relevant operations of your applications, such as month-end and year-end closing of business applications or regular software maintenance updates (for example, in virtualized environments).


Work backward to determine the thresholds of the phases:

a. Define the aggregate use at a level where your operational team is comfortable. At first, do not exceed 80%. Add an attention area (yellow) depending on the mitigation alternatives and their time to show effect.

b. Determine the maximum distance between the planned downtimes or the time to perform the intended mitigation alternatives.

c. Determine the growth rate. Operations Manager provides help in determining the trend of data growth.

d. Determine the minimum space required to comfortably allow organic growth in the period between agreed planned downtimes of the provided services. Operations Manager helps you to understand the growth rate of the past.
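Steps a through d can be combined into a back-of-the-envelope calculation. This is a sketch under the simplifying assumption of a constant daily growth rate:

```python
def max_safe_use_pct(capacity_gb: float,
                     growth_gb_per_day: float,
                     days_between_windows: int,
                     comfort_limit_pct: float = 80.0) -> float:
    """Work backward from the growth rate (step c) and the distance
    between downtime windows (step b) to the aggregate-use level at
    which provisioning must stop, capped by the team's comfort level
    (step a). The reserved headroom is the minimum space of step d."""
    headroom_pct = 100.0 * growth_gb_per_day * days_between_windows / capacity_gb
    return min(comfort_limit_pct, 100.0 - headroom_pct)
```

For example, a 10 TB aggregate growing 20 GB/day with 180 days between windows needs 36% headroom, so provisioning should stop at about 64% use, below the 80% comfort limit.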

To provision storage, follow these steps:

1. Create big aggregates to enable shared storage in your data center. We recommend sizing in such a way that the aggregate can be extended once for eventual aggregate mitigation. Very few situations exist where a silo-centric approach with dedicated aggregates for applications makes sense. Free space and performance in an aggregate can be shared. A few big aggregates reduce the monitoring effort. Also, build aggregates in a limited number of standardized configurations and sizes.

2. Create volumes in the zero fat configuration with the autogrow feature set to on. Because there is no artificial space limitation for an autogrow volume, monitoring is restricted to aggregate monitoring. When using deduplication, set the volume to autogrow. Whenever possible, use Provisioning Manager for convenience and for repeating configurations.

a. Classify your data and provision for flexibility. Give NFS preference and make use of vFiler entities.

b. Turn on deduplication. Even in situations where deduplication rates are expected to be low, there is sometimes a big surprise. If you prefer to try deduplication on the storage controller first, create a clone of the intended volume and deduplicate it to estimate the effect. Use Performance Advisor to identify a repeating time frame of low activity to schedule the deduplication job. Alternatively, use deduplication scheduled by change rate. Mind the maximum volume sizes, which depend on the storage controller.

c. Initially size volumes to the expected size of the data you are going to store. Thus, the aggregate overcommitment metric in Operations Manager represents the data consolidation more precisely.

d. Trim existing volumes provisioned in full or low fat to the zero fat configuration. Use the following commands on the console of the storage controller to configure zero fat without Snapshot autodelete for NAS environments:

vol options <volume> guarantee none
vol options <volume> try_first volume_grow
vol autosize <volume> -m <maximum size> -i <increment size> on
snap autodelete <volume> off

Use the following command sequence to configure zero fat with Snapshot autodelete for NAS environments:

vol options <volume> guarantee none
vol options <volume> try_first volume_grow
vol autosize <volume> -m <maximum size> -i <increment size> on
snap autodelete <volume> trigger volume
snap autodelete <volume> delete_order oldest_first
snap autodelete <volume> on


Use the following commands to configure zero fat without Snapshot autodelete for SAN environments:

vol options <volume> guarantee none
vol options <volume> try_first volume_grow
vol autosize <volume> -m <maximum size> -i <increment size> on
snap reserve -V <volume> 0
snap autodelete <volume> off
lun set reservation <lun> disable

Use the following command sequence to configure zero fat for SAN environments with autodelete set to on:

vol options <volume> guarantee none
vol options <volume> try_first volume_grow
vol autosize <volume> -m <maximum size> -i <increment size> on
snap reserve -V <volume> 0
snap autodelete <volume> trigger volume
snap autodelete <volume> delete_order oldest_first
snap autodelete <volume> on
lun set reservation <lun> disable

e. Identify storage holding inactive data. Storage keeping inactive data is most often perfectly suited to act as a nomad candidate that could be migrated.

f. Identify storage that is close to deprovisioning. Deprovisioning of storage relaxes use and can act as a mitigation alternative.

g. Turn already provisioned volumes into the zero fat configuration.

3. Let Operations Manager monitor the landscape. Use the aggregate daily growth rates and days-to-full trending reported by Operations Manager to adapt the thresholds. Remember that days-to-full trending reports against 100% capacity used of the aggregate.


7 REFERENCES

• TR-3505, "NetApp Deduplication for FAS and V-Series Deployment and Implementation Guide": www.netapp.com/us/library/technical-reports/tr-3505.html

• TR-3563, "NetApp Thin Provisioning: Improving Storage Utilization and Reducing TCO": www.netapp.com/us/library/technical-reports/tr-3563.html

• TR-3710, "Operations Manager, Provisioning Manager, and Protection Manager Best Practices Guide": www.netapp.com/us/library/technical-reports/tr-3710.html

• TR-3786, "A Thorough Introduction to 64-Bit Aggregates": www.netapp.com/us/library/technical-reports/tr-3786.html

• TR-3814, "NetApp Data Motion": www.netapp.com/us/library/technical-reports/tr-3814.html

• TR-3827, "If You're Doing This, Then Your Storage Could Be Underutilized": www.netapp.com/us/library/technical-reports/tr-3827.html

• TR-3881, "DataMotion for Volumes for Enterprise Applications": www.netapp.com/us/library/technical-reports/tr-3881.html

• NetApp Operations Manager Efficiency Dashboard Installation and User Guide: now.netapp.com/NOW/download/tools/omsed_plugin/InstallUserGuide.pdf


8 ACKNOWLEDGMENTS

This report was developed in concert with the Field Centers for Innovation and covers field best practices and product group expertise. It would not have been possible without the input of many experts. Significant contributions were made by Matthew Agoni, Carlos Alvarez, Jeff Berks, Manfred Buchmann, Hans Deuerlein, Erik Dybwad, Niels Reker, Oliver Dziuba, Larry Freeman, Gary Garcia, Pretoom Goswami, Naveen Harsani, George John, Nigel Maddock, Andreas Martinovsky, Holger Niermann, Cesar Orosco, Christian Ott, Shiva Raja, Michael Reusch, Maurice Skubski, John Tyrrell, Oliver Walsdorf, and Allen Wang.

NetApp provides no representations or warranties regarding the accuracy, reliability, or serviceability of any information or recommendations provided in this publication, or with respect to any results that may be obtained by the use of the information or observance of any recommendations provided herein. The information in this document is distributed AS IS, and the use of this information or the implementation of any recommendations or techniques herein is a customer's responsibility and depends on the customer's ability to evaluate and integrate them into the customer's operational environment. This document and the information contained herein may be used solely in connection with the NetApp products discussed in this document.

© Copyright 2010 NetApp, Inc. All rights reserved. No portions of this document may be reproduced without prior written consent of NetApp, Inc. Specifications are subject to change without notice. NetApp, the NetApp logo, Go further, faster, Data ONTAP, FlexClone, FlexVol, MultiStore, RAID-DP, SnapDrive, SnapMirror, Snapshot, SyncMirror, and vFiler are trademarks or registered trademarks of NetApp, Inc. in the United States and/or other countries. Windows is a registered trademark of Microsoft Corporation. Oracle is a registered trademark of Oracle Corporation. VMware is a registered trademark and VMotion is a trademark of VMware, Inc. All other brands or products are trademarks or registered trademarks of their respective holders and should be treated as such. RA-0007-1010


Characteristics | Full Fat | Low Fat | Zero Fat
Pool benefiting from dedupe savings | Volume fractional reserve area | Volume free space area | Aggregate free space area
Risk of an out-of-space condition on primary data | No | No, as long as autodelete is able to delete any Snapshot copies | Yes, when monitoring and notification processes are missing
Typical use cases | Small installations; none or few storage management skills (no monitoring infrastructure) | Large database environments | Shared storage infrastructure; test/dev environments; storage pools for virtualized servers

FULL/LOW/ZERO FAT PROVISIONING WITH PROVISIONING MANAGER

NetApp Provisioning Manager focuses on fast and efficient provisioning of storage resources in the NetApp storage infrastructure. It speeds up provisioning workflows and boosts capacity usage by using policy-based automation for the entire NetApp NAS and SAN infrastructure. These processes are:

• Faster than manually provisioning storage
• Easier to maintain than scripts
• Instrumental in minimizing the risk of data loss resulting from misconfigured storage

Provisioning Manager applies user-defined policies to consistently select the appropriate resources for each provisioning activity. This frees administrators from the headache of searching for available space to provision and allows more time for strategic issues. The use of a centralized management console allows administrators to monitor the status of their provisioned storage resources. Provisioning Manager can help improve business agility and capacity use, shrink provisioning time, and improve administrator productivity. Provisioning Manager's thin provisioning and deduplication capabilities provide a high level of storage efficiency from your NetApp storage investment.

A GUI allows you to implement the full/low/zero fat provisioning models within Provisioning Manager. See Figure 9 and Figure 10 for configuring storage efficiency in a provisioning policy for NAS and SAN. Whenever storage is provisioned using this provisioning policy, the settings apply automatically. For more information, refer to TR-3710, Operations Manager, Provisioning Manager, and Protection Manager Best Practices Guide.

Provisioning Manager encapsulates technical details when provisioning storage and supports an easy integration with existing management tools and orchestration frameworks. Policies and their use in so-called datasets and storage services allow you to exploit NetApp storage efficiency technologies without exposing a high level of technical detail to a higher level of management software.

Note: Provisioning Manager up to version 4.0 does not allow you to specify autosize and autodelete individually; the policy template determines whether these features are used and which options are selected. To implement the provisioning methods outlined here, a customized provisioning script needs to be provided that sets the autosize and autodelete parameters according to the recommendations for the full/low/zero fat methods. Postprovisioning scripts are standard with Provisioning Manager. Use caution when Provisioning Manager runs conformance checks: these revert individual settings.


Figure 9) Configuring a full/zero fat provisioning policy using Provisioning Manager for NAS. Select checkboxes as outlined. Provisioning Manager deviates from zero/full fat by first growing volumes with autosize and then allowing Snapshot autodelete.


Figure 10) Configuring a full/low/zero fat provisioning policy using Provisioning Manager for SAN storage. Select checkboxes as outlined. Provisioning Manager deviates by not turning on autosize for zero fat.

FULL/LOW/ZERO FAT PROVISIONING FOR STORAGE SERVICES

Storage services are an easy abstraction for provisioning storage in a utility-like fashion. A storage service describes all characteristic attributes of the storage needed in a certain scenario. A storage service catalog lists the available templates and allows you to provision storage with these attributes on demand. Technically, storage services, or datasets, consist of one or more resource pools, a protection policy, and a provisioning policy. Full/low/zero fat provisioning for storage services is configured in the configuration wizard of the provisioning policy. Figure 11 shows the provisioning policies closest to the full/low/zero fat configurations.

Figure 11) Full/low/zero fat provisioning policies for datasets and storage services.

Because this wizard is able to configure the deduplication feature, two policies are configured for the zero fat configuration: one with deduplication and one without deduplication.


HOW SHOULD A VOLUME BE SIZED

Because physical allocation of data within a zero-fat-provisioned volume is done on demand, the volume size can theoretically be set to a very high value that can easily keep all application data and Snapshot copies. Because the unallocated space in the volume is not exclusively reserved for this volume itself, all other applications can benefit from the shared pool of unallocated storage. However, NetApp recommends that you size the volume to the expected size of its containing objects and use the autogrow option to let it grow on demand. The important advantage is that the commitment rate acts as a metric for data consolidation.

Note: The commitment rate reflects the amount of logical data consolidation. This metric is suitable for deciding when data should be left for organic growth.

Additionally, the volume size limits when using deduplication should be taken into account, because the maximum sizes depend on the storage controllers.
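As a toy illustration of the commitment rate metric, the following sketch computes it from the promised volume sizes of an aggregate (illustrative Python; the function name and figures are invented for the example and are not the output of any NetApp tool):

```python
def commitment_rate(aggregate_size_gb, volume_sizes_gb):
    """Promised (committed) volume capacity relative to the aggregate's
    physical size. A rate above 1.0 means the aggregate is overcommitted."""
    return sum(volume_sizes_gb) / aggregate_size_gb

# Example: three thin-provisioned 4 TB volumes promised on an 8 TB aggregate.
rate = commitment_rate(8192, [4096, 4096, 4096])
print(rate)  # 1.5, i.e., 150% committed
```

A rising commitment rate at stable physical use indicates growing logical data consolidation; a rate near 1.0 suggests the aggregate could take on more thin-provisioned volumes.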

APPLICATION RECOMMENDATIONS

Thin provisioning is most effective when applications use data that is committed to them step by step. When applications preformat data, the immediate effect of thin provisioning is lost, and only deduplication may reclaim sharable blocks. Because thin provisioning has no performance penalty, the general recommendation is to provision with the zero fat configuration.

For SAN-attached storage, NetApp recommends that you use file systems supporting space reclamation technologies such as the SCSI UNMAP and SCSI WRITE SAME commands. These pass the information through the storage stack that a particular block is no longer used, allowing unused space to be reclaimed. On Windows® platforms, this can be configured in NetApp SnapDrive®.

For Oracle® database best practices, refer to WP-7084, Storage Efficiency in an Oracle Environment.

3.2 PROVISIONING FROM TEMPLATES: VOLUME AND DEDUPE-CENTRIC LAYOUTS

This section deals with provisioning storage for similar applications from a golden template. A valid use case is a hosting provider who offers and serves predefined application services in mass quantities. Instead of provisioning each application from scratch, the data of the application instance is provisioned by creating a copy of a preconfigured template, or golden copy, that is customized using a postprocessing procedure.

When applications are provisioned this way, NetApp cloning technologies generate virtual copies of the template data instantly and with efficient use of space. This achieves a high degree of data consolidation and cost savings.

The potential of NetApp cloning technologies also plays a central role in development and test environments, as well as in software maintenance scenarios. Testing and updates can be performed very easily because these cloning capabilities work instantly and with almost no overhead in terms of performance, CPU, and memory.

There are two ways to align application data to a NetApp shared storage infrastructure:

• Volume-centric storage layout
• Dedupe-centric storage layout

Depending on the data lifetime, suitability for deduplication, consistency, and tool constraints, one way of aligning application data is more appropriate than the other.

In both variants, the storage of the application template can be provisioned as either full, low, or zero fat. The cloning procedure inherits the attributes of the parent volume. To create space-efficient clones, the space guarantee must be set to none.


CONSEQUENCES FOR MONITORING

When using one of the following layouts, very high data consolidation can be achieved. Because this effect depends on the usage characteristics of the corresponding applications, monitoring the aggregate is key. In case a low fat volume acts as a template that is cloned while preserving the original space guarantees, monitoring is necessary for the cloned volumes as well.

VOLUME-CENTRIC STORAGE LAYOUT

In a volume-centric storage layout, an application instance is organized into one or a few volumes to benefit from the Data ONTAP volume-centric management and maintenance operations, such as instant cloning and volume-consistent Snapshot copies.

In addition to the convenient ways to manage volumes, volume-centric storage layouts have storage efficiency advantages in two dimensions:

• High instant storage efficiency savings. High instant savings when cloning data of an application instance with FlexClone; savings might deteriorate over time.

• Long-term storage efficiency savings. Medium long-term savings when deduplicating application data.

A volume-centric layout makes it easy to provision storage for another instance of an application by cloning a consistent volume representing the template of the intended application and attaching it to an instance where it is processed. This approach works for both NAS and SAN.

Figure 12 shows the data alignment of an application instance and its volume. An application instance organizes its data in one or more dedicated volumes. Note that the entire construct is created within one aggregate.

Because deduplication is performed at the volume level, long-term savings depend on the block-sharing rate within one instance of an application. Volume-centric layouts are preferred in the following cases:

• Simplicity of data management using volumes
• Individual control over the SLA of each application instance
• Application instances with a short duration
• No consideration of deduplication
• Management tools that require volume-centric layouts


Figure 12) Volume-centric storage provisioning. Application instances are aligned horizontally with their volumes.

[Figure: a template FlexVol volume and one FlexVol volume per application instance (1..n), each holding its own LUNs/qtrees. Deduplication block sharing operates within each FlexVol volume; FlexClone block sharing links the template to the instance volumes.]

Impact on commitment and storage utilization. The impact of using FlexClone to clone a volume-centric storage layout to implement template-based provisioning is visualized schematically. At clone creation, Data ONTAP creates metadata for the new instance of the data and allocates data for storing changes to the cloned copy or new data on request. Thus, the overcommitment of the aggregate containing the cloned data increases when the clone is created. However, this does not affect the space used in the aggregate. When data in the clone is changed or new data is added by the application, aggregate use grows.
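The described behavior, commitment rising at clone creation while physical use stays flat until the clone diverges, can be sketched as a toy accounting model (illustrative Python with invented names and sizes; this is not how Data ONTAP tracks space internally):

```python
class Aggregate:
    """Toy model of aggregate commitment versus physical use."""
    def __init__(self, size_gb):
        self.size_gb = size_gb
        self.committed_gb = 0   # sum of volume sizes promised to applications
        self.used_gb = 0        # physically allocated blocks

    def provision_volume(self, size_gb, initial_data_gb=0):
        self.committed_gb += size_gb
        self.used_gb += initial_data_gb

    def clone_volume(self, size_gb):
        # Clone creation: only metadata; commitment rises, physical use does not.
        self.committed_gb += size_gb

    def write_new_data(self, gb):
        # Changed or new blocks in the clone are allocated on demand.
        self.used_gb += gb

agg = Aggregate(8192)
agg.provision_volume(1024, initial_data_gb=800)  # template volume with data
agg.clone_volume(1024)                           # instant, space-efficient clone
print(agg.committed_gb, agg.used_gb)             # 2048 800
agg.write_new_data(50)                           # clone diverges over time
print(agg.used_gb)                               # 850
```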

Best Practice

A volume-centric layout implicitly implements a consistency group. It is preferable to align within it all application data that should be recovered at a certain point. Cloning can achieve significant savings when a FlexClone volume is created to provision data for a new service instance.

Client-side data realignment, such as disk defragmentation or database tablespace reorganization, has a counterproductive effect on the FlexClone savings. This realignment also has a temporarily counterproductive effect on the deduplication savings until the deduplication process is executed again. If possible, the following actions on client data should be avoided:

• Reorganizing data, for example, database reorganization of tablespaces or defragmentation of virtual disks provisioned through cloning

• Preformatting data


DEDUPE-CENTRIC STORAGE LAYOUT

In a dedupe-centric storage layout, the goal is to achieve high storage efficiency returns from the deduplication feature. In contrast to the volume-centric storage layout, data of different application instances is grouped to achieve storage efficiency returns across a set of application instances. Figure 13 shows a sample dedupe-centric storage layout. Data of application instances is organized horizontally; individual data of each application is grouped vertically in a volume to implement deduplication.

This layout makes sense in virtualization scenarios, where the images of the guest machines can be grouped easily. Grouping partitions containing boot images and commonly used programs is very effective because they share much of the same data. To implement template-based provisioning with such a layout, cloning of template data must be performed with the file/LUN FlexClone operation. File/LUN FlexClone allows storage objects to be cloned within a volume, providing finer granularity.

This storage layout provides the following storage efficiency advantages from a short- and long-term perspective:

• Very high long-term storage efficiency savings. Long-term storage efficiency savings are achieved due to the deduplication-centric storage layout and deduplication returns.

• Short-term storage efficiency savings. Instant storage efficiency savings are provided when cloning an application instance, for example, template application data, through a file/LUN FlexClone operation.

In contrast to the volume-centric storage layout, application instances are bundled together in a matrix style because of their participation in a volume. This implies that the applications share major operational tasks and are managed as a bundle. From an SLA perspective, a diversification of service levels within the application instances cannot be implemented as easily as with a volume-centric layout. Achieving application-consistent Snapshot copies requires the iterative application of file/LUN FlexClone functionality to all storage objects of the instance. This is slightly more difficult than cloning with a volume FlexClone operation.

TR-3505, NetApp Deduplication for FAS and V-Series Deployment and Implementation Guide, provides a deeper understanding of NetApp deduplication and its deployment.

Figure 13 illustrates dedupe-centric storage provisioning. Volumes are shared among several application instances to achieve cross-instance dedupe returns. Note that, in contrast to the volume-centric layout, this construct is not bound to one aggregate; volumes can be assigned to different aggregates.


Figure 13) Dedupe-centric storage provisioning. Application instances are aligned horizontally; volumes are aligned vertically.

[Figure: the template and instances 1..2 each contribute one LUN/qtree to every FlexVol volume; deduplication block sharing operates within each FlexVol volume, across the storage objects of the different instances.]

Impact on commitment and aggregate usage. When the FlexVol volumes for this layout are created, their individual sizes contribute to the commitment rate. Aggregate use grows with the provisioning and object use within the FlexVol volumes. Provisioning a new instance in this layout through a file/LUN FlexClone operation has no effect on the overcommitment rate; it affects the deduplication value of the volume itself. Thus, NetApp recommends using the zero fat configuration for the volume and enabling autogrow.

Best Practice

This layout is very attractive for applications using multiple but similar storage objects among service instances (for example, virtual disks in virtual machine hypervisors). They usually use similar operating systems and applications in dedicated virtual disks. Thus, grouping these storage objects leads to a very high degree of consolidation due to deduplication.

Quickly changing data, such as pages and swap files, should not be considered for inclusion in deduplicated volumes on primary storage. Deduplication savings are limited due to their high change rate and do not justify running the deduplication process. NetApp recommends that this type of data not be placed in the same volume as data that dedupes well.

We further recommend not performing client data realignments, such as Windows disk defragmentation or database tablespace reorganizations. Because of the way NetApp storage controllers work, fragmented client data is served with no performance penalty, so such realignments are unnecessary.
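To see why grouping near-identical storage objects pays off, consider a hedged sketch of block-level sharing (illustrative Python; the sample "images" are invented, and the toy uses 4-character blocks purely for readability, whereas real deduplication operates on fixed-size disk blocks):

```python
def dedup_savings(objects, block_size=4):
    """Fraction of blocks saved by sharing identical blocks across objects."""
    total, unique = 0, set()
    for data in objects:
        blocks = [data[i:i + block_size] for i in range(0, len(data), block_size)]
        total += len(blocks)
        unique.update(blocks)
    return 1 - len(unique) / total

# Two "VM images" sharing a common OS part, differing only at the tail:
os_part = "KERNELLIBSAPPSBINS"
vm1, vm2 = os_part + "vm1data..", os_part + "vm2data.."
print(round(dedup_savings([vm1, vm2]), 2))  # 0.43: nearly half the blocks shared
```

The larger the shared portion (boot images, common programs) relative to the instance-specific portion, the closer the savings approach the ideal of keeping only one copy.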

27 Storage Efficiency Every Day How to Achieve and Manage Best-in-Class Storage Use

3.3 SETTLED/NOMAD PROVISIONING FOR NETAPP DATA MOTION

Settled/nomad provisioning is a provisioning pattern that helps increase the utilization of NetApp shared storage. When the online migratability features of storage are exploited, response times to mitigate data growth scenarios are independent of application-specific planned downtime windows. Storage is classified into potential migration candidates and can be migrated away from a tight aggregate on one storage controller to another while assuring its accessibility. Thus, it is an elegant technique to relax the use of an aggregate outside the planned downtime windows of the served applications. NetApp MultiStore technology implements this feature using the vFiler abstraction, which NetApp recommends you consider in the provisioning process. Secure multi-tenancy environments implemented using MultiStore technology harmonize well with this provisioning approach.

The settled/nomad provisioning pattern is a perfect metaphor for reacting to data growth in an aggregate. Figure 14 illustrates the concept of settled/nomad provisioning in the aggregates of the storage controllers and the migration of a nomad out of its aggregate. The settled part describes data that does not move during its lifetime. It might use vFiler units to simplify operation and hardware maintenance of the storage controller, but there is no direct need. The nomad parts are considered moving parts and thus must make use of vFiler units. The ratio between the sizes of the settled and nomad parts depends on the growth rate and lifetime of the data in the settled part. Assuming that the aggregate size is constant over this period, the aggregate is filled with settled and nomad data. Over the data lifetime, more and more nomads are migrated away; at the end of the lifetime, only the settled data is left.

It is irrelevant whether the data growth happens in the settled or the nomad part: when a nomad is migrated away, the resource situation on the aggregate is relaxed. It is preferable to provision several nomads of different sizes. This allows you to:

• React to different growth scenarios of the data
• Quickly migrate smaller nomads when time or the inter-storage-controller network is considered a limited resource
• Operate the aggregate in its operational sweet spot corridor over a long time frame; by slicing the migratable entities in the right way, you can be sure that the aggregate operates in a predefined use interval
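The sweet-spot-corridor idea can be sketched in a few lines (an illustrative toy; the function name, nomad sizes, and the 85% upper bound are assumptions for the example, not NetApp-recommended values):

```python
def pick_nomad(aggregate_size_gb, used_gb, nomads_gb, upper_bound=0.85):
    """Return the size of the smallest nomad whose migration brings aggregate
    use back under the corridor's upper bound, or None if no single nomad
    suffices (or none is needed)."""
    excess = used_gb - upper_bound * aggregate_size_gb
    if excess <= 0:
        return None  # still inside the corridor; nothing to migrate
    for size in sorted(nomads_gb):
        if size >= excess:
            return size
    return None

# 10 TB aggregate at 90% use, with nomads of 256/512/1024 GB provisioned up front:
print(pick_nomad(10240, 9216, [256, 512, 1024]))  # 512
```

Provisioning nomads in several sizes, as recommended above, is what gives this selection step something useful to choose from.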

Figure 14) Settled/nomad provisioning into an aggregate. In case of aggregate tightness, a nomad is migrated to a separate aggregate.

[Figure: an aggregate containing a settled part and two nomad parts.]

To summarize, the settled/nomad provisioning pattern is an elegant method to adjust the block use of an aggregate. The use of an aggregate can be controlled and kept in a desired corridor.


SLA-BASED ASSESSMENT FOR SETTLED/NOMAD

The goals of an SLA-based assessment are to optimize SLA fulfillment and to avoid or minimize penalty costs. The accessibility of the applications, described by their individual service levels, is used for an assessment into settled and nomad instances.

We use the introduced SLA metric of service disruption and map it to the stickiness of the settled/nomad instances. The vFiler entities allow online migration of NFS- and iSCSI-attached nomad instances without any changes on the client side. Fibre Channel-attached storage cannot be migrated online at the time of writing. Refer to TR-3881 for an understanding of DataMotion in an Oracle database and Microsoft Exchange environment.

Alignment by technical impact. For data belonging to applications with SLAs that fit perfectly into what is provided, a direct assignment can be made. For example, application instances with the lowest acceptable service disruption should be the last candidates to be migrated (settled); applications with the highest acceptable service disruption should be considered nomads. However, there might be data of application instances that will likely be migrated during the application lifetime. You must take into account the business impact of migrating these instances.

Figure 15) Alignment by technical impact (sorted by negative impact in descending order).

[Figure: instances Inst1..InstN ordered by negative impact, from high (outside SLA) through medium to low (inside SLA); instances that cannot be migrated online, e.g., all FC-attached storage, fall into the settled class, the remainder into nomad.]

Alignment by business impact. An assessment of penalty costs is made for the data of the remaining applications. For vFiler migration, a very short negative impact on the performance of the service level must be taken into account during the migration. Thus, application data with the highest negative impact is considered to be the stickiest.

Figure 16) Alignment by business impact (sorted by negative impact in descending order).

[Figure: instances ordered by penalty cost ($$ down to $), classified into settled, semi-settled, and nomad.]
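The two alignment steps above can be sketched as a simple classification (illustrative Python; the instance names, protocols, and disruption figures are invented for the example):

```python
def classify(instances):
    """instances: list of (name, protocol, acceptable_disruption_seconds).
    FC-attached or zero-disruption instances stay settled, because they
    cannot be migrated online; the rest become nomads, easiest first."""
    settled, nomads = [], []
    for name, protocol, disruption in instances:
        if protocol == "FC" or disruption == 0:
            settled.append(name)   # stickiest: no online migration possible
        else:
            nomads.append((disruption, name))
    # Highest acceptable disruption first: the easiest migration candidates.
    nomads.sort(reverse=True)
    return settled, [name for _, name in nomads]

settled, nomads = classify([
    ("erp-db", "FC", 0),
    ("mail", "iSCSI", 30),
    ("fileshare", "NFS", 120),
])
print(settled, nomads)  # ['erp-db'] ['fileshare', 'mail']
```

A business-impact pass would then reorder the nomads further by penalty cost, yielding the settled, semi-settled, and nomad classes of Figure 16.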

PERFORMANCE AND THROUGHPUT IMPACT OF MIGRATION

Migration of a nomad might be triggered by heavy storage consumption in an aggregate. It might also be triggered by performance limitations of the corresponding storage controller. Because the migration consumes additional resources on the network and on the participating storage controllers, this consumption must be taken into account to avoid further intensifying the situation. Refer to TR-3881 for a quantitative evaluation of DataMotion.

NetApp recommends adjusting the use of storage controllers in a high-availability configuration in such a way that the remaining controller can master the load in the case of a failover. Doing so should leave enough resources to perform migrations.

Migrating vFiler entities relies mainly on SnapMirror® and MultiStore technology and thus requires these licenses on all participating storage controllers. TR-3814, NetApp Data Motion, provides a thorough presentation of migration using the NetApp Data Motion solution. It focuses on implementing and triggering the migration of vFiler entities using NetApp Provisioning Manager. Furthermore, vFiler units can be managed manually to allow for handy offline migration with a very short interruption of storage accessibility.

ENABLE SETTLED/NOMAD FOR ALREADY-PROVISIONED STORAGE

While NetApp recommends that you consider the settled/nomad setting initially, taking the sizing and lifetime of storage into account, it is possible to implement it in a planned downtime window. If NFS-attached storage is to be migrated, existing volumes can be adopted by a vFiler entity. Because the vFiler entity has its own IP address, the clients attaching the storage need to be remounted.

SETTLED/NOMAD-LIKE SETTING WITH SHORT/LONG-TERM DATA PAIRING

In the previous section, the settled/nomad pattern was described as a way to mitigate organic data growth. The same effect can be achieved when storage is identified as belonging to instances that are going to be deprovisioned at their end of life. Taking into account the expected lifetime of provisioned storage allows you to plan deprovisioning situations in advance. This relaxes the dependence on aggregates outside planned downtime windows, without the technical requirements of a settled/nomad setting.

ONLINE MIGRATION IN VIRTUALIZED ENVIRONMENTS

Online migration features in a virtualization hypervisor provide a further alternative for implementing a responsive scheme to react to data growth scenarios. For example, VMware® Storage VMotion™ is capable of transferring a virtual machine, including its storage, when it is attached using a datastore. Storage of virtual machines served by a NetApp datastore can be migrated to another NetApp-served datastore by migrating each virtual machine. In such cases, a nomad can simply be implemented by a NAS- or SAN-attached datastore.

In contrast to a data transfer based on SnapMirror directly between NetApp storage controllers, the migration traffic flows through the hypervisor. This might have consequences for the execution of the virtual machines. Also, the NetApp storage efficiency savings cannot be exploited during the transfer; deduplication savings are gained back by executing the deduplication process on the destination storage controller.


4 OPERATION

This section focuses on the operation and management of overcommitted storage. The goal of management is to fulfill the SLAs of the stored data while achieving a high level of efficiency. It addresses questions of how to detect situations that need manual assistance, how to raise the awareness of the operational staff, and how to resolve situations that arise.

We first consider situations that put SLA fulfillment at risk. Then we focus on actions that can be taken to avoid further aggravation by making the situation evident and presenting mitigation alternatives.

Consider the operational process as a loop that monitors and evaluates the current situation and triggers the transition of a storage resource among these phases:

• Provision storage.
• Leave room for organic growth. It might be desirable to still allow extending the storage of previously provisioned applications.
• Reduce storage use with mitigation alternatives, such as deletion, data motion, and so on.

These transitions must occur within a specified time frame to optimize operational flexibility and to prevent endangering the SLAs. The point is to detect situations that will violate the SLAs in the future.

SITUATIONS PUTTING SLA FULFILLMENT AT RISK

Over time, more and more data is stored and processed by the provided applications. NetApp storage efficiency technologies compensate for this growth. To prevent running out of physical resources, usage must be managed within safe boundaries. This makes sure that the operations team has enough time to react with the appropriate mitigation strategy.

The following list summarize situations that are critical for service delivery

• Running out of time. Some mitigation alternatives must be triggered in advance, and a passage of time might be needed for their effect to become evident. This lead time determines the number of mitigation alternatives that can still be considered at a given point.

• Running out of mitigation alternatives. Several mitigation alternatives exist to control usage. However, some alternatives are one-time activities, and some must be performed within a certain time frame. Depending on the situation, not all alternatives might be available for use.

• Running too tight on storage. Over time, applications use more and more of the blocks that were committed to them, which forces Data ONTAP to allocate from the pool of free blocks. Assuming continued data growth, the size of the free block pool directly translates into the time available to react.

• Running out of storage completely. This must be prevented because it has a high impact on the availability of the service; furthermore, data integrity can be at risk. Consider the following scenarios:
- An application wants to write to committed storage but fails (NAS and SAN). To the application this looks like a storage failure and implies service disruption. Data integrity can be at risk.
- An application wants to allocate new storage but fails (NAS). The application is confronted with a "no space left on device" exception. Verify the application's behavior on this exception; most applications can deal with this situation, and data integrity is not at risk.

Two cases need to be differentiated when mitigation is necessary to resolve a situation of tight storage. Storage for an object such as a LUN or a share can be tight because of:
- Insufficient free space within the volume in which the storage object is contained
- Insufficient free space within the aggregate in which the storage object and its volume are contained


The following sections focus on how to detect that a change is necessary and that a storage resource should be transitioned to another phase. We describe the different phases, how monitoring can support their detection, and how this information can be made known to operational groups.

4.1 PHASES AND TRANSITIONS

This section outlines the phases of a storage resource. Starting with an empty aggregate, storage is provisioned up to certain thresholds. After that, storage is left for organic growth. After further thresholds are exceeded, inspection or mitigation activities must be performed to relieve storage tightness.

• Provisioning storage. While certain thresholds are within a defined range, storage is provisioned to the aggregates. Monitoring should support making the decision to transition to the next phase.

• Leave storage for organic growth. When certain thresholds are exceeded, provisioned storage is left for organic growth. Depending on the environment, storage of existing applications might still be extended, and a second threshold might signal that extensions are no longer possible. Monitoring should support making the decision to transition to the next or prior phase.

• Mitigate storage use. When certain thresholds are exceeded, this phase must make sure that committed storage can be delivered to store application data. The effect of a mitigation activity should be to put the storage resource back into the preferred operational corridor. Monitoring should support making the decision to transition back to the organic growth phase.

4.2 MONITORING

NetApp Operations Manager delivers comprehensive monitoring and management for NetApp shared storage. It provides alerts, reports, performance monitoring, and configuration tools to keep the NetApp storage infrastructure in line with business requirements for maximum availability and efficiency. NetApp Operations Manager provides a single human interface and an application programming interface (API) for integration with third-party management and orchestration software.

Operations Manager monitors the NetApp shared storage infrastructure and is able to raise awareness of certain situations. Events can be set to trigger an action when operational parameters are within a certain range and indicate a relevant situation. When an event triggers, an alarm notification can be sent by e-mail, pager, Simple Network Management Protocol (SNMP), or customized scripts. To raise awareness about a certain situation, the event must be characterized using the metrics provided by Operations Manager; to communicate the event, an alarm must be set.

THRESHOLDS

Operations Manager monitors relevant parameters that indicate the presence of specified situations. Thresholds can be set to trigger actions, for example, to notify the operational team that an alarm situation exists. The thresholds can be set to notify in advance. Operations Manager also performs trending on operational parameters to express the urgency of a certain situation, which supports the decision making on how to react.

Within your Operations Manager instance, the thresholds can be verified and set by navigating to the Default Thresholds page, following Setup → Options → Default Thresholds, or using the link http://opsmgrserver:port/dfm/edit/options. Figure 17 shows a sample configuration page.


Figure 17) Operations Manager screen to configure thresholds on operational metrics

For aggregates, Operations Manager provides the set of thresholds described in the following list. They represent absolute limits. Operations Manager alarms can be used to notify operational staff and raise awareness of a specific situation.

Monitoring the aggregates is very important: they are the physical containers of preallocated and growable storage objects that host application data. If an aggregate of a storage controller runs at uncontrolled usage, there can be direct consequences for the applications for which it is providing data.

The concrete settings for these thresholds depend on the time needed to relax aggregate block usage. When no mitigation alternatives should or can be taken during the lifetime of the data, mitigation actions must be performed in scheduled downtime windows; threshold settings and actions then tend to be more conservative to avoid SLA-endangering situations.

• Aggregate full threshold. This threshold on the metric aggregate block use triggers an alarm that notifies a person in charge.

• Aggregate nearly full threshold. The counterpart of the aggregate full threshold, providing an earlier notification.

• Aggregate overcommitted threshold. This threshold on the metric committed storage triggers an alarm that notifies a person in charge. The metric refers to the amount of storage that is assigned to applications; it represents the level of consolidation as well as the width and growth of the block-use corridor.

• Aggregate nearly overcommitted threshold. The counterpart of the aggregate overcommitted threshold, providing an earlier notification.
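The overcommitment metric underlying these thresholds is a simple ratio of committed to usable capacity. The following Python sketch illustrates it; the function names and the example threshold values are ours, not Operations Manager defaults:

```python
def overcommitment_pct(committed_gb: float, usable_gb: float) -> float:
    """Committed storage as a percentage of usable aggregate capacity.

    Values above 100 mean the aggregate is overcommitted: more space
    has been promised to applications than physically exists.
    """
    return 100.0 * committed_gb / usable_gb


def check(committed_gb: float, usable_gb: float,
          nearly: float = 95.0, full: float = 100.0) -> str:
    """Map the metric onto the two thresholds (values are placeholders)."""
    pct = overcommitment_pct(committed_gb, usable_gb)
    if pct >= full:
        return "aggregate overcommitted"
    if pct >= nearly:
        return "aggregate nearly overcommitted"
    return "ok"
```

For example, 1100 GB committed against 1000 GB of usable capacity yields 110% and would trip the overcommitted alarm.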

Operations Manager also provides thresholds that can be used to alert operational staff when volumes are in a certain state:

• Volume full threshold. This event notifies a person in charge that the preset threshold on the volume metric has been reached.

• Volume almost full threshold. The counterpart of the volume full threshold, providing an earlier notification.

• Volume autosized. This event notifies a person in charge that a volume was extended using the autogrow functionality.


TRENDING

Operations Manager 4.0 supports a variety of trending features for certain storage objects. This is an important feature for all storage objects with a fixed size: it allows you to estimate the time frame within which a certain situation needs to be mitigated. The trend is calculated as a linear regression over up to 90 days in the past. For aggregates, Operations Manager calculates a trend on the daily growth rate. In your Operations Manager instance, use the link http://opsmgrserver:port/dfm/report/view/aggregates-growth-rates for trending of aggregate growth rates and the estimated remaining time until the storage object is full. Each aggregate can be drilled down, and you can select trending based on an interval of one day, one week, one month, three months, or one year. To see the effect of recent data activities, set the interval of the trend calculation to enclose this activity. Investigate whether growth rates calculated over different intervals deviate significantly.
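The days-to-full estimate can be approximated with an ordinary least-squares trend. This Python sketch mirrors the idea (one usage sample per day, extrapolated to usable capacity, matching the note below that trending is computed against usable capacity); it is an illustration, not the exact Operations Manager algorithm:

```python
def days_to_full(daily_used_gb, usable_gb):
    """Estimate days until an aggregate is full, from a linear trend.

    daily_used_gb: one 'capacity used' sample per day (oldest first),
    e.g. the last 90 days. A least-squares line is fitted and
    extrapolated to the usable capacity of the aggregate.
    Returns None when usage is flat or shrinking.
    """
    n = len(daily_used_gb)
    xs = range(n)
    mean_x = sum(xs) / n
    mean_y = sum(daily_used_gb) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, daily_used_gb))
    var = sum((x - mean_x) ** 2 for x in xs)
    slope = cov / var            # growth in GB per day
    if slope <= 0:
        return None              # no meaningful estimate
    return (usable_gb - daily_used_gb[-1]) / slope
```

With 2 GB of growth per day and 82 GB of remaining usable space, the estimate is 41 days.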

Figure 18) Trending of data growth and days-to-full prediction in Operations Manager

Note: The calculation basis of time to full is the usable aggregate capacity. This value is not calculated based on the aggregate full threshold setting.

The trending on the volume level is analogous to the trending on the aggregate level. In your Operations Manager instance, access the link http://opsmgrserver:port/dfm/report/view/volumes-growth-rates for trending of volume growth rates. NetApp recommends ordering the view by growth rate descending or time to full ascending in order to focus on the relevant candidates.

On the volume level, you can set an alarm to fire when volume growth is outside the usual boundary:

• Abnormal volume growth. This event notifies when the growth rate of a volume exceeds a preset limit. It is helpful for signaling unusual storage-consumption behavior and pointing the operational staff to the right storage object.


INDIVIDUAL THRESHOLDS PER AGGREGATE OR VOLUME

For each aggregate or volume, the general default settings can be overwritten and made more specific. To do so, select the aggregate or volume of choice, for example, by using the links already provided in this technical report. A concrete aggregate can be configured using the Edit Settings link and dialog; a concrete volume's configuration can be adapted using the Edit Quota Settings link and dialog.

MONITORING STORAGE EFFICIENCY RETURNS

NetApp Operations Manager provides a dashboard to visualize storage efficiency returns in the NetApp shared storage infrastructure. This report lists important parameters, drilled down by utilization capacity, unused reserve capacity, storage efficiency, and efficiency return breakdown. It allows you to judge the effectiveness of the NetApp storage efficiency technologies. Figure 19 provides a sample screenshot of the storage efficiency dashboard in NetApp Operations Manager. Consult the NetApp Operations Manager Efficiency Dashboard Installation and User Guide for further information on the information provided by this dashboard.

Figure 19) Storage efficiency dashboard in Operations Manager


4.3 NOTIFICATION

Operational staff must be notified when situations occur that require a transition of phases, especially situations with negative consequences for SLA fulfillment. Operations Manager provides alarms for notification. Alarms are bound to the metrics and thresholds explained in section 4.2 and notify operational staff, storage administrators, or storage capacity planners. Alarms are the instrument used to keep the management effort of the NetApp storage infrastructure low.

After being notified, the responsible person can evaluate the situation and decide which actions to take. Further, the trends on operational parameters provided by Operations Manager simplify the decision-making process.

Depending on the organizational structure, the responsibilities to operate, plan, and administer the storage infrastructure can be separated into different groups, persons, or roles. Thus, we characterize the mitigation activities by required skill set and time to act, which allows an easy alignment to a given organizational structure.

Operations Manager supports different methods to send a notification. The notification methods can be used in combination; for example, a notification can be sent by both e-mail and SNMP.

NOTIFY BY E-MAIL

An alarm can be sent to multiple destinations by e-mail, and repeated notifications can be sent while the situation is not resolved. To set an alarm, access the alarm configuration page by following Setup → Alarms from the default Operations Manager dashboard. Clicking Advanced Version accesses an advanced version of this page; the direct link for the advanced version is http://opsmgrserver:port/dfm/edit/alarms-advanced. Figure 20 shows how to configure an alarm. Adjust the threshold as described in section 4.2.

NetApp recommends using distribution lists or aliases with meaningful names rather than the addresses of individual persons. If you follow this recommendation, changing responsibilities and roles does not require you to make corresponding changes to Operations Manager.
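As an illustration of this recommendation, the following Python sketch composes such an alert e-mail addressed to a distribution list; the addresses and the message layout are assumptions, not Operations Manager output:

```python
from email.message import EmailMessage


def build_alert_mail(event: str, obj: str, value_pct: float) -> EmailMessage:
    """Compose an alert e-mail for a monitoring event.

    Per the recommendation above, the recipient is a distribution
    list, not an individual person. Both addresses are illustrative.
    """
    msg = EmailMessage()
    msg["From"] = "opsmgr@example.com"
    msg["To"] = "storage-operations@example.com"   # alias, not a person
    msg["Subject"] = f"[{event}] {obj} at {value_pct:.0f}%"
    msg.set_content(
        f"Event: {event}\n"
        f"Object: {obj}\n"
        f"Current value: {value_pct:.1f}%\n"
        "Evaluate the situation and choose a mitigation alternative."
    )
    return msg
```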

NOTIFY BY SNMP

Operations Manager supports the notification of alarms using SNMP, a widely used standard that is supported by most orchestration frameworks and ticketing systems. Using SNMP, Operations Manager can be integrated into existing ticketing systems. Figure 20 shows setting up an alarm that fires based on the aggregate almost full threshold. The SNMP trap host is configured using a hostname or IP address and the port on which the SNMP agent is listening. The alarm can then be saved and tested.


Figure 20) Configuring an alarm based on the threshold aggregate almost full

Note: The SNMP event must be routed to the responsible groups or persons in the ticketing system. The mapping between the detected situation and the responsible operational group must therefore be implemented there.

NOTIFY BY SCRIPT

Operations Manager also supports notifications in highly customized integration scenarios. A user-defined adapter can be executed, which delivers the information to the infrastructure or system of choice. A script can be used to implement such an adapter and act as the glue between Operations Manager and the customer infrastructure. To set an alarm on the event aggregate almost full that starts a script, instrument Operations Manager on the command line:

dfm alarm create -s script_to_execute -h aggregate-almost-full
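A notification adapter can be as simple as mapping the event to a ticket payload. This Python sketch is a hypothetical adapter; how Operations Manager passes event details to the script (arguments or environment variables) depends on the version, so the interface below is an assumption:

```python
import json
import sys


def make_ticket(event_name: str, source: str, severity: str = "warning") -> str:
    """Turn a monitoring event into a ticket payload (JSON).

    The queue name and payload fields are hypothetical; adapt them to
    the ticketing system in use.
    """
    return json.dumps({
        "summary": f"{event_name} on {source}",
        "severity": severity,
        "queue": "storage-operations",
    })


# Invoked as the alarm script, e.g. registered via:
#   dfm alarm create -s <this script> -h aggregate-almost-full
# Here we assume event name and source arrive as the first two arguments.
if __name__ == "__main__" and len(sys.argv) >= 3:
    print(make_ticket(sys.argv[1], sys.argv[2]))
```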


4.4 MITIGATE STORAGE USE

Uncontrolled storage use can limit operational flexibility and might put SLA fulfillment at risk. This section focuses on mitigation activities that preserve flexibility by controlling use within its defined corridor; the effect of a mitigation activity should be to return usage to that corridor.

Storage tightness might occur in aggregates or volumes, depending on their configuration. When all volumes in an aggregate are thin provisioned with the zero fat configuration, they use the shared pool of free blocks of the aggregate to deal with data growth. To solve this situation, a mitigation activity on the aggregate level is necessary.

When storage objects in a fixed-size volume cannot grow to their committed space, a mitigation activity on the volume level is necessary to solve upcoming volume tightness.

MITIGATION ACTIVITIES FOR AGGREGATES

Aggregates are the coarsest storage objects within a NetApp storage controller. Finer-grained storage objects such as FlexVol volumes and their content are usually thin provisioned using the zero fat configuration. They can grow on demand; however, because they live within an aggregate of physically limited size, the growth of the storage objects themselves is also limited. As described in the following list, providing usable space in the aggregate automatically allows contained storage objects to grow.

1. Increase the aggregate. You can add drives to aggregates during operation, and you can repeat this mitigation activity. The maximum aggregate size depends on the Data ONTAP version, the type of aggregate, and the type of storage controller; 64-bit aggregates, supported with Data ONTAP 8, have very high limits. Additional drives can be used immediately; however, their procurement time needs to be taken into account. Rebalancing data between existing and new drives results in uniformly distributed use of the drives.

2. Decrease the aggregate Snapshot copy reserve. This reserve is needed in MetroCluster and SyncMirror® configurations. In other configurations, you can decrease this reserve or set it to zero.

3. Shrink preallocated volumes. Volumes with preallocated space reserve available aggregate free space. When possible, these volumes can be shrunk, returning the freed space to the aggregate so that others can make use of it.

4. Enable deduplication and shrink the volume.

5. If available, migrate a nomad online to a different storage controller. Doing this on the NetApp storage controller level requires storage provisioning based on vFiler units and MultiStore and SnapMirror licenses. Adequate free space on the aggregates of the target storage controller is required. This mitigation activity is not limited in its applicability.

6. Migrate a volume from one aggregate to another within the same or another storage controller. SnapMirror replicates the data while it is still being served. To switch over to the replicated data, the client needs to detach from the source and reattach to the replica; after completion, the replica is considered the new source. This operation has an impact on client downtime. Typically, inter-data center bandwidth allows you to synchronize the source and the replica within a few minutes.

7. If none of the listed activities can be used, the application must be stopped to achieve a consistent state before its data is migrated.

The mitigation activities for aggregate tightness are summarized in Table 8. Note that Provisioning Manager can perform mitigation alternatives 3 to 6 for secondary storage online.


Table 8) Mitigation alternatives to control use within aggregates

No | Mitigation Activity | Repeatability | SLA Impact | Preparation Time | Time to Show Effect
1 | Increase aggregate capacity by adding disks | Repeatable (Data ONTAP 7.x: low limits; Data ONTAP 8: high limits) | None | HW procurement | Immediate (plus rebalancing)
2 | Decrease the aggregate's Snapshot copy reserve, if possible | One time | None | None | Immediate
3 | Shrink other volumes in the aggregate if they have enough free space | One time | Low | None | Immediate
4 | Run deduplication and shrink volumes | Repeatable | Low | Time to execute dedupe | Immediate
5 | Migrate nomads (online) | Repeatable | Low | None | Minutes (vFiler migration time)
6 | Migrate volumes to a different aggregate (offline) | Repeatable | Medium-high | Next planned downtime window | Minutes (volume switch-over time)
7 | Prevent application data loss and stop the application, then migrate (offline) | Repeatable | Low-high | Coordinate with application owner | Minutes (migration time)

MITIGATION ACTIVITIES FOR VOLUME TIGHTNESS

Mitigation activities for volume tightness are relevant for volumes that are at risk because their storage objects cannot grow to the committed size. When it is not possible to enable growth for storage objects contained in volumes, you need to perform an aggregate-level mitigation activity.

Note: Some of these mitigation alternatives depend on, and affect, the used capacity in the aggregate.

Table 9) Mitigation activities for resource tightness within volumes

No | Mitigation Activity | Repeatability | SLA Impact | Preparation Time | Time to Show Effect
1 | Reduce the volume's Snapshot copy reserve (if configured and not used) | One time | Low | None | Immediate
2 | Increase the volume if there is free space in the aggregate (see Table 8) | One time | Low | None | Immediate
3 | Delete Snapshot copies that are not needed or were skipped by the autodelete function | Limited | Low | None | Immediate
4 | Activate FAS deduplication for the volume (requires proper space guarantees) | One time | Low; possible performance impact | Wait for schedule | Hours
5 | If the volume contains more than a single LUN, migrate those objects to another volume or aggregate | Repeatable | High | Next planned downtime window | Minutes (volume migration time)
6 | Stop the application and migrate data | Repeatable | High | Coordinate with application owner | Minutes (migration time)


5 REAL-LIFE SETTINGS This section summarizes two different operational settings. The first does not make use of online data migration and the settled/nomad provisioning pattern; the second implements the settled/nomad provisioning pattern to maintain the flexibility for online data migrations.

The concrete threshold settings and approaches might be very customer and application specific. To exploit NetApp storage efficiency features in your own data center, NetApp recommends that you start conservatively. After you are familiar with the process, work toward the customer-specific optimum.

5.1 SAMPLE SETTING 1: REAL-LIFE SETTING

This section describes a real-life setting that a customer started with. It makes use of a limited set of mitigation alternatives, which is especially beneficial when the installed storage capacity should remain constant over a long time frame or the physical systems are already fully equipped. A settled/nomad setting is not considered; thus, the thresholds that signal a transition of phases are set lower and more conservatively for this customer. Because online data migration and aggregate extension are not available as mitigation alternatives, sufficient available space is required to safely reach the next planned downtime window, as shown in Figure 21. In practice, refer to the aggregate days-to-full trend value to get an idea of the available days to full based on past data growth.

• All storage is provisioned using the zero fat option with growable FlexVol volumes; only aggregate monitoring is used.
• Aggregate extension is not a mitigation alternative.
• Online migration is not a mitigation alternative.

Figure 21) Storage to enable organic data growth between planned downtime windows


Note: Several months might pass between the planned downtime windows in which major mitigation alternatives can be performed.

The primary concern is preventing the critical situation in which aggregates reach a utilization level too high to enable organic growth during the period between agreed planned downtime windows. To prevent this situation, sufficient space must be reserved to enable data growth. Second, the level of data consolidation is monitored to manage accumulated growth rates safely.

Provisioning of new data is stopped when one or both thresholds on the first and second metrics are reached. The operational teams are notified using an alarm on the Operations Manager event aggregate nearly full threshold (configured to fire when the metric exceeds 50%) and the event aggregate nearly overcommitted threshold (configured to fire when the metric exceeds 110%). These alarms stop the responsible entities from provisioning new storage; the aggregate is left for organic growth.

An assessment of the storage situation might then be performed. Depending on experience and knowledge of the application growth rates seen in the past, the thresholds may be adapted. After the upper threshold of the operational sweet spot corridor is crossed, an alarm based on the aggregate full threshold (set initially to 65%) is sent to the storage administrators, who decide on migrating data in the next planned downtime window. In the meantime, organic growth can take place in the yellow-marked area shown in Figure 22. The metrics used are:

• First metric: aggregate capacity used
• Second metric: aggregate space committed

Because all storage is provisioned using the zero fat option, no artificially limited storage containers exist; thus there is no need to consider a volume-based metric. Figure 22 shows the behavior depending on the metrics aggregate capacity used and aggregate space committed.

Figure 22) Transition of changes depending on the metrics aggregate capacity used and aggregate committed space

[Figure 22 shows the decision matrix: new storage is provisioned while aggregate capacity used is 0–50% and aggregate space committed is 0–110%; beyond those values, capacity is assessed and thresholds are adapted; above 65% capacity used or 120% committed space, mitigation is triggered.]
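The decision logic of this setting condenses into a few comparisons. A minimal Python sketch using the thresholds from the text (50%/110% to stop provisioning, 65%/120% to trigger mitigation):

```python
def phase(used_pct: float, committed_pct: float) -> str:
    """Phase decision from the two metrics of sample setting 1."""
    if used_pct > 65 or committed_pct > 120:
        return "mitigate in next planned downtime window"
    if used_pct > 50 or committed_pct > 110:
        return "organic growth only; assess capacity"
    return "provision new storage"
```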


5.2 SAMPLE SETTING 2: SETTLED/NOMAD

This section describes a setting that takes the settled/nomad provisioning pattern into account and allows migrating nomad data flexibly and in a timely manner thanks to vFiler technology. This setting requires storage space at alternative locations to which nomads can be migrated. It is seen more often in larger environments with an emphasis on NFS-attached storage. It allows operating the NetApp storage infrastructure at very high use and in narrower operational sweet spot corridors. Figure 23 visualizes the effect of a mitigation alternative that can be performed online.

Figure 23) Narrower corridors due to the ability to perform mitigation alternatives in hours instead of months


In this sample setting, as in sample setting 1, the critical situation to prevent is aggregates becoming too full. However, the flexibility gained with online data migration makes it unnecessary to take a further metric, such as storage overcommitment, into account.

• All storage is provisioned using the zero fat option with growable FlexVol volumes; only aggregate monitoring is used.
• Storage is provisioned using the settled/nomad pattern, with the ability to perform online migration.
• The days-to-full aggregate trend was more than 200 days on average. Note that this value depends on the individual situation and is calculated against 100% capacity.

The sole metric in this setting is aggregate capacity used. Table 10 contains the thresholds describing the transitions of phases.

Table 10) Phase transitions with the settled/nomad provisioning pattern and online migration as a mitigation alternative

Detection Threshold | Notify | Mitigation
> 70% | Storage operations | Stop provisioning of new storage
> 85% | Storage operations | Stop extending provisioned storage
> 90% | Storage operations | Relax the resource situation and migrate a nomad
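The transitions in Table 10 reduce to a threshold ladder on a single metric; a minimal sketch:

```python
def action(used_pct: float) -> str:
    """Threshold-to-action mapping from Table 10."""
    if used_pct > 90:
        return "relax resource situation and migrate a nomad"
    if used_pct > 85:
        return "stop extending provisioned storage"
    if used_pct > 70:
        return "stop provisioning new storage"
    return "normal operation"
```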


Figure 24) Visualization of phase transitions depending on metric aggregate capacity used

[Figure 24 restates Table 10 as a matrix: new storage is provisioned while aggregate capacity used is below 70%; already provisioned storage may still be extended between 70% and 85%; above 90%, utilization is relaxed by migrating a nomad with NetApp Data Motion.]

You can achieve very high data consolidation in this setting by using NetApp storage controllers: the served amount of logical data exceeds the physically usable capacity by factors.


6 STORAGE EFFICIENCY COOKBOOK To increase consolidation, we propose the following steps to exploit the advantages of NetApp storage efficiency technologies.

Figure 25) Typical picture of aggregate capacity metrics while turning to zero fat configurations and dedupe

[Figure 25 plots committed capacity and capacity used over elapsed time, together with the overall trend and the last 3-month trend; the marked phases 1 to 3 correspond to the steps below: about one month of baseline monitoring, a drop in capacity used after the change to zero fat, and about three months to derive the growth trend.]

As a general rule, we do not introduce artificially limited container types: they increase the monitoring effort and might prevent pooling of unused space. For an existing landscape, proceed as follows.

1. Install and configure Operations Manager; the earlier the better. From day one, Operations Manager collects data, and the more information it collects, the better the predictions and trending become. The diagrams provided by Operations Manager give a good idea of growth rates and their steadiness. Make sure all NetApp storage controllers are monitored, and wait for one month. Define which mitigation alternatives your operational team is comfortable with. Check the boxes accompanying the provided list and identify the time your team needs to perform the actions. If you can perform online migrations for nomads, define the time to negotiate and approve the migration; for all other data, define the time to the next planned downtime window.

2. Change all volumes to the zero fat configuration with the autogrow feature set to on. Because there is no artificial space limitation for an autogrow volume, monitoring is restricted to aggregate monitoring. When using deduplication, set the volume to autogrow. During this period, the capacity used diminishes, as shown in Figure 25; usually each change in the volume configuration can be detected. So far only metadata has changed, and unused space in the volumes is now available from a common shared pool. The aggregated free space is available to the same applications storing the data. We recommend monitoring for three months to understand the growth rate of your environment.

3. Derive the growth trend of the aggregates. Note that the overall trend might still be negative. Use Operations Manager to help determine the trend. Make sure that it excludes the time frame of changing the volume configuration to zero fat and that it includes relevant operations of your applications, such as month-end and year-end closing of business applications or regular software maintenance updates (for example, in virtualized environments).


Work backward to determine the thresholds of the phases:
a. Define the aggregate use at a level where your operational team is comfortable; at first, do not exceed 80%. Add an attention area (yellow) depending on the mitigation alternatives and their time to show effect.
b. Determine the maximum distance between the planned downtimes, or the time to perform the intended mitigation alternatives.
c. Determine the growth rate. Operations Manager provides help in determining the trend of data growth.
d. Determine the minimum space required to comfortably allow organic growth in the period between agreed planned downtimes of the provided services. Operations Manager helps you to understand the growth rate of the past.
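Steps a through d can be combined into a back-of-the-envelope calculation: the threshold at which provisioning must stop is the comfortable utilization level minus the growth expected until the next downtime window. A sketch, with illustrative parameter names:

```python
def provisioning_stop_pct(usable_gb: float, growth_gb_per_day: float,
                          days_between_windows: float,
                          comfort_pct: float = 80.0) -> float:
    """Utilization level at which provisioning of new storage must stop
    so that organic growth stays below comfort_pct until the next
    planned downtime window.
    """
    headroom_gb = growth_gb_per_day * days_between_windows
    return max(comfort_pct - 100.0 * headroom_gb / usable_gb, 0.0)
```

For example, a 10 TB aggregate growing 10 GB per day over a 180-day window needs 18% of headroom, so provisioning should stop at 62% if 80% is the comfort level.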

To provision storage, follow these steps:

1. Create big aggregates to enable shared storage in your data center. We recommend sizing them in such a way that each aggregate can be extended once for eventual aggregate mitigation. Very few situations exist where a silo-centric approach with dedicated aggregates per application makes sense; free space and performance in an aggregate can be shared. A few big aggregates reduce the monitoring effort. Also, build aggregates in a limited number of standardized configurations and sizes.

2. Create volumes in the zero fat configuration with the autogrow feature set to on. Because there is no artificial space limitation for an autogrow volume, monitoring is restricted to aggregate monitoring. When using deduplication, set the volume to autogrow. Whenever possible, use Provisioning Manager for convenience and for repeating configurations.
a. Classify your data and provision for flexibility. Give NFS a preference and make use of vFiler entities.
b. Turn on deduplication. Even in situations where deduplication rates are expected to be low, there is sometimes a big surprise. If you prefer to try deduplication first, create a clone of the intended volume on the storage controller and deduplicate it to estimate the effect. Use Performance Advisor to identify a repeating time frame of low activity in which to schedule the deduplication job; alternatively, use deduplication scheduled by change rate. Mind the maximum volume sizes, which depend on the storage controller.
c. Initially size volumes to the expected size of the data you are going to store. This way, the aggregate overcommitment metric in Operations Manager represents the data consolidation more precisely.
d. Trim existing volumes provisioned in full or low fat to the zero fat configuration. Use the following commands on the console of the storage controller to configure zero fat without Snapshot autodelete for NAS environments:

vol options <volume> guarantee none
vol options <volume> try_first volume_grow
vol autosize <volume> -m <maximum size> -i <increment size> on
snap autodelete <volume> off

Use the following command sequence to configure zero fat with Snapshot autodelete for NAS environments:

vol options <volume> guarantee none
vol options <volume> try_first volume_grow
vol autosize <volume> -m <maximum size> -i <increment size> on
snap autodelete <volume> trigger volume
snap autodelete <volume> delete_order oldest_first
snap autodelete <volume> on


Use the following commands to configure zero fat without Snapshot autodelete for SAN environments:

vol options <volume> guarantee none
vol options <volume> try_first volume_grow
vol autosize <volume> -m <maximum size> -i <increment size> on
snap reserve -V <volume> 0
snap autodelete <volume> off
lun set reservation <lun> disable

Use the following command sequence to configure zero fat for SAN environments with autodelete set to on:

vol options <volume> guarantee none
vol options <volume> try_first volume_grow
vol autosize <volume> -m <maximum size> -i <increment size> on
snap reserve -V <volume> 0
snap autodelete <volume> trigger volume
snap autodelete <volume> delete_order oldest_first
snap autodelete <volume> on
lun set reservation <lun> disable

e. Identify storage keeping inactive data. Such storage is most often perfectly suited to act as a nomad candidate that can be migrated.

f. Identify storage that is close to deprovisioning. Deprovisioning of storage relaxes use and can act as a mitigation alternative.

g. Turn already provisioned volumes into a zero fat configuration.

3. Let Operations Manager monitor the landscape. Use the aggregate daily growth rates and days-to-full trending reported by Operations Manager to adapt the thresholds. Remember that days-to-full trending reports against 100% capacity used of the aggregate.
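The days-to-full trend can be reasoned about with a simple linear extrapolation. The sketch below illustrates the relationship between the daily growth rate and the remaining headroom; it is an illustration only, not the actual Operations Manager algorithm:

```python
def days_to_full(capacity_gb: float, used_gb: float, daily_growth_gb: float) -> float:
    """Approximate days until an aggregate reaches 100% capacity used,
    assuming a constant daily growth rate (simple linear trending)."""
    if daily_growth_gb <= 0:
        return float("inf")  # no growth: the aggregate never fills
    return (capacity_gb - used_gb) / daily_growth_gb

# Example: a 10 TB aggregate at 7 TB used, growing by 50 GB per day
print(days_to_full(10_000, 7_000, 50))  # 60.0
```

Because the trend reports against 100% capacity used, an alarm threshold should fire well before the computed date to leave time for mitigation.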


7 REFERENCES

• TR-3505, "NetApp Deduplication for FAS and V-Series Deployment and Implementation Guide": www.netapp.com/us/library/technical-reports/tr-3505.html

• TR-3563, "NetApp Thin Provisioning: Improving Storage Utilization and Reducing TCO": www.netapp.com/us/library/technical-reports/tr-3563.html

• TR-3710, "Operations Manager, Provisioning Manager, and Protection Manager Best Practices Guide": www.netapp.com/us/library/technical-reports/tr-3710.html

• TR-3786, "A Thorough Introduction to 64-Bit Aggregates": www.netapp.com/us/library/technical-reports/tr-3786.html

• TR-3814, "NetApp Data Motion": www.netapp.com/us/library/technical-reports/tr-3814.html

• TR-3827, "If You're Doing This, Then Your Storage Could Be Underutilized": www.netapp.com/us/library/technical-reports/tr-3827.html

• TR-3881, "DataMotion for Volumes for Enterprise Applications": www.netapp.com/us/library/technical-reports/tr-3881.html

• NetApp Operations Manager Efficiency Dashboard Installation and User Guide: now.netapp.com/NOW/download/tools/omsed_plugin/InstallUserGuide.pdf


8 ACKNOWLEDGMENTS

This report was developed in concert with the Field Centers for Innovation and covers field best practices and product group expertise. It would not have been possible without the input of many experts. Significant contributions were made by Matthew Agoni, Carlos Alvarez, Jeff Berks, Manfred Buchmann, Hans Deuerlein, Erik Dybwad, Niels Reker, Oliver Dziuba, Larry Freeman, Gary Garcia, Pretoom Goswami, Naveen Harsani, George John, Nigel Maddock, Andreas Martinovsky, Holger Niermann, Cesar Orosco, Christian Ott, Shiva Raja, Michael Reusch, Maurice Skubski, John Tyrrell, Oliver Walsdorf, and Allen Wang.

NetApp provides no representations or warranties regarding the accuracy, reliability, or serviceability of any information or recommendations provided in this publication, or with respect to any results that may be obtained by the use of the information or observance of any recommendations provided herein. The information in this document is distributed AS IS, and the use of this information or the implementation of any recommendations or techniques herein is a customer's responsibility and depends on the customer's ability to evaluate and integrate them into the customer's operational environment. This document and the information contained herein may be used solely in connection with the NetApp products discussed in this document.

© Copyright 2010 NetApp, Inc. All rights reserved. No portions of this document may be reproduced without prior written consent of NetApp, Inc. Specifications are subject to change without notice. NetApp, the NetApp logo, Go further, faster, Data ONTAP, FlexClone, FlexVol, MultiStore, RAID-DP, SnapDrive, SnapMirror, Snapshot, SyncMirror, and vFiler are trademarks or registered trademarks of NetApp, Inc. in the United States and/or other countries. Windows is a registered trademark of Microsoft Corporation. Oracle is a registered trademark of Oracle Corporation. VMware is a registered trademark and VMotion is a trademark of VMware, Inc. All other brands or products are trademarks or registered trademarks of their respective holders and should be treated as such. RA-0007-1010

Figure 9) Configuring a full/zero fat provisioning policy using Provisioning Manager for NAS. Select checkboxes as outlined. Provisioning Manager deviates from zero/full fat by first growing volumes with autosize and then allowing Snapshot autodelete.


Figure 10) Configuring a full/low/zero fat provisioning policy using Provisioning Manager for SAN storage. Select checkboxes as outlined. Provisioning Manager deviates by not turning on autosize for zero fat.

FULL/LOW/ZERO FAT PROVISIONING FOR STORAGE SERVICES

Storage services are an easy abstraction to provision storage in a utility-like fashion. A storage service describes all characteristic attributes for storage needed in a certain scenario. A storage service catalog lists the available templates and allows you to provision storage with these attributes on demand. Technically, storage services or datasets consist of one or more resource pools, a protection policy, and a provisioning policy. Full/low/zero fat provisioning for storage services is configured in the configuration wizard of the provisioning policy. Figure 11 shows the provisioning policies closest to the full/low/zero fat configurations.

Figure 11) Full/low/zero fat provisioning policies for datasets and storage services.

Because this wizard is able to configure the deduplication feature, two policies are configured for the zero fat configuration: one with deduplication and one without deduplication.


HOW SHOULD A VOLUME BE SIZED

Because physical allocation of data within a zero-fat-provisioned volume is done on demand, theoretically the volume size can be set to a very high value that can easily keep all application data and Snapshot copies. Because the unallocated space in the volume is not exclusively reserved for the volume itself, all other applications can benefit from the shared pool of unallocated storage. However, NetApp recommends that you size the volume to the expected size of its containing objects and use the autogrow option to let it grow on demand. The important advantage is that the commitment rate acts as a metric for data consolidation.

Note: The commitment rate reflects the amount of logical data consolidation. This metric is suitable for deciding when data should be left for organic growth.

Additionally, the volume size limits when using deduplication should be taken into account, because the maximum sizes depend on the storage controllers.
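For illustration, the commitment rate can be thought of as committed volume capacity relative to physical aggregate capacity. The function below is a simplified sketch of that ratio, not an Operations Manager API:

```python
def commitment_rate(volume_sizes_gb, aggregate_capacity_gb):
    """Committed (provisioned) volume capacity relative to the aggregate's
    physical capacity; values above 100% indicate overcommitment."""
    return sum(volume_sizes_gb) / aggregate_capacity_gb

# Three 4 TB zero fat volumes committed against a 10 TB aggregate
print(f"{commitment_rate([4_000, 4_000, 4_000], 10_000):.0%}")  # 120%
```

Sizing volumes to the expected data size keeps this ratio close to the actual consolidation achieved, which is what makes it usable as a decision metric.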

APPLICATION RECOMMENDATIONS

Thin provisioning is most effective when applications use data that is committed to them step by step. When applications preformat data, the immediate effect of thin provisioning is lost, and only deduplication may reclaim sharable blocks. Because thin provisioning has no performance penalty, the general recommendation is to provision with the zero fat configuration.

For SAN-attached storage, NetApp recommends that you use file systems supporting space reclamation technologies such as the SCSI UNMAP and SCSI WRITE SAME commands. These pass the information through the storage stack that a particular block is no longer used, allowing unused space to be reclaimed. On Windows® platforms, this can be configured in NetApp SnapDrive®.

For Oracle® database best practices, refer to WP-7084, "Storage Efficiency in an Oracle Environment."

3.2 PROVISIONING FROM TEMPLATES: VOLUME AND DEDUPE-CENTRIC LAYOUTS

This section deals with provisioning storage for similar applications from a golden template. A valid use case is a hosting provider who offers and serves predefined application services in mass quantities. Instead of provisioning each application from scratch, the data of the application instance is provisioned by creating a copy of a preconfigured template, or golden copy, that is customized using a postprocessing procedure.

When applications are provisioned this way, NetApp cloning technologies generate virtual copies of the template data instantly and with efficient use of space. This achieves a high degree of data consolidation and cost savings.

The potential of NetApp cloning technologies also plays a central role in development and test environments, as well as in software maintenance scenarios. Testing and updates can be performed very easily because these cloning capabilities work instantly and with almost no overhead in terms of performance, CPU, and memory.

There are two ways to align application data to a NetApp shared storage infrastructure:

• Volume-centric storage layout
• Dedupe-centric storage layout

Depending on the data lifetime, suitability for deduplication, consistency, and tool constraints, one way of aligning application data is more appropriate than the other.

In both variants, the storage of the application template can be provisioned as either full, low, or zero fat. The cloning procedure inherits the attributes of the parent volume. To create space-efficient clones, the space guarantee must be set to none.


CONSEQUENCES FOR MONITORING

When using one of the following layouts, very high data consolidation can be achieved. Because this effect depends on the usage characteristics of the corresponding applications, monitoring the aggregate is key. In case a low fat volume acts as a template that is cloned preserving the original space guarantees, monitoring is necessary for the cloned volumes as well.

VOLUME-CENTRIC STORAGE LAYOUT

In a volume-centric storage layout, an application instance is organized into one or a few volumes to benefit from Data ONTAP volume-centric management and maintenance operations, such as instant cloning and volume-consistent Snapshot copies.

In addition to the convenient ways to manage volumes, volume-centric storage layouts have storage efficiency advantages in two dimensions:

• High instant storage efficiency savings: high instant savings are achieved when cloning the data of an application instance with FlexClone; savings might deteriorate over time.

• Long-term storage efficiency savings: medium long-term savings are achieved when deduplicating application data.

A volume-centric layout makes it easy to provision storage for another instance of an application by cloning a consistent volume representing the template of the intended application and attaching it to an instance where it is processed. This approach works for both NAS and SAN.

Figure 12 shows the data alignment of an application instance and its volume. An application instance organizes its data in one or more dedicated volumes. Note that the entire construct is created within one aggregate.

Because deduplication is performed on the volume level, long-term savings depend on the block-sharing rate within one instance of an application. Volume-centric layouts are preferred in the following cases:

• Simplicity of data management using volumes
• Individual control over the SLA of each application instance
• Application instances with a short duration
• No consideration of deduplication
• Management tools that require volume-centric layouts


Figure 12) Volume-centric storage provisioning. Application instances are aligned horizontally with their volumes.

Impact on commitment and storage utilization: The impact of using FlexClone to clone a volume-centric storage layout to implement template-based provisioning can be visualized schematically. At clone creation, Data ONTAP creates metadata for the new instance of the data and allocates blocks for storing changes to the cloned copy, or new data, on request. Thus, the overcommitment of the aggregate containing the cloned data increases when the clone is created. However, this does not affect the space used in the aggregate. When data in the clone is changed and new data is added by the application, the aggregate use grows.
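This bookkeeping can be illustrated with a toy model (class and method names are hypothetical; actual Data ONTAP accounting is more involved): commitment rises at clone creation, while physical use grows only as the clone diverges.

```python
class Aggregate:
    """Toy model of the two aggregate metrics discussed above."""

    def __init__(self, capacity_gb: float):
        self.capacity_gb = capacity_gb
        self.committed_gb = 0.0  # sum of volume sizes (drives overcommitment)
        self.used_gb = 0.0       # physically allocated blocks

    def provision_volume(self, size_gb: float, initial_data_gb: float = 0.0):
        self.committed_gb += size_gb
        self.used_gb += initial_data_gb

    def flexclone_volume(self, size_gb: float):
        # Clone creation adds only metadata: commitment rises, use does not.
        self.committed_gb += size_gb

    def write_new_data(self, gb: float):
        # Changes in the clone consume physical blocks on demand.
        self.used_gb += gb

aggr = Aggregate(10_000)
aggr.provision_volume(4_000, initial_data_gb=2_000)  # template volume
aggr.flexclone_volume(4_000)                         # instant clone
print(aggr.committed_gb, aggr.used_gb)               # 8000.0 2000.0
aggr.write_new_data(100)                             # clone diverges
print(aggr.used_gb)                                  # 2100.0
```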

Best Practice

A volume-centric layout implicitly implements a consistency group. It is preferable to align within it all application data that should be recovered at a certain point in time. Cloning can achieve significant savings when a FlexClone volume is created to provision data for a new service instance.

Client-side data realignment, such as disk defragmentation or database tablespace reorganization, has a counterproductive effect on the FlexClone savings. Such realignment also temporarily reduces the deduplication savings until the deduplication process is executed again. If possible, the following actions on client data should be avoided:

• Reorganizing data, for example, database reorganization of tablespaces or defragmentation of virtual disks provisioned through cloning
• Preformatting data


DEDUPE-CENTRIC STORAGE LAYOUT

In a dedupe-centric storage layout, the goal is to achieve high storage efficiency returns from the deduplication feature. In contrast to the volume-centric storage layout, data of different application instances is grouped to achieve storage efficiency returns across a set of application instances. Figure 13 shows a sample dedupe-centric storage layout. Data of application instances is organized horizontally; the individual data of each application is grouped vertically in a volume to implement deduplication.

This layout makes sense in virtualization scenarios where the images of the guest machines can be grouped easily. Grouping partitions containing boot images and commonly used programs is very effective because they share much of the same data. To implement template-based provisioning with such a layout, cloning of template data must be performed with the file/LUN FlexClone operation. File/LUN FlexClone allows storage objects to be cloned within a volume, providing finer granularity.

This storage layout provides the following storage efficiency advantages from a short- and long-term perspective:

• Very high long-term storage efficiency savings: long-term storage efficiency savings are achieved due to the deduplication-centric storage layout and deduplication returns.

• Short-term storage efficiency savings: instant storage efficiency savings are provided when cloning an application instance through a file/LUN FlexClone operation, for example, cloning template application data.

In contrast to the volume-centric storage layout, application instances are bundled together in a matrix style because of their participation in a volume. This implies that the applications share major operational tasks and are managed as a bundle. From an SLA perspective, a diversification of service levels within the application instances cannot be implemented as easily as with a volume-centric layout. Achieving application-consistent Snapshot copies requires the iterative application of file/LUN FlexClone functionality to all storage objects of the instance. This is slightly more difficult than cloning with a volume FlexClone operation.

TR-3505, "NetApp Deduplication for FAS and V-Series Deployment and Implementation Guide," provides a deeper understanding of NetApp deduplication and its deployment.

Figure 13 illustrates dedupe-centric storage provisioning. Volumes are shared among several application instances to achieve cross-instance deduplication returns. Note that each such construct is created within one aggregate; different volumes can be assigned to different aggregates.


Figure 13) Dedupe-centric storage provisioning. Application instances are aligned horizontally; volumes are aligned vertically.

Impact on commitment and aggregate usage: When creating the FlexVol volumes for this layout, their individual sizes contribute to the commitment rate. The aggregate use grows with the provisioning and object use within the FlexVol volumes. Provisioning a new instance in this layout through a file/LUN FlexClone operation has no effect on the overcommitment rate; it affects the deduplication value of the volumes themselves. Thus, NetApp recommends using the zero fat configuration for the volume, with autogrow enabled.

Best Practice

This layout is very attractive for applications using multiple but similar storage objects among service instances (for example, virtual disks in virtual machine hypervisors). These instances usually use similar operating systems and applications in dedicated virtual disks. Thus, grouping these storage objects leads to a very high degree of consolidation due to deduplication.

Quickly changing data, such as pages and swap files, should not be considered for inclusion in deduplicated volumes on primary storage. Deduplication savings are limited due to their high change rate and do not justify running the deduplication process. NetApp recommends that this type of data not be placed in the same volume as data that deduplicates well.

We further recommend not performing client data realignments such as Windows disk defragmentation or database tablespace reorganizations. Because of the way that NetApp storage controllers work, fragmented client data is served with no performance penalty.


3.3 SETTLED/NOMAD PROVISIONING FOR NETAPP DATA MOTION

Settled/nomad provisioning is a provisioning pattern that helps increase the utilization of NetApp shared storage. When the online migratability features of storage are exploited, response times to mitigate data growth scenarios are independent of application-specific planned downtime windows. Storage is classified into potential migration candidates that can be migrated away from a tight aggregate on one storage controller to another while assuring its accessibility. Thus, it is an elegant technique to relax the use of an aggregate outside the planned downtime windows of the served applications. NetApp MultiStore technology implements this feature using the vFiler abstraction, which NetApp recommends you consider in the provisioning process. Secure multi-tenancy environments implemented using MultiStore technology harmonize well with this provisioning approach.

The settled/nomad provisioning pattern is a perfect metaphor for reacting to data growth in an aggregate. Figure 14 illustrates the concept of settled/nomad provisioning in the aggregates of the storage controllers and the migration of a nomad out of its aggregate. The settled part describes data that does not move during its lifetime. It might use vFiler units to simplify operation and hardware maintenance of the storage controller, but there is no direct need. The nomad parts are considered moving parts and thus must make use of vFiler units. The ratio between the sizes of the settled and nomad parts depends on the growth rate and lifetime of the data in the settled part. Assuming that the aggregate size is constant over this period, the aggregate is filled with settled and nomad data. Over the data lifetime, more and more nomads are migrated away. At the end of the lifetime, only the settled data is left.

It is irrelevant whether the data growth happens in the settled or nomad part: when a nomad is migrated away, the resource situation on the aggregate is relaxed. It is preferable to provision several nomads of different sizes. This allows you to:

• React to different growth scenarios of the data
• Quickly migrate smaller nomads when time or an inter-storage-controller network is considered to be a limited resource
• Operate the aggregate in its operational sweet spot corridor over a long time frame; by slicing the migratable entities in the right way, you can be sure that the aggregate operates within a predefined use interval
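As a back-of-the-envelope planning aid, the nomad portion can be sized against the expected growth of the settled data over its lifetime and then sliced into several smaller nomads. The rule below is an illustrative assumption for this sketch, not a NetApp sizing formula:

```python
def plan_nomads(aggregate_gb, settled_gb, daily_growth_gb, lifetime_days, slices=4):
    """Sketch: reserve enough nomad capacity so that migrating nomads away can
    absorb the settled data's expected growth over its lifetime, split into
    several equally sized nomads for flexible migration."""
    expected_growth = daily_growth_gb * lifetime_days
    nomad_total = min(expected_growth, aggregate_gb - settled_gb)
    return [nomad_total / slices] * slices

# 10 TB aggregate, 4 TB settled data growing 5 GB/day over a 2-year lifetime
print(plan_nomads(10_000, 4_000, 5, 730))  # four nomads of 912.5 GB each
```

In practice the slices would be sized unequally, as recommended above, so that both small and large growth scenarios can be answered with a matching migration.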

Figure 14) Settled/nomad provisioning into an aggregate. In case of aggregate tightness, a nomad is migrated to a separate aggregate.

To summarize, the settled/nomad provisioning pattern is an elegant method to adjust the block use of an aggregate. The use of an aggregate can be controlled and kept within a desired corridor.


SLA-BASED ASSESSMENT FOR SETTLED/NOMAD

The goals of an SLA-based assessment are to optimize SLA fulfillment and to avoid or minimize penalty costs. The accessibility of the applications, described by their individual service levels, is used for an assessment into settled and nomad instances.

We use the introduced SLA metric of service disruption and map it to the stickiness of the settled/nomad instances. The vFiler entities allow online migration of NFS- and iSCSI-attached nomad instances without any changes on the client side. Fibre Channel-attached storage cannot be migrated online at the time of writing. Refer to TR-3881 for an understanding of DataMotion in an Oracle database and Microsoft Exchange environment.

Alignment by technical impact: For data belonging to applications with SLAs that fit perfectly into what is provided, a direct assignment can be made. For example, application instances with the lowest acceptable service disruption should be the last candidates to be migrated (settled); applications with the highest acceptable service disruption should be considered as nomads. However, there might be data of application instances that will likely be migrated during the application lifetime. You must take into account the business impact of migrating these instances.

Figure 15) Alignment by technical impact (sorted by negative impact in descending order).

Alignment by business impact: An assessment of penalty costs is made for the data of the remaining applications. For vFiler migration, a very short negative impact on the performance of the service level must be taken into account during the migration. Thus, application data with the highest negative impact is considered to be the stickiest.

Figure 16) Alignment by business impact (sorted by negative impact in descending order).

PERFORMANCE AND THROUGHPUT IMPACT OF MIGRATION

Migration of a nomad might be triggered due to heavy storage consumption in an aggregate. It might also be triggered due to performance limitations of the corresponding storage controller. Because the migration itself consumes additional resources on the network and on the participating storage controllers, this consumption must be taken into account to avoid further intensifying the situation. Refer to TR-3881 for a quantitative evaluation of DataMotion.

NetApp recommends that the use of storage controllers in a high-availability configuration be adjusted in such a way that the remaining controller can master the load in the case of a failover. Doing so should leave enough resources to perform migrations.

Migrating vFiler entities relies mainly on SnapMirror® and MultiStore technology. Thus, it requires these licenses on all participating storage controllers. TR-3814, "NetApp Data Motion," provides a thorough presentation of migration using the NetApp Data Motion solution. It focuses on implementing and triggering the migration of vFiler entities using NetApp Provisioning Manager. Furthermore, vFiler units can be managed manually to allow for handy offline migration with a very short interruption of storage accessibility.

ENABLE SETTLED/NOMAD FOR ALREADY-PROVISIONED STORAGE

While NetApp recommends that you consider the settled/nomad setting initially, taking the sizing and lifetime of storage into account, it is possible to implement it later in a planned downtime window. If NFS-attached storage should be migrated, existing volumes can be adopted by a vFiler entity. Because the vFiler entity has its own IP address, the clients attaching the storage need to be remounted.

SETTLED/NOMAD-LIKE SETTING WITH SHORT-/LONG-TERM DATA PAIRING

In the previous section, the settled/nomad pattern was described as a way to mitigate organic data growth. The same effect can be achieved when storage is identified as belonging to instances that are going to be deprovisioned at their end of life. Taking the expected lifetime of provisioned storage into account allows you to plan for deprovisioning situations in advance. This relaxes the dependence on aggregates outside planned downtime windows, and without the technical requirements of a settled/nomad setting.

ONLINE MIGRATION IN VIRTUALIZED ENVIRONMENTS

Online migration features in a virtualization hypervisor provide a further alternative for implementing a responsive scheme to react to data growth scenarios. For example, VMware® Storage VMotion™ is capable of transferring a virtual machine, including its storage, when it is attached using a datastore. Storage of virtual machines served by a NetApp datastore can be migrated to another NetApp-served datastore by migrating each virtual machine. In such cases, a nomad can simply be implemented by a NAS- or SAN-attached datastore.

In contrast to a data transfer based on SnapMirror directly between NetApp storage controllers, the migration traffic flows through the hypervisor. This might have consequences for the execution of the virtual machines. Also, the NetApp storage efficiency savings cannot be exploited during the transfer. Deduplication savings are regained by executing the deduplication process on the destination storage controller.


4 OPERATION

This section focuses on the operation and management of overcommitted storage. The goal of management is to fulfill the SLAs of the stored data while achieving a high level of efficiency. It addresses questions of how to detect situations that need manual assistance, how to raise the awareness of the operational staff, and how to resolve situations that arise.

We first consider situations that put SLA fulfillment at risk. Then we focus on actions that can be taken to avoid further aggravation by making the situation evident and presenting mitigation alternatives.

Consider the operational process as a loop that monitors and evaluates the current situation and triggers the transition of a storage resource among phases:

• Provision storage.
• Leave room for organic growth. It might be desirable to still allow for extending the storage of previously provisioned applications.
• Reduce storage use with mitigation alternatives such as deletion, data motion, and so on.

These transitions must occur within a specified time frame to optimize operational flexibility and to prevent endangering the SLAs. The point is to detect situations that will violate the SLAs in the future.

SITUATIONS PUTTING SLA FULFILLMENT AT RISK

Over time, more and more data is stored and processed by the provided applications. NetApp storage efficiency technologies compensate for this growth. To prevent running out of physical resources, usage must be managed within safe boundaries. This makes sure the operations team has enough time to react with the appropriate mitigation strategy.

The following list summarizes situations that are critical for service delivery:

• Running out of time: Some mitigation alternatives must be triggered in advance, and a passage of time might be needed for their effect to become evident. This time determines the number of mitigation alternatives that can be considered at a certain point.

• Running out of mitigation alternatives: Several mitigation alternatives exist to control the usage. However, some alternatives are one-time activities, and some must be performed within a certain time frame. Depending on the situation, not all alternatives might be available for use.

• Running too tight on storage: Over time, applications use more and more of the blocks of storage that were committed to them. This forces Data ONTAP to allocate from a pool of free blocks. Assuming data growth, the size of the free block pool directly translates into available time to react.

• Running out of storage completely: This must be prevented because it has a high impact on the availability of the service. Furthermore, data integrity can be at risk. Consider the following scenarios:
− An application wants to write to committed storage but fails (NAS/SAN). For applications, this looks like a storage failure and implies service disruption. Data integrity can be at risk.
− An application wants to allocate new storage but fails (NAS). The application is confronted with a "No space left on device" exception. Verify the application behavior on this exception; most applications can deal with this situation, and data integrity is not at risk.

Two cases need to be differentiated when mitigation is necessary to solve a situation of tight storage. Storage for an object such as a LUN or a share can be tight because of:
− Insufficient space within the volume in which the storage object is contained
− Insufficient free space within the aggregate in which the storage object and its volume are contained
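On the application side, the NAS "No space left on device" case mentioned above can be handled defensively so that an operational alert is raised instead of a crash. A minimal sketch (the function name and return policy are illustrative):

```python
import errno

def append_record(path: str, record: bytes) -> bool:
    """Append a record; on ENOSPC, report failure instead of crashing so the
    caller can pause writes and raise an operational alert."""
    try:
        with open(path, "ab") as f:
            f.write(record)
        return True
    except OSError as e:
        if e.errno == errno.ENOSPC:
            return False  # signal: free space or extend storage, then retry
        raise
```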


The following sections focus on how to detect that a change is necessary and that a storage resource should be transitioned to another phase. We describe the different phases, how monitoring can support their detection, and how this information can be made known to operational groups.

4.1 PHASES AND TRANSITIONS

This section outlines the phases of a storage resource. Starting with an empty aggregate, storage is provisioned up to certain thresholds. After that, storage is left for organic growth. After certain thresholds are exceeded, further inspection or activities must be performed to mitigate storage tightness.

• Provisioning storage: When certain thresholds are within a defined range, storage is provisioned to the aggregates. Monitoring should support making a decision to transition to the next phase.

• Leaving storage for organic growth. When certain thresholds are exceeded, provisioned storage is left for organic growth. Depending on the environment, storage of existing applications might still be extended, and a second threshold might signal that extensions are no longer possible. Monitoring should support making the decision to transition to the next or the prior phase.

• Mitigating storage use. When certain thresholds are exceeded, this phase must make sure that committed storage can be delivered to store application data. The effect of a mitigation activity should be to put the storage resource back into the preferred operational corridor. Monitoring should support making the decision to transition back to the organic growth phase.

4.2 MONITORING

NetApp Operations Manager delivers comprehensive monitoring and management for NetApp shared storage. It provides alerts, reports, performance monitoring, and configuration tools to keep the NetApp storage infrastructure in line with business requirements for maximum availability and efficiency. NetApp Operations Manager provides a single human interface and an application programming interface (API) for integration with third-party management and orchestration software vendors.

Operations Manager monitors the NetApp shared storage infrastructure and is able to raise awareness of certain situations. Events can be set to trigger an action when operational parameters are within a certain range and indicate a relevant situation. When the event triggers, an alarm notification can be sent by e-mail, pager, Simple Network Management Protocol (SNMP), or customized scripts. To raise awareness of a certain situation, the event must be characterized using the metrics provided by Operations Manager. To communicate the event, an alarm must be set.

THRESHOLDS

Operations Manager monitors relevant parameters that indicate the presence of specified situations. Thresholds can be set to trigger actions, for example, to notify the operational team that an alarm situation exists. The thresholds can be set to notify in advance. Operations Manager also performs trending on operational parameters to express the urgency of a certain situation. This supports the decision making on how to react to a certain situation.

Within your Operations Manager instance, the thresholds can be verified and set by navigating to the Default Thresholds page, following Setup→Options→Default Thresholds, or by using the link http://opsmgr-server:port/dfm/edit/options. Figure 17 shows a sample configuration page.

32 Storage Efficiency Every Day How to Achieve and Manage Best-in-Class Storage Use

Figure 17) Operations Manager screen to configure thresholds on operational metrics

For aggregates, Operations Manager provides the set of thresholds described in the following list. They represent absolute limits. Operations Manager alarms can be used to notify operational staff and raise awareness of a specific situation.

Monitoring the aggregates is very important. They are the physical containers of preallocated and growable storage objects that host application data. If an aggregate of a storage controller runs at uncontrolled usage, there could be direct consequences for the applications for which it is providing data.

The concrete settings for these thresholds depend on the time necessary to relax aggregate block usage. When no mitigation alternatives can or should be taken over the lifetime of the data, mitigation actions must be performed in scheduled downtime windows. Thus, threshold settings and actions tend to be more conservative to avoid SLA-endangering situations.

• Aggregate full threshold. This threshold on the metric aggregate block use allows triggering an alarm that notifies a person in charge.

• Aggregate nearly full threshold. This threshold is the counterpart of the aggregate full threshold but provides an earlier notification.

• Aggregate overcommitted threshold. This threshold on the metric of committed storage allows triggering an alarm that notifies a person in charge. This metric refers to the amount of storage that is assigned to applications. It represents the level of consolidation and also the width and increase of the block use corridor.

• Aggregate nearly overcommitted threshold. This threshold is the counterpart of the aggregate overcommitted threshold but provides an earlier notification.

Operations Manager also provides thresholds that can be used to alert operational staff when volumes are in a certain state:

• Volume full threshold. This event notifies a person in charge that the preset threshold on the volume use metric has been reached.

• Volume almost full threshold. This event is the counterpart of the volume full threshold but provides an earlier notification.

• Volume autosized. This event notifies a person in charge that a volume was extended using the autogrow functionality.


TRENDING

Operations Manager 4.0 supports a variety of trending features for certain storage objects. This is an important feature for all storage objects with a fixed size. It allows you to estimate the time frame within which a certain situation needs to be mitigated. The trend is calculated as a linear regression of up to 90 days in the past. For aggregates, Operations Manager calculates a trend on the daily growth rate. In your Operations Manager instance, use the link http://opsmgr-server:port/dfm/report/view/aggregates-growth-rates for trending of aggregate growth rates and the estimated remaining time until the storage object is full. Each aggregate can be drilled down, and you can select trending based on an interval of one day, one week, one month, three months, or one year. To see the effect of recent data activities, set the interval of a trend calculation to enclose this activity. Investigate whether growth rates calculated over different intervals deviate significantly.

Figure 18) Trending of data growth and days-to-full prediction in Operations Manager

Note: The calculation basis of time to full is the usable aggregate capacity. This value is not calculated based on the aggregate full threshold setting.
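The estimate can be sketched as a least-squares regression over daily usage samples, projected against the usable capacity, as the note states. The function below is our own illustration of that calculation, not the Operations Manager implementation.

```python
# Sketch of the days-to-full estimate: a linear regression over daily
# used-capacity samples, projected against the usable aggregate capacity
# (not against the aggregate full threshold). Function and names are ours,
# for illustration only.

def days_to_full(used_gb_per_day, usable_capacity_gb):
    """Estimate days until the aggregate is full from daily usage samples."""
    n = len(used_gb_per_day)
    xs = range(n)
    mean_x = sum(xs) / n
    mean_y = sum(used_gb_per_day) / n
    # Least-squares slope = daily growth rate in GB/day.
    slope_num = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, used_gb_per_day))
    slope_den = sum((x - mean_x) ** 2 for x in xs)
    slope = slope_num / slope_den
    if slope <= 0:
        return float("inf")  # flat or shrinking: never full at this trend
    return (usable_capacity_gb - used_gb_per_day[-1]) / slope
```

As in Operations Manager, growth rates calculated over different sample windows can deviate significantly, so the chosen interval matters.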

The trending on the volume level is analogous to the trending on the aggregate level. In your Operations Manager instance, access the link http://opsmgr-server:port/dfm/report/view/volumes-growth-rates for trending of volume growth rates. NetApp recommends that you order the view by growth rate descending or time to full ascending in order to focus on the relevant candidates.

On the volume level, you can set an alarm to fire when the volume growth is outside the usual boundary:

• Abnormal volume growth. This event notifies when the growth rate of a volume exceeds a preset limit. It is helpful to signal unusual behavior concerning storage consumption and to point the operational staff to the right storage object.


INDIVIDUAL THRESHOLDS PER AGGREGATE OR VOLUME

For each aggregate or volume, the general default settings can be overwritten and made more specific. To do so, select your aggregate or volume of choice, for example, by using the links already provided in this technical report. When selecting a concrete aggregate, it can be configured using the Edit Settings link and dialog. When selecting a concrete volume, its configuration can be adapted using the Edit Quota Settings link and dialog.

MONITORING STORAGE EFFICIENCY RETURNS

NetApp Operations Manager provides a dashboard to visualize storage efficiency returns in the NetApp shared storage infrastructure. This report lists important parameters, drilled down by utilization capacity, unused reserve capacity, storage efficiency, and efficiency return breakdown. It allows you to judge the effectiveness of the NetApp storage efficiency technologies. Figure 19 provides a sample screenshot of the storage efficiency dashboard in NetApp Operations Manager. Consult the NetApp Operations Manager Efficiency Dashboard Installation and User Guide for further details on the information provided by this dashboard.

Figure 19) Storage efficiency dashboard in Operations Manager


4.3 NOTIFICATION

Operational staff must be notified when situations occur that require a transition of phases, especially situations with negative consequences for SLA fulfillment. Operations Manager provides alarms for notification. Alarms are bound to the metrics and thresholds explained in section 4.2 and notify operational staff, storage administrators, or storage capacity planners. Alarms are the instrument used to keep the management effort of the NetApp storage infrastructure low.

After being notified, the responsible person can evaluate the situation and decide which actions to take. Furthermore, the trends on operational parameters provided by Operations Manager simplify the decision-making process.

Depending on the organizational structure, the responsibilities to operate, plan, and administer the storage infrastructure can be separated into different groups, persons, or roles. Thus, we characterize the mitigation activities by required skill set and time to act. This allows an easy alignment to a given organizational structure.

Operations Manager supports different methods to send a notification. The notification methods can be used in combination; for example, a notification can be sent by both e-mail and SNMP.

NOTIFY BY E-MAIL

An alarm can be sent to multiple destinations by e-mail. Repeated notifications can be sent while the situation is not resolved. To set an alarm, access the alarm configuration page by following Setup→Alarms from the default Operations Manager dashboard. Clicking Advanced Version accesses an advanced version of this page. The direct link for the advanced version is http://opsmgr-server:port/dfm/edit/alarms-advanced. Figure 20 shows how to configure an alarm. Adjust the threshold as described in section 4.2.

NetApp recommends using distribution lists or aliases with meaningful names rather than the addresses of individual persons. If you follow this recommendation, changing responsibilities and roles does not require you to make corresponding changes to Operations Manager.

NOTIFY BY SNMP

Operations Manager supports the notification of alarms using SNMP, a widely used standard that is supported by most orchestration frameworks and ticketing systems. Using SNMP, Operations Manager can be integrated into existing ticketing systems. Figure 20 shows setting up an alarm that fires based on the aggregate almost full threshold. The SNMP trap host is configured using a hostname or IP address and the port on which the SNMP agent is listening. The alarm can be saved and tested.


Figure 20) Configuring an alarm based on the threshold aggregate almost full

Note: The SNMP event must be routed to the responsible groups or persons in the ticketing system. Thus, the mapping of the detected situation to the responsible operational group must be implemented there.

NOTIFY BY SCRIPT

Operations Manager supports notifications in highly customized integration scenarios. A user-defined adapter can be executed, which delivers the information to the infrastructure or system of choice. A script can be used to implement such an adapter and act as the glue between Operations Manager and the customer infrastructure. To set an alarm on the event aggregate almost full that starts a script, instrument Operations Manager on the command line:

dfm alarm create -s script_to_execute -h aggregate-almost-full
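Such an adapter script might look as follows. This is an illustrative Python sketch, not a NetApp deliverable; in particular, the environment variable names it reads (DFM_ALARM_EVENT_NAME, DFM_ALARM_SOURCE_NAME) are assumptions, because the exact variables passed by Operations Manager depend on the DFM version.

```python
# Sketch of a user-defined alarm adapter that translates an Operations
# Manager alarm into a ticket payload. The DFM_* environment variable names
# are assumptions for illustration, not documented DFM variables.
import json
import os

def build_ticket(event_name, source):
    """Build a ticket payload for the customer's ticketing system."""
    severity = "critical" if "full" in event_name else "warning"
    return json.dumps({
        "summary": f"{event_name} on {source}",
        "severity": severity,
        "queue": "storage-operations",
    })

if __name__ == "__main__":
    payload = build_ticket(
        os.environ.get("DFM_ALARM_EVENT_NAME", "aggregate-almost-full"),
        os.environ.get("DFM_ALARM_SOURCE_NAME", "aggr1"),
    )
    print(payload)  # replace with the hand-over to your ticketing system
```

The script performs the event-to-group mapping described for the SNMP case, but on the Operations Manager side rather than in the ticketing system.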


4.4 MITIGATE STORAGE USE

Uncontrolled storage use can limit operational flexibility and might put SLA fulfillment at risk. This section focuses on mitigation activities that preserve flexibility by controlling use within its defined corridor. The effect of a mitigation activity should be to return the usage to its defined corridor.

Storage tightness might occur in aggregates or volumes, depending on their configuration. When all volumes in an aggregate are thin provisioned with the zero fat configuration, they use the shared pool of free blocks of the aggregate to deal with data growth. To solve this situation, a mitigation activity on the aggregate level is necessary.

When storage objects in a fixed-size volume cannot grow to the committed space, a mitigation activity on the volume level is necessary to solve upcoming volume tightness.

MITIGATION ACTIVITIES FOR AGGREGATES

Aggregates are the coarsest storage objects within a NetApp storage controller. Finer-grained storage objects such as FlexVol volumes and their content are usually thin provisioned using the zero fat configuration. They might grow on demand; however, because they live within an aggregate of physically limited size, the growth of the storage object itself is also limited. As described in the following list, providing usable space in the aggregate automatically allows the contained storage objects to grow.

1. Increase the aggregate. You can add drives to aggregates during operation, and you can repeat this mitigation activity. The maximum aggregate size depends on the Data ONTAP version, the type of aggregate, and the type of storage controller. 64-bit aggregates, supported with Data ONTAP 8, have very high limits. Additional drives can be used immediately; however, their procurement needs to be taken into account. Rebalancing data between existing and new drives results in a uniformly distributed use of the drives.

2. Decrease the aggregate Snapshot copy reserve. This reserve is needed in MetroCluster and SyncMirror® configurations. In other configurations, you can decrease this reserve or set it to zero.

3. Shrink preallocated volumes. Volumes with preallocated space reserve available aggregate free space. When possible, these volumes can be shrunk, returning the freed space to the aggregate to allow others to make use of the preallocated space.

4. Enable deduplication and shrink the volume.
5. If available, migrate a nomad online to a different storage controller. Doing this on the NetApp storage controller level requires storage provisioning based on vFiler units and a MultiStore and SnapMirror® license. Adequate free space on the aggregates of the target storage controller is required. This mitigation activity is not limited in its applicability.

6. Migrate a volume from one aggregate to another within the same or another storage controller. SnapMirror replicates the data while it is still served. To switch over to the replicated data, the client needs to detach from the source and reattach to the replica. After completion, the replica is considered the new source. This operation has an impact on client downtime. Typically, inter-data center bandwidth allows you to synchronize the source and the replica within the range of a few minutes.

7. If none of the listed activities can be used, stop the application to achieve a consistent state, and then migrate the data offline.

The mitigation activities for aggregate tightness are summarized in Table 8. Note that Provisioning Manager performs mitigation alternatives 3 to 6 for secondary storage online.


Table 8) Mitigation alternatives to control use within aggregates.

No | Mitigation Activity | Repeatability | SLA Impact | Preparation Time | Time to Show Effect
1 | Increase aggregate capacity by adding disks | Repeatable (Data ONTAP 7.x: low limits; Data ONTAP 8: high limits) | None | HW procurement | Immediate (plus rebalancing)
2 | Decrease the aggregate's Snapshot copy reserve, if possible | One time | None | None | Immediate
3 | Shrink other volumes in the aggregate if they have enough free space | One time | Low | None | Immediate
4 | Run deduplication and shrink volumes | Repeatable | Low | Time to execute dedupe | Immediate
5 | Migrate nomads (online) | Repeatable | Low | None | Minutes (vFiler migration time)
6 | Migrate volumes to a different aggregate (offline) | Repeatable | Medium–high | Next planned downtime window | Minutes (volume switchover time)
7 | Prevent application data loss and stop the application; then migrate (offline) | Repeatable | Low–high | Coordinate with app owner | Minutes (migration time)

MITIGATION ACTIVITIES FOR VOLUME TIGHTNESS

Mitigation activities for volume tightness are relevant for volumes that are at risk because their storage objects cannot grow to the committed size. When it is not possible to enable growth for storage objects contained in volumes, you need to perform an aggregate mitigation activity.

Note: Some of these mitigation alternatives depend on, and affect, the used capacity in the aggregate.

Table 9) Mitigation activities for resource tightness within volumes.

No | Mitigation Activity | Repeatability | SLA Impact | Preparation Time | Time to Show Effect
1 | Reduce the volume's Snapshot copy reserve (if configured and not used) | One time | Low | None | Immediate
2 | Increase the volume if there is free space in the aggregate (see Table 8) | One time | Low | None | Immediate
3 | Delete Snapshot copies that are not needed or were skipped by the autodelete function | Limited | Low | None | Immediate
4 | Activate FAS deduplication for the volume (requires proper space guarantees) | One time | Low; possible performance impact | Wait for schedule | Hours
5 | If the volume contains more than a single LUN, migrate those objects to another volume or aggregate | Repeatable | High | Next planned downtime window | Minutes (volume migration time)
6 | Stop the application and migrate the data | Repeatable | High | Coordinate with app owner | Minutes (migration time)


5 REAL-LIFE SETTINGS

This section summarizes two different operational settings. The first does not make use of online data migration and the settled/nomad provisioning pattern; the second implements a settled/nomad provisioning pattern to maintain the flexibility for online data migrations.

The concrete threshold settings and approaches might be very customer and application specific. To exploit NetApp storage efficiency features in your own data center, NetApp recommends that you start conservatively. After you are familiar with the process, work toward the customer-specific optimum.

5.1 SAMPLE SETTING 1: REAL-LIFE SETTING

This section describes a real-life setting that a customer started with. It makes use of a limited set of mitigation alternatives. This is especially beneficial when the installed storage capacity should be constant over a long time frame or the physical systems are already fully equipped. A settled/nomad setting is not considered. Thus, the thresholds that signal a transition of the phases are set lower and more conservatively for this customer. Because online data migration and aggregate extension are not available as mitigation alternatives, sufficient available space is required to safely reach the next planned downtime window, as shown in Figure 21. In practice, refer to the aggregate days-to-full trend value to get an idea of the available days to full based on past data growth.

• All storage is provisioned using the zero fat option with growable FlexVol volumes. Only aggregate monitoring is used.
• Aggregate extension is not a mitigation alternative.
• Online migration is not a mitigation alternative.

Figure 21) Storage to enable organic data growth between planned downtime windows


Note: Several months might pass between the planned downtime windows in which major mitigation alternatives can be performed.

The primary concern is preventing the critical situation in which aggregates reach a utilization level that is too high to enable organic growth during the period between agreed planned downtime windows. To prevent this situation, sufficient space must be reserved to enable data growth. Second, the level of data consolidation is monitored to manage accumulated growth rates safely.

Provisioning of new data is stopped when one or both thresholds on the first and second metrics are reached. The operational teams are notified using an alarm on the Operations Manager event aggregate nearly full threshold (event configured when the metric exceeds 50%) and the event aggregate nearly overcommitted threshold (event configured when the metric exceeds 110%). These alarms stop the responsible entities from provisioning new storage; the aggregate is left for organic growth.

An assessment of the storage situation might then be performed. Depending on experience and knowledge of the application growth rates seen in the past, the thresholds may be adapted. After the upper threshold of the operational sweet spot corridor is exceeded, an alarm based on the aggregate full threshold (set initially to 65%) is sent to the storage administrators to make the decision to migrate data in the next planned downtime window. In the meantime, organic growth can take place in the yellow-marked area shown in Figure 22. The metrics used are:

• First metric: aggregate capacity used
• Second metric: aggregate space committed

Because all storage is provisioned using the zero fat option, no artificially limited storage container exists. Thus, there is no need to consider a volume-based metric. Figure 22 shows the behavior depending on the metrics aggregate capacity used and aggregate committed space.
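The corridor logic of this setting can be condensed into a small sketch using the thresholds quoted above (50%/65% for capacity used, 110%/120% for space committed). The function is our own illustration of the decision logic, not an Operations Manager API.

```python
# Sketch of the phase decision logic of sample setting 1, using the two
# aggregate metrics and the thresholds quoted in the text. Function and key
# names are illustrative, not an Operations Manager API.

def phase_actions(capacity_used_pct, committed_pct):
    """Map the two aggregate metrics to the allowed activities."""
    # Operational sweet spot corridor: provisioning of new storage allowed.
    provisioning_ok = capacity_used_pct <= 50 and committed_pct <= 110
    # Upper corridor boundary exceeded: migrate in the next downtime window.
    needs_mitigation = capacity_used_pct > 65 or committed_pct > 120
    return {
        "provision_new_storage": provisioning_ok,
        "assess_capacity_and_adapt_thresholds":
            (not provisioning_ok) and (not needs_mitigation),
        "mitigate_in_next_downtime_window": needs_mitigation,
    }
```

For example, an aggregate at 55% capacity used and 100% committed would no longer receive new storage but would only trigger an assessment, not a migration.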

Figure 22) Transition of phases depending on the metrics aggregate capacity used and aggregate committed space.

(The figure plots data growth against aggregate capacity within the operational sweet spot corridor: new storage is provisioned while aggregate capacity used is at 0–50% and aggregate space committed is at 0–110%; beyond these values, capacity is assessed and thresholds are adapted; beyond 65% capacity used or 120% space committed, mitigation takes place.)


5.2 SAMPLE SETTING 2: SETTLED/NOMAD

This section describes a setting that takes the settled/nomad provisioning pattern into account and allows migrating nomad data flexibly and in a timely manner thanks to vFiler technology. This setting requires storage space at alternative locations to which nomads might be migrated. It is seen more often in larger environments with an emphasis on NFS-attached storage. It allows operating the NetApp storage infrastructure at very high use and in narrower operational sweet spot corridors. Figure 23 visualizes the effect of a mitigation alternative that can be performed online.

Figure 23) Narrower corridors due to the ability to perform mitigation alternatives in hours instead of months


In this sample setting, as in sample setting 1, the critical situation to prevent is aggregates becoming too full. However, with the flexibility gained through online data migration, there is no need to take a further metric into account, for example, storage overcommitment.

• All storage is provisioned using the zero fat option with growable FlexVol volumes. Only aggregate monitoring is used.
• Storage is provisioned using the settled/nomad pattern, with the ability to perform online migration.
• Days-to-full aggregate trending was more than 200 days on average. Note that this value depends on the individual situation and is calculated against 100% capacity.

The sole metric in this setting is aggregate capacity used. Table 10 contains the thresholds describing the transition of phases.

Table 10) Phase transitions with the settled/nomad provisioning pattern and the online migration mitigation alternative.

Detection Threshold | Notify | Mitigation
> 70% | Storage operations | Stop provisioning of storage
> 85% | Storage operations | Stop extending provisioned storage
> 90% | Storage operations | Relax the resource situation and migrate a nomad


Figure 24) Visualization of phase transitions depending on metric aggregate capacity used

(The figure plots settled data growth against aggregate capacity: new storage is provisioned while aggregate capacity used is at 0–70%; already provisioned storage may still be extended at 70–85%; above 90%, utilization is relaxed by migrating a nomad with NetApp Data Motion.)

You can achieve very high data consolidation in this setting by using NetApp storage controllers. The amount of logical data served exceeds the physically usable capacity by factors.


6 STORAGE EFFICIENCY COOKBOOK

To increase consolidation, we propose the following steps to exploit the advantages of NetApp storage efficiency technologies.

Figure 25) Typical picture of aggregate capacity metrics while turning to zero fat configurations and dedupe


As a general rule, we don't introduce artificially limited container types. They increase the monitoring effort and might prevent pooling unused space. For an existing landscape, proceed as follows:

1. Install and configure Operations Manager, the earlier the better. From day one, Operations Manager collects data; the more information it collects, the better the predictions and trending are. The diagrams provided by Operations Manager give a good idea of growth rates and their steadiness. Make sure all NetApp storage controllers are monitored. Wait for one month. Define which mitigation alternatives your operational team is comfortable with. Check the boxes accompanying the provided list and identify the time your team needs to perform the actions. If you can perform online migrations for nomads, define the time to negotiate and approve the migration. For all other data, define the time to the next planned downtime window.

2. Change all volumes to the zero fat configuration with the autogrow feature set to on. Because there is no artificial space limitation for the autogrow volume, monitoring is restricted to aggregate monitoring. When using deduplication, set the volume to autogrow. During this period, the capacity used diminishes, as shown in Figure 25. Usually, each change in the volume configuration can be detected. So far, only metadata has changed, and the unused space in the volumes is now available from a common shared pool. The aggregated free space is available to the same applications storing the data. We recommend monitoring for three months to understand the growth rate of your environment.

3. Derive the growth trend of the aggregates. Note that the overall trend might still be negative. Use Operations Manager to help determine the trend. Make sure that it excludes the time frame of changing the volume configuration to zero fat and that it includes relevant operations of your applications, such as month-end and year-end closing of business applications or regular software maintenance updates (for example, in virtualized environments).


Work backward to determine the thresholds of the phases:
a. Define the aggregate use at a level where your operational team is comfortable. At first, do not exceed 80%. Add an attention area (yellow) depending on the mitigation alternatives and their time to show effect.
b. Determine the maximum distance between the planned downtimes, or the time to perform the intended mitigation alternatives.
c. Determine the growth rate. Operations Manager provides help in determining the trend of data growth.
d. Determine the minimum space required to comfortably allow organic growth in the period between agreed planned downtimes of the services provided. Operations Manager helps you to understand the growth rate of the past.

To provision storage, follow these steps:

1. Create big aggregates to enable shared storage in your data center. We recommend sizing them in such a way that the aggregate can be extended once for an eventual aggregate mitigation. Very few situations exist where a silo-centric approach with dedicated aggregates for applications makes sense. Free space and performance in an aggregate can be shared. A few big aggregates reduce the monitoring effort. Also, build aggregates in a limited number of standardized configurations and sizes.

2. Create volumes in the zero fat configuration with the autogrow feature set to on. Because there is no artificial space limitation for the autogrow volume, monitoring is restricted to aggregate monitoring. When using deduplication, set the volume to autogrow. Whenever possible, use Provisioning Manager for convenience and for repeating configurations.
a. Classify your data and provision for flexibility. Give NFS a preference and make use of vFiler entities.
b. Turn on deduplication. Even in situations where deduplication rates are expected to be low, there is sometimes a big surprise. If you prefer to try deduplication first, create a clone of the intended volume on the storage controller and deduplicate it to estimate the effect. Use Performance Advisor to identify a repeating time frame of low activity to schedule the deduplication job. Also consider deduplication scheduled by change rate. Mind the maximum volume sizes, which depend on the storage controller.
c. Initially size volumes to the expected size of the data you are going to store. Thus, the aggregate overcommitment metric in Operations Manager represents the data consolidation more precisely.
d. Trim existing volumes provisioned in full or low fat to the zero fat configuration. Use the following commands on the console of the storage controller to configure zero fat without Snapshot autodelete for NAS environments:

vol options <volume> guarantee none
vol options <volume> try_first volume_grow
vol autosize <volume> -m <maximum size> -i <increment size> on
snap autodelete <volume> off

Use the following command sequence to configure zero fat with Snapshot autodelete for NAS environments:

vol options <volume> guarantee none
vol options <volume> try_first volume_grow
vol autosize <volume> -m <maximum size> -i <increment size> on
snap autodelete <volume> trigger volume
snap autodelete <volume> delete_order oldest_first
snap autodelete <volume> on


Use the following commands to configure zero fat without Snapshot autodelete for SAN environments:

vol options <volume> guarantee none
vol options <volume> try_first volume_grow
vol autosize <volume> -m <maximum size> -i <increment size> on
snap reserve -V <volume> 0
snap autodelete <volume> off
lun set reservation <lun> disable

Use the following command sequence to configure zero fat for SAN environments with autodelete set to on:

vol options <volume> guarantee none
vol options <volume> try_first volume_grow
vol autosize <volume> -m <maximum size> -i <increment size> on
snap reserve -V <volume> 0
snap autodelete <volume> trigger volume
snap autodelete <volume> delete_order oldest_first
snap autodelete <volume> on
lun set reservation <lun> disable

e. Identify storage holding inactive data. Storage keeping inactive data is most often perfectly suited to act as a nomad candidate that could be migrated.

f. Identify storage that is close to deprovisioning. Deprovisioning of storage relaxes use and can act as a mitigation alternative.

g. Turn already provisioned volumes into the zero fat configuration.
3. Let Operations Manager monitor the landscape. Use the aggregate daily growth rates and days-to-full trending reported by Operations Manager to adapt the thresholds. Remember that days-to-full trending reports against 100% capacity used of the aggregate.


7 REFERENCES

• TR-3505, "NetApp Deduplication for FAS and V-Series Deployment and Implementation Guide": www.netapp.com/us/library/technical-reports/tr-3505.html
• TR-3563, "NetApp Thin Provisioning Improving Storage Utilization and Reducing TCO": www.netapp.com/us/library/technical-reports/tr-3563.html
• TR-3710, "Operations Manager, Provisioning Manager, and Protection Manager Best Practices Guide": www.netapp.com/us/library/technical-reports/tr-3710.html
• TR-3786, "A Thorough Introduction to 64-Bit Aggregates": www.netapp.com/us/library/technical-reports/tr-3786.html
• TR-3814, "NetApp Data Motion": www.netapp.com/us/library/technical-reports/tr-3814.html
• TR-3827, "If You're Doing This, Then Your Storage Could Be Underutilized": www.netapp.com/us/library/technical-reports/tr-3827.html
• TR-3881, "DataMotion for Volumes for Enterprise Applications": www.netapp.com/us/library/technical-reports/tr-3881.html
• NetApp Operations Manager Efficiency Dashboard Installation and User Guide: now.netapp.com/NOW/download/tools/omsed_plugin/InstallUserGuide.pdf


8 ACKNOWLEDGMENTS

This report was developed in concert with the Field Centers for Innovation and covers field best practices and product group expertise. It would not have been possible without the input of many experts. Significant contributions were made by Matthew Agoni, Carlos Alvarez, Jeff Berks, Manfred Buchmann, Hans Deuerlein, Erik Dybwad, Niels Reker, Oliver Dziuba, Larry Freeman, Gary Garcia, Pretoom Goswami, Naveen Harsani, George John, Nigel Maddock, Andreas Martinovsky, Holger Niermann, Cesar Orosco, Christian Ott, Shiva Raja, Michael Reusch, Maurice Skubski, John Tyrrell, Oliver Walsdorf, and Allen Wang.

NetApp provides no representations or warranties regarding the accuracy, reliability, or serviceability of any information or recommendations provided in this publication, or with respect to any results that may be obtained by the use of the information or observance of any recommendations provided herein. The information in this document is distributed AS IS, and the use of this information or the implementation of any recommendations or techniques herein is a customer's responsibility and depends on the customer's ability to evaluate and integrate them into the customer's operational environment. This document and the information contained herein may be used solely in connection with the NetApp products discussed in this document.

© Copyright 2010 NetApp, Inc. All rights reserved. No portions of this document may be reproduced without prior written consent of NetApp, Inc. Specifications are subject to change without notice. NetApp, the NetApp logo, Go further, faster, Data ONTAP, FlexClone, FlexVol, MultiStore, RAID-DP, SnapDrive, SnapMirror, Snapshot, SyncMirror, and vFiler are trademarks or registered trademarks of NetApp, Inc. in the United States and/or other countries. Windows is a registered trademark of Microsoft Corporation. Oracle is a registered trademark of Oracle Corporation. VMware is a registered trademark and VMotion is a trademark of VMware, Inc. All other brands or products are trademarks or registered trademarks of their respective holders and should be treated as such. RA-0007-1010



Figure 10) Configuring the full/low/zero fat provisioning policy using Provisioning Manager for SAN storage. Select checkboxes as outlined. Provisioning Manager deviates by not turning on autosize for zero fat.

FULL/LOW/ZERO FAT PROVISIONING FOR STORAGE SERVICES

Storage services are an easy abstraction for provisioning storage in a utility-like fashion. A storage service describes all characteristic attributes of the storage needed in a certain scenario. A storage service catalog lists the available templates and allows you to provision storage with these attributes on demand. Technically, storage services or datasets consist of one or more resource pools, a protection policy, and a provisioning policy. Full/low/zero fat provisioning for storage services is configured in the configuration wizard of the provisioning policy. Figure 11 shows the provisioning policies closest to full/low/zero fat configurations.

Figure 11) Full/low/zero fat provisioning policies for datasets and storage services.

Because this wizard is also able to configure the deduplication feature, two policies are configured for the zero fat configuration: one with deduplication and one without deduplication.


HOW SHOULD A VOLUME BE SIZED?

Because physical allocation of data within a zero-fat-provisioned volume is done on demand, theoretically the volume size can be set to a very high value that can easily keep all application data and Snapshot copies. Because the unallocated space in the volume is not exclusively reserved for the volume itself, all other applications can benefit from the shared pool of unallocated storage. However, NetApp recommends that you size the volume to the expected size of its containing objects and use the autogrow option to let it grow on demand. The important advantage is that the commitment rate then acts as a metric for data consolidation.

Note: The commitment rate reflects the amount of logical data consolidation. This metric is suitable for deciding when data should be left for organic growth.

Additionally, the volume size limits that apply when using deduplication should be taken into account, because the maximum sizes depend on the storage controller model.
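The role of the commitment rate as a consolidation metric can be illustrated with a small calculation. The sketch below is ours, not a NetApp API: it treats the commitment rate of an aggregate as the sum of provisioned volume sizes divided by the physical aggregate capacity, so values above 100% indicate thin-provisioned overcommitment.

```python
# Illustrative sketch (not a NetApp API): commitment rate of an aggregate.
# The commitment rate compares logically provisioned volume sizes with the
# physical capacity of the aggregate; > 1.0 means the aggregate is overcommitted.

def commitment_rate(volume_sizes_gb, aggregate_capacity_gb):
    """Sum of provisioned volume sizes divided by physical aggregate capacity."""
    return sum(volume_sizes_gb) / aggregate_capacity_gb

# A 10TB aggregate carrying zero fat volumes provisioned to 16TB in total:
rate = commitment_rate([4000, 6000, 6000], 10000)
print(f"commitment rate: {rate:.0%}")  # 160% -> overcommitted
```

A rate well above 100% signals a high degree of data consolidation; a rate near 100% suggests the aggregate should be left for organic growth.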

APPLICATION RECOMMENDATIONS

Thin provisioning is most effective when applications use the data committed to them step by step. When applications preformat their data, the immediate effect of thin provisioning is lost, and only deduplication may reclaim sharable blocks. Because thin provisioning has no performance penalty, the general recommendation is to provision with the zero fat configuration.

For SAN-attached storage, NetApp recommends that you use file systems supporting space reclamation technologies such as the SCSI UNMAP and SCSI WRITE SAME commands. These pass the information through the storage stack that a particular block is no longer used and allow unused space to be reclaimed. On Windows® platforms, this can be configured in NetApp SnapDrive®.

For Oracle® database best practices, refer to WP-7084, "Storage Efficiency in an Oracle Environment."

3.2 PROVISIONING FROM TEMPLATES: VOLUME AND DEDUPE-CENTRIC LAYOUTS

This section deals with provisioning storage for similar applications from a golden template. A valid use case is a hosting provider who offers and serves predefined application services in mass quantities. Instead of provisioning each application from scratch, the data of an application instance is provisioned by creating a copy of a preconfigured template or golden copy, which is then customized using a postprocessing procedure.

When applications are provisioned this way, NetApp cloning technologies generate virtual copies of the template data instantly and with efficient use of space. This achieves a high degree of data consolidation and cost savings.

The potential of NetApp cloning technologies also plays a central role in development and test environments, as well as in software maintenance scenarios. Testing and updates can be performed very easily because these cloning capabilities work instantly and with almost no overhead in terms of performance, CPU, and memory.

There are two ways to align application data to a NetApp shared storage infrastructure:

• Volume-centric storage layout
• Dedupe-centric storage layout

Depending on the data lifetime, suitability for deduplication, consistency requirements, and tool constraints, one way of aligning application data is more appropriate than the other.

In both variants, the storage of the application template can be provisioned as either full, low, or zero fat. The cloning procedure inherits the attributes of the parent volume. To create space-efficient clones, the space guarantee must be set to none.


CONSEQUENCES FOR MONITORING

When using one of the following layouts, very high data consolidation can be achieved. Because this effect depends on the usage characteristics of the corresponding applications, monitoring the aggregate is key. In case a low fat volume acts as a template that is cloned while preserving the original space guarantees, monitoring is necessary for the cloned volumes as well.

VOLUME-CENTRIC STORAGE LAYOUT

In a volume-centric storage layout, an application instance is organized into one or a few volumes to benefit from Data ONTAP volume-centric management and maintenance operations, such as instant cloning and volume-consistent Snapshot copies.

In addition to the convenient ways to manage volumes, volume-centric storage layouts have storage efficiency advantages in two dimensions:

• High instant storage efficiency savings. Cloning the data of an application instance with FlexClone yields high instant savings; these savings might deteriorate over time.

• Long-term storage efficiency savings. Deduplicating application data yields medium long-term savings.

A volume-centric layout makes it easy to provision storage for another instance of an application by cloning a consistent volume representing the template of the intended application and attaching it to an instance where it is processed. This approach works for both NAS and SAN.

Figure 12 shows the data alignment of an application instance and its volume. An application instance organizes its data in one or more dedicated volumes. Note that the entire construct is created within one aggregate.

Because deduplication is performed at the volume level, long-term savings depend on the block-sharing rate within one instance of an application. Volume-centric layouts are preferred in the following cases:

• Simplicity of data management using volumes
• Individual control over the SLA of each application instance
• Application instances with a short duration
• No consideration of deduplication
• Management tools that require volume-centric layouts


Figure 12) Volume-centric storage provisioning. Application instances are aligned horizontally with their volumes. A template volume is cloned via FlexClone block sharing into instance FlexVol volumes, each containing its LUNs/qtrees with deduplication block sharing inside the volume.


Impact on commitment and storage utilization. The impact of using FlexClone to clone a volume-centric storage layout to implement template-based provisioning can be visualized schematically. At clone creation, Data ONTAP creates metadata for the new instance of the data and allocates space for changes to the cloned copy, or for new data, only on request. Thus, the overcommitment of the aggregate containing the cloned data increases when the clone is created; however, this does not affect the space used in the aggregate. When data in the clone is changed or new data is added by the application, the aggregate use grows.
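The behavior described above can be modeled in a few lines. The sketch is an illustration of the accounting only (our own model, not Data ONTAP internals): clone creation raises commitment without consuming space, while subsequent writes consume free blocks in the aggregate.

```python
# Illustrative model (ours, not Data ONTAP internals) of the effect described
# above: creating a FlexClone volume raises the aggregate's commitment, while
# the used space stays unchanged until the clone diverges from its parent.

class Aggregate:
    def __init__(self, capacity_gb):
        self.capacity_gb = capacity_gb
        self.committed_gb = 0   # sum of provisioned volume sizes
        self.used_gb = 0        # physically allocated blocks

    def provision_volume(self, size_gb, used_gb):
        self.committed_gb += size_gb
        self.used_gb += used_gb

    def clone_volume(self, size_gb):
        # Clone creation adds metadata only: commitment grows, used space does not.
        self.committed_gb += size_gb

    def write_new_data(self, gb):
        # Changed or new blocks in a clone consume free blocks in the aggregate.
        self.used_gb += gb

aggr = Aggregate(capacity_gb=10000)
aggr.provision_volume(size_gb=2000, used_gb=1500)   # template volume
aggr.clone_volume(size_gb=2000)                     # new instance via FlexClone
print(aggr.committed_gb, aggr.used_gb)              # 4000 1500
aggr.write_new_data(100)                            # clone diverges over time
print(aggr.used_gb)                                 # 1600
```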

Best Practice

A volume-centric layout implicitly implements a consistency group. It is preferable to align in it all application data that should be recovered at a certain point in time. Cloning can achieve significant savings when a FlexClone volume is created to provision data for a new service instance.

Client-side data realignment, such as disk defragmentation or database table space reorganization, has a counterproductive effect on the FlexClone savings. It also has a temporarily counterproductive effect on the deduplication savings, and the deduplication process must be executed again to regain them. If possible, the following actions on client data should be avoided:

• Reorganizing data, for example, database reorganization of table spaces or defragmentation of virtual disks provisioned through cloning

• Preformatting data


DEDUPE-CENTRIC STORAGE LAYOUT

In a dedupe-centric storage layout, the goal is to achieve high storage efficiency returns from the deduplication feature. In contrast to the volume-centric storage layout, data of different application instances is grouped to achieve storage efficiency returns across a set of application instances. Figure 13 shows a sample dedupe-centric storage layout. Data of application instances is organized horizontally; the individual data of each application is grouped vertically in a volume to implement deduplication.

This layout makes sense in virtualization scenarios, where the images of the guest machines can be grouped easily. Grouping partitions containing boot images and commonly used programs is very effective because they share much of the same data. To implement template-based provisioning with such a layout, cloning of the template data must be performed with the file/LUN FlexClone operation. File/LUN FlexClone allows storage objects to be cloned within a volume, providing finer granularity.

This storage layout provides the following storage efficiency advantages from a short- and long-term perspective:

• Very high long-term storage efficiency savings. Long-term savings are achieved due to the deduplication-centric storage layout and the resulting deduplication returns.

• Short-term storage efficiency savings. Instant savings are provided when cloning an application instance, for example template application data, through a file/LUN FlexClone operation.

In contrast to the volume-centric storage layout, application instances are bundled together in a matrix style because of their participation in a volume. This implies that the applications share major operational tasks and are managed as a bundle. From an SLA perspective, a diversification of service levels within the application instances cannot be implemented as easily as with a volume-centric layout. Achieving application-consistent Snapshot copies requires the iterative application of file/LUN FlexClone functionality to all storage objects of the instance. This is slightly more difficult than cloning with a volume FlexClone operation.

TR-3505, "NetApp Deduplication for FAS and V-Series Deployment and Implementation Guide," provides a deeper understanding of NetApp deduplication and its deployment.

Figure 13 illustrates dedupe-centric storage provisioning. Volumes are shared among several application instances to achieve cross-instance dedupe returns. Note that this construct is created within an aggregate; volumes can be assigned to different aggregates.


Figure 13) Dedupe-centric storage provisioning. Application instances (template, instance 1, instance 2, …) are aligned horizontally; FlexVol volumes are aligned vertically, with deduplication block sharing within each FlexVol volume across the LUNs/qtrees of all instances.


Impact on commitment and aggregate usage. When the FlexVol volumes for this layout are created, their individual sizes contribute to the commitment rate. The aggregate use grows with the provisioning and object use within the FlexVol volumes. Provisioning a new instance in this layout through a file/LUN FlexClone operation has no effect on the overcommitment rate; it affects only the deduplication value of the volume itself. Thus, NetApp recommends using the zero fat configuration for the volume, so that autogrow is enabled.

Best Practice

This layout is very attractive for applications using multiple but similar storage objects among service instances (for example, virtual disks in virtual machine hypervisors). They usually use similar operating systems and applications in dedicated virtual disks. Thus, grouping these storage objects leads to a very high degree of consolidation due to deduplication.

Quickly changing data, such as pages and swap files, should not be considered for inclusion in deduplicated volumes on primary storage. Deduplication savings are limited due to their high change rate and do not justify running the deduplication process. NetApp recommends not placing this type of data in the same volume as data that dedupes well.

We further recommend not performing client data realignments such as Windows disk defragmentation or database table space reorganizations. Because of the way NetApp storage controllers work, client data is served without performance penalty even when fragmented, so defragmentation brings no benefit.


3.3 SETTLED/NOMAD PROVISIONING FOR NETAPP DATA MOTION

Settled/nomad provisioning is a provisioning pattern that helps increase the utilization of NetApp shared storage. When the online migratability features of the storage are exploited, response times for mitigating data growth scenarios are independent of application-specific planned downtime windows. Storage is classified into potential migration candidates, which can be migrated away from a tight aggregate on one storage controller to another while remaining accessible. Thus, it is an elegant technology to relax the use of an aggregate outside the planned downtime windows of the served applications. NetApp MultiStore technology implements this feature using the vFiler abstraction, which NetApp recommends you consider in the provisioning process. Secure multi-tenancy environments implemented using MultiStore technology harmonize well with this provisioning approach.

The settled/nomad provisioning pattern is a perfect metaphor for reacting to data growth in an aggregate. Figure 14 illustrates the concept of settled/nomad provisioning in the aggregates of the storage controllers and the migration of a nomad out of its aggregate. The settled part describes data that does not move during its lifetime. It might use vFiler units to simplify operation and hardware maintenance of the storage controller, but there is no direct need. The nomad parts are considered moving parts and thus must make use of vFiler units. The ratio between the sizes of the settled and nomad parts depends on the growth rate and lifetime of the data in the settled part. Assuming that the aggregate size is constant over this period, the aggregate is filled with settled and nomad data. Over the data lifetime, more and more nomads are migrated away; at the end of the lifetime, only the settled data is left.

It is irrelevant whether the data growth happens in the settled or the nomad part: when a nomad is migrated away, the resource situation on the aggregate is relaxed. It is preferable to provision several nomads of different sizes. This allows you to:

• React to different growth scenarios of the data
• Quickly migrate smaller nomads when time or the inter-storage-controller network is considered a limited resource
• Operate the aggregate in its operational sweet spot corridor over a long time frame; by slicing the migratable entities in the right way, you can be sure that the aggregate operates in a predefined use interval
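Slicing the nomads in the right way can be thought of as a small planning problem. The following is a hypothetical sketch (function names and the target corridor are ours, not a NetApp tool): migrate the smallest nomads first, because they transfer fastest, until the aggregate is back below the target use.

```python
# Hypothetical planning sketch (not a NetApp tool): pick the smallest set of
# nomads whose migration brings aggregate use back below the target corridor.

def nomads_to_migrate(used_gb, capacity_gb, nomads_gb, target_ratio=0.8):
    """Greedily select smaller nomads first, since they migrate faster."""
    selected = []
    for size in sorted(nomads_gb):
        if used_gb / capacity_gb <= target_ratio:
            break
        used_gb -= size
        selected.append(size)
    return selected

# 9 TB used in a 10 TB aggregate; nomads of 500 GB, 1 TB, and 2 TB provisioned:
print(nomads_to_migrate(9000, 10000, [2000, 500, 1000]))  # [500, 1000]
```

Provisioning nomads of several sizes gives this selection more options; with only one large nomad, a small overshoot would still force a large migration.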

Figure 14) Settled/nomad provisioning into an aggregate. In case of aggregate tightness, a nomad is migrated to a separate aggregate.


To summarize, the settled/nomad provisioning pattern is an elegant method to adjust the block use of an aggregate. The use of an aggregate can be controlled and kept in a desired corridor.


SLA-BASED ASSESSMENT FOR SETTLED/NOMAD

The goals of an SLA-based assessment are to optimize SLA fulfillment and to avoid or minimize penalty costs. The accessibility of the applications, described by their individual service levels, is used to assess instances as settled or nomad.

We use the previously introduced SLA metric of service disruption and map it to the stickiness of the settled/nomad instances. The vFiler entities allow online migration of NFS- and iSCSI-attached nomad instances without any changes on the client side. Fibre Channel-attached storage cannot be migrated online at the time of writing. Refer to TR-3881 for an understanding of DataMotion in an Oracle database and Microsoft Exchange environment.

Alignment by technical impact. For data belonging to applications with SLAs that fit perfectly into what is provided, a direct assignment can be made. For example, application instances with the lowest acceptable service disruption should be the last candidates to be migrated (settled); applications with the highest acceptable service disruption should be considered as nomads. However, there might be data of application instances that will likely be migrated during the application lifetime. You must take into account the business impact of migrating these instances.

Figure 15) Alignment by technical impact (instances sorted by negative impact in descending order; high-impact or outside-SLA instances, e.g. all FC, are settled, medium and low are nomads).


Alignment by business impact. An assessment of penalty costs is made for the data of the remaining applications. For vFiler migration, a very short negative impact on the performance of the service level must be taken into account during the migration. Thus, application data with the highest negative impact is considered to be the stickiest.

Figure 16) Alignment by business impact (instances sorted by negative impact in descending order, from settled through semi-settled to nomad).

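The two alignment steps amount to ranking instances by the negative impact of migrating them and drawing a cut line. The sketch below is a hypothetical illustration (instance names, impact scores, and the cut fraction are ours): the stickiest instances stay settled; the rest become nomad candidates.

```python
# Hypothetical assessment sketch (names and scores are ours): rank application
# instances by the negative impact of migrating them; the stickiest instances
# stay settled, the rest become nomad candidates.

def classify(instances, settled_fraction=0.5):
    """instances: list of (name, negative_impact); higher impact -> more settled."""
    ranked = sorted(instances, key=lambda i: i[1], reverse=True)
    cut = int(len(ranked) * settled_fraction)
    settled = [name for name, _ in ranked[:cut]]
    nomads = [name for name, _ in ranked[cut:]]
    return settled, nomads

apps = [("erp", 9), ("mail", 5), ("test", 1), ("archive", 2)]
settled, nomads = classify(apps)
print(settled, nomads)  # ['erp', 'mail'] ['archive', 'test']
```

In practice, the impact scores would combine the technical assessment (acceptable service disruption) with the business assessment (penalty costs), and the cut would also reflect the growth rate of the settled data.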

PERFORMANCE AND THROUGHPUT IMPACT OF MIGRATION

Migration of a nomad might be triggered by heavy storage consumption in an aggregate. It might also be triggered by performance limitations of the corresponding storage controller. Because the migration itself consumes additional resources on the network and on the participating storage controllers, this consumption must be taken into account to avoid further intensifying the situation. Refer to TR-3881 for a quantitative evaluation of DataMotion.

NetApp recommends that the use of storage controllers in a high-availability configuration be adjusted in such a way that the remaining controller can master the load in the case of a failover. Doing so should leave enough resources to perform migrations.

Migrating vFiler entities relies mainly on SnapMirror® and MultiStore technology; thus, it requires these licenses on all participating storage controllers. TR-3814, "NetApp Data Motion," provides a thorough presentation of migration using the NetApp Data Motion solution. It focuses on implementing and triggering the migration of vFiler entities using NetApp Provisioning Manager. Furthermore, vFiler units can be managed manually to allow for handy offline migration with a very short interruption of storage accessibility.

ENABLE SETTLED/NOMAD FOR ALREADY-PROVISIONED STORAGE

While NetApp recommends that you consider the settled/nomad setting initially, taking the sizing and lifetime of storage into account, it is possible to implement it later in a planned downtime window. If NFS-attached storage should be migrated, existing volumes can be adopted by a vFiler entity. Because the vFiler entity has its own IP address, the clients attaching the storage need to remount it.

SETTLED/NOMAD-LIKE SETTING WITH SHORT/LONG-TERM DATA PAIRING

In the previous section, the settled/nomad pattern was described as a way to mitigate organic data growth. The same effect can be achieved when storage is identified as belonging to instances that are going to be deprovisioned at their end of life. Taking the expected lifetime of provisioned storage into account allows you to plan deprovisioning situations in advance. This relaxes the strain on aggregates outside planned downtime windows, without the technical requirements of a settled/nomad setting.

ONLINE MIGRATION IN VIRTUALIZED ENVIRONMENTS

Online migration features in a virtualization hypervisor provide a further alternative for implementing a responsive scheme to react to data growth scenarios. For example, VMware® Storage VMotion™ is capable of transferring a virtual machine, including its storage, when it is attached using a datastore. Storage of virtual machines served by a NetApp datastore can be migrated to another NetApp-served datastore by migrating each virtual machine. In such cases, a nomad can simply be implemented by a NAS- or SAN-attached datastore.

In contrast to a data transfer based on SnapMirror directly between NetApp storage controllers, the migration traffic flows through the hypervisor. This might have consequences for the execution of the virtual machines. Also, the NetApp storage efficiency savings cannot be exploited during the transfer; deduplication savings are regained by executing the deduplication process on the destination storage controller.


4 OPERATION

This section focuses on the operation and management of overcommitted storage. The goal of management is to fulfill the SLAs of the stored data while achieving a high level of efficiency. It addresses how to detect situations that need manual assistance, how to raise the awareness of the operational staff, and how to resolve situations that arise.

We first consider situations that put SLA fulfillment at risk. Then we focus on actions that can be taken to avoid further aggravation by making the situation evident and presenting mitigation alternatives.

Consider the operational process as a loop that monitors and evaluates the current situation and triggers the transition of a storage resource among phases:

• Provision storage.
• Leave room for organic growth. It might be desirable to still allow extending the storage of previously provisioned applications.
• Reduce storage use with mitigation alternatives such as deletion, data motion, and so on.

These transitions must occur within a specified time frame to optimize operational flexibility and to prevent endangering the SLAs. The point is to detect situations that would violate the SLAs in the future.

SITUATIONS PUTTING SLA FULFILLMENT AT RISK

Over time, more and more data is stored and processed by the provided applications. NetApp storage efficiency technologies compensate for this growth. To prevent running out of physical resources, usage must be managed within safe boundaries. This makes sure the operations team has enough time to react with the appropriate mitigation strategy.

The following list summarizes situations that are critical for service delivery:

• Running out of time. Some mitigation alternatives must be triggered in advance, and a passage of time might be needed for their effect to become evident. This time determines how many mitigation alternatives can still be considered at a certain point.

• Running out of mitigation alternatives. Several mitigation alternatives exist to control the usage. However, some alternatives are one-time activities, and some must be performed within a certain time frame. Depending on the situation, not all alternatives might be available for use.

• Running too tight on storage. Over time, applications use more and more of the blocks committed to them. This forces Data ONTAP to allocate from a pool of free blocks. Assuming data growth, the size of the free block pool directly translates into the available time to react.

• Running out of storage completely. This must be prevented because it has a high impact on the availability of the service. Furthermore, data integrity can be at risk. Consider the following scenarios:
− An application wants to write to committed storage but fails (NAS/SAN). To the application this looks like a storage failure and implies service disruption. Data integrity can be at risk.
− An application wants to allocate new storage but fails (NAS). The application is confronted with a "No space left on device" exception. Verify the application behavior on this exception; most applications can deal with this situation, and data integrity is not at risk.

Two cases need to be differentiated when mitigation is necessary to resolve a situation of tight storage. Storage for an object such as a LUN or a share can be tight because of:
− Insufficient space within the volume in which the storage object is contained
− Insufficient free space within the aggregate in which the storage object and its volume are contained
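The "running too tight" point above is simple arithmetic: assuming a roughly constant growth rate, the free block pool translates directly into days left to react. This is an illustrative sketch of that arithmetic (ours, not an Operations Manager formula, though it mirrors the days-to-full trending mentioned elsewhere in this report):

```python
# Illustrative arithmetic (ours): assuming a constant daily growth rate, the
# free block pool of an aggregate translates directly into the time left to react.

def days_to_full(capacity_gb, used_gb, daily_growth_gb):
    """Days until the aggregate reaches 100% capacity at the current growth rate."""
    if daily_growth_gb <= 0:
        return float("inf")
    return (capacity_gb - used_gb) / daily_growth_gb

# 10 TB aggregate, 8.5 TB used, growing 50 GB/day -> 30 days to react:
print(days_to_full(10000, 8500, 50))  # 30.0
```

The result bounds the set of usable mitigation alternatives: any alternative whose lead time exceeds the remaining days is effectively no longer available.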


The following sections focus on how to detect that a change is necessary and that a storage resource should be transitioned to another phase. We describe the different phases, how monitoring can support their detection, and how this information can be made known to operational groups.

4.1 PHASES AND TRANSITIONS

This section outlines the phases of a storage resource. Starting with an empty aggregate, storage is provisioned up to certain thresholds. After that, storage is left for organic growth. Once certain thresholds are exceeded, further inspection or activities must be performed to mitigate storage tightness.

• Provisioning storage. While certain thresholds are within a defined range, storage is provisioned to the aggregates. Monitoring should support making the decision to transition to the next phase.

• Leave storage for organic growth. When certain thresholds are exceeded, provisioned storage is left for organic growth. Depending on the environment, storage of existing applications might still be extended, and a second threshold might signal that extensions are no longer possible. Monitoring should support making the decision to transition to the next or prior phase.

• Mitigate storage use. When certain thresholds are exceeded, this phase must make sure that committed storage can be delivered to store application data. The effect of a mitigation activity should be to put the storage resource back into the preferred operational corridor. Monitoring should support making the decision to transition back to the organic growth phase.

4.2 MONITORING

NetApp Operations Manager delivers comprehensive monitoring and management for NetApp shared storage. It provides alerts, reports, performance monitoring, and configuration tools to keep the NetApp storage infrastructure in line with business requirements for maximum availability and efficiency. NetApp Operations Manager provides a single human interface and an application programming interface (API) for integration with third-party management and orchestration software vendors.

Operations Manager monitors the NetApp shared storage infrastructure and is able to raise awareness of certain situations. Events can be set to trigger an action when operational parameters are within a certain range and indicate a relevant situation. When the event triggers, an alarm notification can be sent by e-mail, pager, Simple Network Management Protocol (SNMP), or customized scripts. To raise awareness of a certain situation, the event must be characterized using the metrics provided by Operations Manager. To communicate the event, an alarm must be set.

THRESHOLDS

Operations Manager monitors relevant parameters that indicate the presence of specified situations. Thresholds can be set to trigger actions, for example, to notify the operational team that an alarm situation exists. The thresholds can be set to notify in advance. Operations Manager also performs trending on operational parameters to express the urgency of a certain situation. This supports the decision making on how to react.

Within your Operations Manager instance, the thresholds can be verified and set by navigating to the Default Thresholds page via Setup → Options → Default Thresholds, or via the link http://opsmgrserver:port/dfm/edit/options. Figure 17 shows a sample configuration page.

32 Storage Efficiency Every Day How to Achieve and Manage Best-in-Class Storage Use

Figure 17) Operations Manager screen to configure thresholds on operational metrics

For aggregates, Operations Manager provides a set of thresholds, described in the following list. They represent absolute limits. Operations Manager alarms can be used to notify operational staff and raise awareness of a specific situation.

Monitoring the aggregates is very important. They are the physical containers of preallocated and growable storage objects that host application data. If an aggregate of a storage controller runs at uncontrolled usage, there could be direct consequences for the applications for which it provides data.

The concrete settings for these thresholds depend on the time necessary to relax aggregate block usage. When no mitigation alternatives can or should be taken over the lifetime of the data, mitigation actions must be performed in scheduled downtime windows. Thus, threshold settings and actions tend to be more conservative to avoid SLA-endangering situations.

• Aggregate full threshold. This threshold on the metric aggregate block use allows triggering an alarm that notifies a person in charge.

• Aggregate nearly full threshold. This threshold is the counterpart of the aggregate full threshold but provides an earlier notification.

• Aggregate over committed threshold. This threshold on the metric committed storage allows triggering an alarm that notifies a person in charge. This metric refers to the amount of storage that is committed to applications. It represents the level of consolidation and also the width and increase of the block use corridor.

• Aggregate nearly over committed threshold. This threshold is the counterpart of the aggregate over committed threshold but provides an earlier notification.

Operations Manager also provides thresholds that can be used to alert operational staff when volumes are in a certain state:

• Volume full threshold. This event notifies a person in charge that the preset threshold on the volume use metric has been reached.

• Volume almost full threshold. This event is the counterpart of the volume full threshold but provides an earlier notification.

• Volume autosized. This event notifies a person in charge when a volume was extended using the autogrow functionality.
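The aggregate and volume thresholds above amount to mapping a usage percentage onto graduated events. A minimal illustrative sketch (the function and its default percentages are placeholders, not an Operations Manager API — substitute your own "nearly full" and "full" settings):

```python
def classify_usage(used_pct, nearly_full=80.0, full=90.0):
    """Map a block-usage percentage to the threshold events listed above.

    The default percentages are assumptions for illustration; use your
    own Operations Manager threshold settings in practice.
    """
    if used_pct >= full:
        return "full"
    if used_pct >= nearly_full:
        return "nearly-full"
    return "ok"
```

The "nearly" event exists purely to buy reaction time: it fires earlier on the same metric, as the bullets above describe.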


TRENDING

Operations Manager 4.0 supports a variety of trending features for certain storage objects. This is an important feature for all storage objects with a fixed size. It allows you to estimate the time frame within which a certain situation needs to be mitigated. The trend is calculated as a linear regression over up to 90 days in the past. For aggregates, Operations Manager calculates a trend on the daily growth rate. In your Operations Manager instance, use the link http://opsmgrserver:port/dfm/report/view/aggregates-growth-rates for trending of aggregate growth rates and the estimated remaining time until the storage object is full. Each aggregate can be drilled down, and you can select trending based on an interval of one day, one week, one month, three months, or one year. To see the effect of recent data activities, set the interval of the trend calculation to enclose this activity. Investigate whether growth rates calculated over different intervals deviate significantly.

Figure 18) Trending of data growth and days-to-full prediction in Operations Manager

Note: The calculation basis of time to full is the usable aggregate capacity. This value is not calculated based on the aggregate full threshold setting.
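The days-to-full trending described above is essentially a linear regression on capacity samples. A minimal illustrative sketch (not the Operations Manager implementation) that, like Operations Manager, measures against usable capacity rather than the full threshold:

```python
def days_to_full(samples, usable_capacity):
    """Estimate days until an object is full via least-squares regression.

    samples: list of (day_number, bytes_used) observations; growth is
    assumed linear, as in the Operations Manager trend. Returns None
    when usage is flat or shrinking.
    """
    n = len(samples)
    sx = sum(d for d, _ in samples)
    sy = sum(u for _, u in samples)
    sxx = sum(d * d for d, _ in samples)
    sxy = sum(d * u for d, u in samples)
    slope = (n * sxy - sx * sy) / (n * sxx - sx * sx)  # growth per day
    if slope <= 0:
        return None
    intercept = (sy - slope * sx) / n
    last_day = max(d for d, _ in samples)
    # days from the newest sample until the fitted line reaches capacity
    return (usable_capacity - intercept) / slope - last_day
```

Because the fit is linear, growth rates computed over different intervals can deviate significantly when recent activity is bursty — which is exactly why the text recommends comparing intervals.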

Trending on the volume level is analogous to trending on the aggregate level. In your Operations Manager instance, access the link http://opsmgrserver:port/dfm/report/view/volumes-growth-rates for trending of volume growth rates. NetApp recommends that you order the view by growth rate (descending) or time to full (increasing) in order to focus on the relevant candidates.

On the volume level, you can set an alarm to fire when volume growth is outside the usual boundary:

• Abnormal volume growth. This event notifies when the growth rate of a volume exceeds a preset limit. It is helpful for signaling unusual behavior concerning storage consumption and pointing the operational staff to the right storage object.


INDIVIDUAL THRESHOLDS PER AGGREGATE OR VOLUME

For each aggregate or volume, the general default settings can be overridden and made more specific. To do so, select your aggregate or volume of choice, for example, using the links already provided in this technical report. A selected aggregate can be configured using the Edit Settings link and dialog; a selected volume's configuration can be adapted using the Edit Quota Settings link and dialog.

MONITORING STORAGE EFFICIENCY RETURNS

NetApp Operations Manager provides a dashboard to visualize storage efficiency returns in the NetApp shared storage infrastructure. This report lists important parameters, drilled down by utilization, capacity, unused reserve capacity, storage efficiency, and efficiency return breakdown. It allows you to judge the effectiveness of the NetApp storage efficiency technologies. Figure 19 provides a sample screenshot of the storage efficiency dashboard in NetApp Operations Manager. Consult the NetApp Operations Manager Efficiency Dashboard Installation and User Guide for further details on the information provided by this dashboard.

Figure 19) Storage efficiency dashboard in Operations Manager


4.3 NOTIFICATION

Operational staff must be notified when situations occur that require a transition between phases, especially situations with negative consequences for SLA fulfillment. Operations Manager provides alarms for notification. Alarms are bound to the metrics and thresholds explained in section 4.2 and notify operational staff, storage administrators, or storage capacity planners. Alarms are the instrument that keeps the management effort of the NetApp storage infrastructure low.

After being notified, the responsible person can evaluate the situation and decide which actions to take. Furthermore, the trends on operational parameters provided by Operations Manager simplify the decision-making process.

Depending on the organizational structure, the responsibilities to operate, plan, and administer the storage infrastructure can be separated into different groups, persons, or roles. Thus, we characterize the mitigation activities by required skill set and time to act. This allows easy alignment to a given organizational structure.

Operations Manager supports different methods to send a notification. The notification methods can be used in combination; for example, a notification can be sent by both e-mail and SNMP.

NOTIFY BY E-MAIL

An alarm can be sent to multiple destinations by e-mail, and repeated notifications can be sent while the situation is not resolved. To set an alarm, access the alarm configuration page by following Setup → Alarms from the default Operations Manager dashboard. Clicking Advanced Version accesses an advanced version of this page; the direct link is http://opsmgrserver:port/dfm/edit/alarms-advanced. Figure 20 shows how to configure an alarm. Adjust the thresholds as described in section 4.2.

NetApp recommends using distribution lists or aliases with meaningful names rather than the addresses of individual persons. If you follow this recommendation, changing responsibilities and roles does not require you to make corresponding changes to Operations Manager.

NOTIFY BY SNMP

Operations Manager supports the notification of alarms using SNMP, a widely used standard supported by most orchestration frameworks and ticketing systems. Using SNMP, Operations Manager can be integrated into existing ticketing systems. Figure 20 shows setting up an alarm that fires based on the aggregate almost full threshold. The SNMP trap host is configured using a hostname or IP address and the port on which the SNMP agent is listening. The alarm can then be saved and tested.


Figure 20) Configuring an alarm based on the threshold aggregate almost full

Note: The SNMP event must be routed to the responsible groups or persons in the ticketing system. Thus, the mapping between the detected situation and the responsible operational group must be implemented there.

NOTIFY BY SCRIPT

Operations Manager also supports notifications in highly customized integration scenarios. A user-defined adapter can be executed that delivers the information to the infrastructure or system of choice. A script can implement such an adapter and act as the glue between Operations Manager and the customer infrastructure. To set an alarm on the event aggregate almost full that starts a script, instrument Operations Manager on the command line:

dfm alarm create -s script_to_execute -h aggregate-almost-full
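Such a script adapter can be written in any language. Below is a minimal, hypothetical Python sketch; passing the event name and source object as command-line arguments is an assumption for illustration (consult the Operations Manager documentation for the exact invocation interface), and the payload fields are placeholders for your ticketing system's actual API:

```python
import sys

def build_ticket(event_name, source):
    """Turn an Operations Manager event into a ticket payload.

    Both arguments and all payload fields are hypothetical; adapt them
    to the real invocation interface and your ticketing system.
    """
    severity = "critical" if "full" in event_name else "info"
    return {
        "summary": f"{event_name} on {source}",
        "severity": severity,
        "queue": "storage-operations",  # assumed routing target
    }

if __name__ == "__main__" and len(sys.argv) > 2:
    # e.g. invoked by the alarm as: adapter.py aggregate-almost-full aggr1
    ticket = build_ticket(sys.argv[1], sys.argv[2])
    print(ticket["summary"])
```

The adapter keeps Operations Manager decoupled from the target system: swapping ticketing systems only means rewriting the delivery side of the script.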


4.4 MITIGATE STORAGE USE

Uncontrolled storage use can limit operational flexibility and might put SLA fulfillment at risk. This section focuses on mitigation activities that preserve flexibility by keeping use within its defined corridor; the effect of a mitigation activity should be to return usage to that corridor.

Storage tightness might occur in aggregates or volumes, depending on their configuration. When all volumes in an aggregate are thin provisioned with the zero fat configuration, they use the aggregate's shared pool of free blocks to deal with data growth. To solve this situation, a mitigation activity on the aggregate level is necessary.

When storage objects in a fixed-size volume cannot grow to the committed space, a mitigation activity on the volume level is necessary to solve the upcoming volume tightness.

MITIGATION ACTIVITIES FOR AGGREGATES

Aggregates are the coarsest storage objects within a NetApp storage controller. Finer-grained storage objects such as FlexVol volumes and their content are usually thin provisioned using the zero fat configuration. They might grow on demand; however, because they live within an aggregate of physically limited size, the growth of the storage object itself is also limited. As described in the following list, providing usable space in the aggregate automatically allows the contained storage objects to grow.

1. Increase the aggregate. You can add drives to aggregates during operation, and you can repeat this mitigation activity. The maximum aggregate size depends on the Data ONTAP version, the type of aggregate, and the type of storage controller. 64-bit aggregates, supported with Data ONTAP 8, have very high limits. Additional drives can be used immediately; however, their procurement needs to be taken into account. Rebalancing data between existing and new drives results in a uniformly distributed use of the drives.

2. Decrease the aggregate Snapshot copy reserve. This reserve is needed in MetroCluster and SyncMirror® configurations. In other configurations, you can decrease this reserve or set it to zero.

3. Shrink preallocated volumes. Volumes with preallocated space reserve available aggregate free space. When possible, these volumes can be shrunk, returning the freed space to the aggregate so that others can make use of it.

4. Enable deduplication and shrink the volume.

5. If available, migrate a nomad online to a different storage controller. Doing this on the NetApp storage controller level requires storage provisioning based on vFiler units and a MultiStore and SnapMirror license. Adequate free space on the aggregates of the target storage controller is required. This mitigation activity is not limited in its applicability.

6. Migrate a volume from one aggregate to another, within the same or another storage controller. SnapMirror replicates the data while it is still being served. To switch over to the replicated data, the client needs to detach from the source and reattach to the replica. After completion, the replica is considered the new source. This operation has an impact on client downtime. Typically, inter-data center bandwidth allows you to synchronize the source and the replica within the range of a few minutes.

7. If none of the listed activities can be used, the application must be stopped to achieve a consistent state before its data is migrated.

The mitigation activities for aggregate tightness are summarized in Table 8. Note that Provisioning Manager performs mitigation alternatives 3 to 6 for secondary storage online.


Table 8) Mitigation alternatives to control use within aggregates.

No. | Mitigation Activity | Repeatability | SLA Impact | Preparation Time | Time to Show Effect
1 | Increase aggregate capacity by adding disks | Data ONTAP 7.x: low limits; Data ONTAP 8: high limits | None | HW procurement | Immediate (plus rebalancing)
2 | Decrease the aggregate's Snapshot copy reserve, if possible | One time | None | None | Immediate
3 | Shrink other volumes in the aggregate, if they have enough free space | One time | Low | None | Immediate
4 | Run deduplication and shrink volumes | Repeatable | Low | Time to execute dedupe | Immediate
5 | Migrate nomads (online) | Repeatable | Low | None | Minutes (vFiler migration time)
6 | Migrate volumes to a different aggregate (offline) | Repeatable | Med–high | Next planned downtime window | Minutes (volume switch-over time)
7 | Prevent application data loss and stop the application, then migrate (offline) | Repeatable | Low–high | Coordinate with app owner | Minutes (migration time)

MITIGATION ACTIVITIES FOR VOLUME TIGHTNESS

Mitigation activities for volume tightness are relevant for volumes that are at risk because their storage objects cannot grow to the committed size. When it is not possible to enable growth for storage objects contained in volumes, you need to perform an aggregate mitigation activity.

Note: Some of these mitigation alternatives depend on, and affect, used capacity in the aggregate.

Table 9) Mitigation activities for resource tightness within volumes.

No. | Mitigation Activity | Repeatability | SLA Impact | Preparation Time | Time to Show Effect
1 | Reduce the volume's Snapshot copy reserve (if configured and not used) | One time | Low | None | Immediate
2 | Increase the volume, if there is free space in the aggregate (see Table 8) | One time | Low | None | Immediate
3 | Delete Snapshot copies not needed or those skipped by the autodelete function | Limited | Low | None | Immediate
4 | Activate FAS deduplication for the volume (requires proper space guarantees) | One time | Low; possible performance impact | Wait for schedule | Hours
5 | If the volume contains more than a single LUN, migrate those objects to another volume or aggregate | Repeatable | High | Next planned downtime window | Minutes (volume migration time)
6 | Stop application and migrate data | Repeatable | High | Coordinate with app owner | Minutes (migration time)


5 REAL-LIFE SETTINGS

This section summarizes two different operational settings. The first does not make use of online data migration and the settled/nomad provisioning pattern; the second implements a settled/nomad provisioning pattern to maintain the flexibility for online data migrations.

The concrete threshold settings and approaches might be very customer and application specific. To exploit NetApp storage efficiency features in your own data center, NetApp recommends that you start conservatively. After you are familiar with the process, work toward the customer-specific optimum.

5.1 SAMPLE SETTING 1: REAL-LIFE SETTING

This section describes a real-life setting that a customer started with. It makes use of a limited set of mitigation alternatives. This is especially beneficial when the installed storage capacity should stay constant over a long time frame or the physical systems are already fully equipped. A settled/nomad setting is not considered. Thus, the thresholds that signal a transition of the phases are set lower and more conservatively for this customer. Because online data migration and aggregate extension are not available as mitigation alternatives, sufficient available space is required to safely reach the next planned downtime window, as shown in Figure 21. In practice, refer to the aggregate days to full trend value to get an idea of the available days to full based on past data growth.

• All storage is provisioned using the zero fat option with growable FlexVol volumes. Only aggregate monitoring is used.
• Aggregate extension is not a mitigation alternative.
• Online migration is not a mitigation alternative.

Figure 21) Storage to enable organic data growth between planned downtime windows


Note: Several months might fall between planned downtime windows in which to perform major mitigation alternatives.

The primary concern is preventing the critical situation in which aggregates reach a utilization level too high to enable organic growth during the period between agreed planned downtime windows. To prevent this situation, sufficient space must be reserved to enable data growth. Second, the level of data consolidation is monitored to manage accumulated growth rates safely.

Provisioning new data is stopped when one or both thresholds on the first and second metrics are reached. The operational teams are notified using an alarm on the Operations Manager event aggregate nearly full threshold (configured to fire when the metric exceeds 50%) and the event aggregate nearly over committed threshold (configured to fire when the metric exceeds 110%). These alarms stop the responsible entities from provisioning new storage; the aggregate is left for organic growth.

An assessment of the storage situation might then be performed. Depending on experience and knowledge of the application growth rates seen in the past, the thresholds may be adapted. After the upper threshold of the operational sweet spot corridor is exceeded, an alarm based on the aggregate full threshold (set initially to 65%) is sent to the storage administrators, who decide whether to migrate data in the next planned downtime window. In the meantime, organic growth can take place in the yellow-marked area shown in Figure 22. The metrics used are:

• First metric: aggregate capacity used
• Second metric: aggregate space committed

Because all storage is provisioned using the zero fat option, no artificially limited storage containers exist. Thus, there is no need to consider a volume-based metric. Figure 22 shows the behavior depending on the metrics aggregate capacity used and aggregate committed space.
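The provisioning decision driven by these two metrics can be sketched as a small helper (illustrative only, using the thresholds stated for this sample setting; not a NetApp API):

```python
def allowed_actions(capacity_used_pct, committed_pct):
    """Phase decision for sample setting 1.

    Provisioning stops when aggregate capacity used exceeds 50% or
    committed space exceeds 110%; above 65% capacity used, migration
    is planned for the next downtime window (thresholds from the text).
    """
    if capacity_used_pct > 65:
        return "mitigate in next planned downtime window"
    if capacity_used_pct > 50 or committed_pct > 110:
        return "stop provisioning; leave for organic growth"
    return "provision new storage"
```

Either metric alone can force the transition out of the provisioning phase, which is why both thresholds carry alarms.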

Figure 22) Transition of changes depending on the metrics aggregate capacity used and aggregate committed space

[Figure content: while aggregate capacity used is in the 0–50% range and aggregate space committed is in the 0–110% range, new storage is provisioned. Beyond these ranges, capacity is assessed and thresholds are adapted. Above 65% capacity used or 120% space committed, mitigation takes place.]


5.2 SAMPLE SETTING 2: SETTLED/NOMAD

This section describes a setting that takes the settled/nomad provisioning pattern into account and allows migrating nomad data flexibly and in a timely manner thanks to vFiler technology. This setting requires storage space at alternative locations to which nomads might be migrated. It is seen more often in larger environments with an emphasis on NFS-attached storage. It allows operating the NetApp storage infrastructure at very high use and in narrower operational sweet spot corridors. Figure 23 visualizes the effect of a mitigation alternative that can be performed online.

Figure 23) Narrower corridors due to the ability to perform mitigation alternatives in hours instead of months


In this sample setting, as in sample setting 1, the critical situation to prevent is that aggregates become too full. However, with the flexibility gained through online data migration, there is no need to take a further metric into account, for example, storage overcommitment.

• All storage is provisioned using the zero fat option with growable FlexVol volumes. Only aggregate monitoring is used.
• Storage is provisioned using the settled/nomad pattern, with the ability to perform online migration.
• Days to full aggregate trending was more than 200 days on average. Note that this value depends on the individual situation and is calculated against 100% capacity.

The sole metric in this setting is aggregate capacity used. Table 10 contains the thresholds describing the transitions of phases.

Table 10) Phase transitions with the settled/nomad provisioning pattern and the online migration mitigation alternative.

Detection Threshold | Notify | Mitigation
> 70% | Storage operations | Stop provisioning of storage
> 85% | Storage operations | Stop extending provisioned storage
> 90% | Storage operations | Relax the resource situation and migrate a nomad
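Because online migration is always available here, the phase logic collapses to a single metric. As code, the Table 10 transitions might look like this (an illustrative sketch; the percentages are this sample setting's thresholds):

```python
def phase_action(aggregate_used_pct):
    """Return the operational response per Table 10 (sample setting 2)."""
    if aggregate_used_pct > 90:
        return "relax resource situation and migrate a nomad"
    if aggregate_used_pct > 85:
        return "stop extending provisioned storage"
    if aggregate_used_pct > 70:
        return "stop provisioning of storage"
    return "normal operation"
```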


Figure 24) Visualization of phase transitions depending on metric aggregate capacity used

[Figure content: while aggregate capacity used is in the 0–70% range, new storage is provisioned; in the 70–85% range, already provisioned storage may still be extended; above 90%, utilization is relaxed by migrating a nomad with NetApp Data Motion.]

You can achieve very high data consolidation in this setting by using NetApp storage controllers. The amount of logical data served exceeds the physically usable capacity by factors.


6 STORAGE EFFICIENCY COOKBOOK

To increase consolidation, we propose the following steps to exploit the advantages of the NetApp storage efficiency technologies.

Figure 25) Typical picture of aggregate capacity metrics while turning to zero fat configurations and dedupe

[Figure content: committed capacity and capacity used plotted over elapsed time, with the overall trend and the last-3-month trend; markers 1, 2, and 3 correspond to the steps below, at roughly one month and three months elapsed.]

As a general rule, we don't introduce artificially limited container types. They increase the monitoring effort and might prevent pooling of unused space. For an existing landscape, proceed as follows.

1. Install and configure Operations Manager; the earlier, the better. From day one, Operations Manager collects data, and the more information it collects, the better the predictions and trending. The diagrams provided by Operations Manager give a good idea of growth rates and their steadiness. Make sure all NetApp storage controllers are monitored. Wait for one month. Define which mitigation alternatives your operational team is comfortable with. Check the boxes accompanying the provided list and identify the time your team needs to perform the actions. If you can perform online migrations for nomads, define the time to negotiate and approve the migration. For all other data, define the time to the next planned downtime window.

2. Change all volumes to the zero fat configuration with the autogrow feature set to on. Since there is no artificial space limitation for an autogrow volume, monitoring is restricted to aggregate monitoring. When using deduplication, set the volume to autogrow. During this period, the capacity used diminishes, as shown in Figure 25. Usually, each change in the volume configuration can be detected. So far, only metadata has changed, and unused space in the volumes is now available in a common shared pool. The aggregated free space is available for the same applications storing the data. We recommend monitoring for three months to understand the growth rate of your environment.

3. Derive the growth trend of the aggregates. Note that the overall trend might still be negative. Use Operations Manager to help determine the trend. Make sure that it excludes the time frame when changing the volume configuration to zero fat and that it includes relevant operations of your applications, such as month-end and year-end closing of business applications or regular software maintenance updates (for example, in virtualized environments).


Work backward to determine the thresholds of the phases:

a. Define the aggregate use at a level where your operational team is comfortable. At first, do not exceed 80%. Add an attention area (yellow) depending on the mitigation alternatives and their time to show effect.

b. Determine the maximum distance between the planned downtimes or the time to perform the intended mitigation alternatives.

c. Determine the growth rate. Operations Manager provides help in determining the trend of data growth.

d. Determine the minimum space required to comfortably allow organic growth in the period between agreed planned downtimes of the services provided. Operations Manager helps you to understand the growth rate of the past.
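Steps a through d amount to simple back-of-the-envelope arithmetic. A hedged sketch, with all numbers as placeholders:

```python
def provisioning_stop_threshold(usable_gb, daily_growth_gb,
                                days_between_downtimes,
                                comfort_limit_pct=80.0):
    """Work backward from the downtime interval to a stop threshold.

    The aggregate should stay below comfort_limit_pct until the next
    planned downtime, so provisioning must stop once the expected
    organic growth over the interval would exceed that limit.
    """
    reserve_gb = daily_growth_gb * days_between_downtimes
    limit_gb = usable_gb * comfort_limit_pct / 100.0
    stop_pct = (limit_gb - reserve_gb) / usable_gb * 100.0
    return max(stop_pct, 0.0)
```

For example, with 1,000 GB usable capacity, 1 GB/day growth, and 180 days between downtime windows, provisioning would stop at 62% to keep usage under the 80% comfort limit.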

To provision storage, follow these steps:

1. Create big aggregates to enable shared storage in your data center. We recommend sizing them in such a way that the aggregate can be extended once for an eventual aggregate mitigation. Very few situations exist where a silo-centric approach with dedicated aggregates for applications makes sense. Free space and performance in an aggregate can be shared. A few big aggregates reduce the monitoring effort. Also, build aggregates in a limited number of standardized configurations and sizes.

2. Create volumes in the zero fat configuration with the autogrow feature set to on. Because there is no artificial space limitation for an autogrow volume, monitoring is restricted to aggregate monitoring. When using deduplication, set the volume to autogrow. Whenever possible, use Provisioning Manager for convenience and for repeating configurations.

a. Classify your data and provision for flexibility. Give NFS a preference and make use of vFiler entities.

b. Turn on deduplication. Even in situations where deduplication rates are expected to be low, there is sometimes a big surprise. If you prefer to try deduplication on the storage controller first, create a clone of the intended volume and deduplicate it to estimate the effect. Use Performance Advisor to identify a repeating time frame of low activity to schedule the deduplication job. Alternatively, use deduplication scheduled by change rate. Mind the maximum volume sizes, which depend on the storage controller.

c. Initially size volumes to the expected size of the data you are going to store. Thus, the aggregate over-commitment metric in Operations Manager represents the data consolidation more precisely.

d. Trim existing volumes provisioned in full or low fat to the zero fat configuration. Use the following commands at the console of the storage controller to configure zero fat without Snapshot autodelete for NAS environments:

vol options <volume> guarantee none
vol options <volume> try_first volume_grow
vol autosize <volume> -m <maximum size> -i <increment size> on
snap autodelete <volume> off

Use the following command sequence to configure zero fat with Snapshot autodelete for NAS environments:

vol options <volume> guarantee none
vol options <volume> try_first volume_grow
vol autosize <volume> -m <maximum size> -i <increment size> on
snap autodelete <volume> trigger volume
snap autodelete <volume> delete_order oldest_first
snap autodelete <volume> on


Use the following commands to configure zero fat without Snapshot autodelete for SAN environments:

vol options <volume> guarantee none
vol options <volume> try_first volume_grow
vol autosize <volume> -m <maximum size> -i <increment size> on
snap reserve -V <volume> 0
snap autodelete <volume> off
lun set reservation <lun> disable

Use the following command sequence to configure zero fat for SAN environments with autodelete set to on:

vol options <volume> guarantee none
vol options <volume> try_first volume_grow
vol autosize <volume> -m <maximum size> -i <increment size> on
snap reserve -V <volume> 0
snap autodelete <volume> trigger volume
snap autodelete <volume> delete_order oldest_first
snap autodelete <volume> on
lun set reservation <lun> disable
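The four command sequences above differ only in the SAN-specific steps and the autodelete setting. A small helper (a convenience sketch, not a NetApp tool) can generate them; the emitted lines mirror the sequences listed above:

```python
def zero_fat_commands(volume, max_size, incr, san=False,
                      autodelete=False, lun=None):
    """Build the Data ONTAP command sequence for a zero fat volume.

    san adds the snap reserve and LUN reservation steps; autodelete
    switches Snapshot autodelete on (volume trigger, oldest first).
    """
    cmds = [
        f"vol options {volume} guarantee none",
        f"vol options {volume} try_first volume_grow",
        f"vol autosize {volume} -m {max_size} -i {incr} on",
    ]
    if san:
        cmds.append(f"snap reserve -V {volume} 0")
    if autodelete:
        cmds += [
            f"snap autodelete {volume} trigger volume",
            f"snap autodelete {volume} delete_order oldest_first",
            f"snap autodelete {volume} on",
        ]
    else:
        cmds.append(f"snap autodelete {volume} off")
    if san and lun:
        cmds.append(f"lun set reservation {lun} disable")
    return cmds
```

Generating the sequence rather than typing it reduces the chance of omitting the snap reserve step in SAN environments.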

e. Identify storage holding inactive data. Storage keeping inactive data is most often perfectly suited to act as a nomad candidate that could be migrated.

f. Identify storage that is close to deprovisioning. Deprovisioning of storage relaxes use and can act as a mitigation alternative.

g. Turn already provisioned volumes into the zero fat configuration.

3. Let Operations Manager monitor the landscape. Use the aggregate daily growth rates and days to full trending reported by Operations Manager to adapt the thresholds. Remember that days to full trending reports against 100% capacity used of the aggregate.


7 REFERENCES

• TR-3505, "NetApp Deduplication for FAS and V-Series Deployment and Implementation Guide": www.netapp.com/us/library/technical-reports/tr-3505.html
• TR-3563, "NetApp Thin Provisioning: Improving Storage Utilization and Reducing TCO": www.netapp.com/us/library/technical-reports/tr-3563.html
• TR-3710, "Operations Manager, Provisioning Manager, and Protection Manager Best Practices Guide": www.netapp.com/us/library/technical-reports/tr-3710.html
• TR-3786, "A Thorough Introduction to 64-Bit Aggregates": www.netapp.com/us/library/technical-reports/tr-3786.html
• TR-3814, "NetApp Data Motion": www.netapp.com/us/library/technical-reports/tr-3814.html
• TR-3827, "If You're Doing This, Then Your Storage Could Be Underutilized": www.netapp.com/us/library/technical-reports/tr-3827.html
• TR-3881, "DataMotion for Volumes for Enterprise Applications": www.netapp.com/us/library/technical-reports/tr-3881.html
• NetApp Operations Manager Efficiency Dashboard Installation and User Guide: http://now.netapp.com/NOW/download/tools/omsed_plugin/InstallUserGuide.pdf


8 ACKNOWLEDGMENTS

This report was developed in concert with the Field Centers for Innovation and covers field best practices and product group expertise. It would not have been possible without the input of many experts. Significant contributions were made by Matthew Agoni, Carlos Alvarez, Jeff Berks, Manfred Buchmann, Hans Deuerlein, Erik Dybwad, Niels Reker, Oliver Dziuba, Larry Freeman, Gary Garcia, Pretoom Goswami, Naveen Harsani, George John, Nigel Maddock, Andreas Martinovsky, Holger Niermann, Cesar Orosco, Christian Ott, Shiva Raja, Michael Reusch, Maurice Skubski, John Tyrrell, Oliver Walsdorf, and Allen Wang.

NetApp provides no representations or warranties regarding the accuracy, reliability, or serviceability of any information or recommendations provided in this publication, or with respect to any results that may be obtained by the use of the information or observance of any recommendations provided herein. The information in this document is distributed AS IS, and the use of this information or the implementation of any recommendations or techniques herein is a customer's responsibility and depends on the customer's ability to evaluate and integrate them into the customer's operational environment. This document and the information contained herein may be used solely in connection with the NetApp products discussed in this document.

© Copyright 2010 NetApp, Inc. All rights reserved. No portions of this document may be reproduced without prior written consent of NetApp, Inc. Specifications are subject to change without notice. NetApp, the NetApp logo, Go further, faster, Data ONTAP, FlexClone, FlexVol, MultiStore, RAID-DP, SnapDrive, SnapMirror, Snapshot, SyncMirror, and vFiler are trademarks or registered trademarks of NetApp, Inc. in the United States and/or other countries. Windows is a registered trademark of Microsoft Corporation. Oracle is a registered trademark of Oracle Corporation. VMware is a registered trademark and VMotion is a trademark of VMware, Inc. All other brands or products are trademarks or registered trademarks of their respective holders and should be treated as such. RA-0007-1010


HOW SHOULD A VOLUME BE SIZED?

Because physical allocation of data within a zero fat-provisioned volume is done on demand, the volume size can theoretically be set to a very high value that can easily keep all application data and Snapshot copies. Because the unallocated space in the volume is not exclusively reserved for the volume itself, all other applications can benefit from the shared pool of unallocated storage. However, NetApp recommends that you size the volume to the expected size of its containing objects and use the autogrow option to let it grow on demand. The important advantage is that the commitment rate acts as a metric for data consolidation.

Note: The commitment rate reflects the amount of logical data consolidation. This metric is suitable for deciding when data should be left for organic growth.

Additionally, the volume size limits when using deduplication should be taken into account, because the maximum sizes depend on the storage controllers.
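As an illustration, the commitment rate can be computed as the ratio of committed (logical) storage to physical aggregate capacity. The following sketch uses metric names and a target rate of our own choosing; it is not an Operations Manager API.

```python
def commitment_rate(committed_bytes, capacity_bytes):
    """Ratio of storage committed to applications vs. physical capacity.

    A value above 1.0 indicates an overcommitted (thin-provisioned)
    aggregate; the higher the rate, the more logical data consolidation.
    """
    return committed_bytes / capacity_bytes

def leave_for_organic_growth(rate, target_rate=2.0):
    """Decide whether to stop provisioning and let data grow organically.

    The target rate of 2.0 is an assumed example, not a NetApp default.
    """
    return rate >= target_rate

# Hypothetical aggregate: 10 TB physical capacity, 25 TB committed.
rate = commitment_rate(25e12, 10e12)
print(rate, leave_for_organic_growth(rate))  # 2.5 True
```

A dashboard or script built on such a metric would stop provisioning new volumes into the aggregate once the target consolidation level is reached.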

APPLICATION RECOMMENDATIONS

Thin provisioning is most effective when applications use data that is committed to them step by step. When applications preformat data, the immediate effect of thin provisioning is lost, and only deduplication may reclaim sharable blocks. Because thin provisioning has no performance penalty, the general recommendation is to provision with the zero fat configuration.

For SAN-attached storage, NetApp recommends that you use file systems that support space reclamation technologies such as the SCSI UNMAP and SCSI WRITE SAME commands. These pass the information through the storage stack that a particular block is no longer used, allowing unused space to be reclaimed. On Windows® platforms, this can be configured in NetApp SnapDrive®.

For Oracle® database best practices, refer to WP-7084, "Storage Efficiency in an Oracle Environment."

3.2 PROVISIONING FROM TEMPLATES: VOLUME AND DEDUPE-CENTRIC LAYOUTS

This section deals with provisioning storage for similar applications from a golden template. A valid use case is a hosting provider who offers and serves predefined application services in mass quantities. Instead of provisioning each application from scratch, the data of the application instance is provisioned by creating a copy of a preconfigured template, or golden copy, that is customized using a postprocessing procedure.

When applications are provisioned this way, NetApp cloning technologies generate virtual copies of the template data instantly and with efficient use of space. This achieves a high degree of data consolidation and cost savings.

The potential of NetApp cloning technologies also plays a central role in development and test environments, as well as in software maintenance scenarios. Testing and updates can be performed very easily, because these cloning capabilities work instantly and with almost no overhead in performance, CPU, and memory.

There are two ways to align application data to a NetApp shared storage infrastructure:

• Volume-centric storage layout
• Dedupe-centric storage layout

Depending on the data lifetime, suitability for deduplication, consistency, and tool constraints, one way of aligning application data is more appropriate than the other.

In both variants, the storage of the application template can be provisioned as either full, low, or zero fat. The cloning procedure inherits the attributes of the parent volume. To create space-efficient clones, the space guarantee must be set to none.


CONSEQUENCES FOR MONITORING

When using one of the following layouts, very high data consolidation can be achieved. Because this effect depends on the usage characteristics of the corresponding applications, monitoring the aggregate is key. In case a low fat volume acts as a template that is cloned preserving the original space guarantees, monitoring is necessary for the cloned volumes as well.

VOLUME-CENTRIC STORAGE LAYOUT

In a volume-centric storage layout, an application instance is organized into one or a few volumes to benefit from Data ONTAP volume-centric management and maintenance operations, such as instant cloning and volume-consistent Snapshot copies.

In addition to the convenient ways to manage volumes, volume-centric storage layouts have storage efficiency advantages in two dimensions:

• High instant storage efficiency savings. High instant savings occur when cloning data of an application instance with FlexClone; savings might deteriorate over time.

• Long-term storage efficiency savings. Medium long-term savings are achieved when deduplicating application data.

A volume-centric layout makes it easy to provision storage for another instance of an application by cloning a consistent volume representing the template of the intended application and attaching it to an instance where it is processed. This approach works for both NAS and SAN.

Figure 12 shows the data alignment of an application instance and its volume. An application instance organizes its data in one or more dedicated volumes. Note that the entire construct is created within one aggregate.

Because deduplication is performed on the volume level, long-term savings depend on the block-sharing rate within one instance of an application. Volume-centric layouts are preferred in the following cases:

• Simplicity of data management using volumes
• Individual control over the SLA of each application instance
• Application instances with a short duration
• No consideration of deduplication
• Management tools that require volume-centric layouts


Figure 12) Volume-centric storage provisioning. Application instances are aligned horizontally with their volumes. A template FlexVol volume containing the application's LUNs/qtrees is cloned with FlexClone block sharing into instances 1 through n; deduplication block sharing operates within each FlexVol volume.

Impact on commitment and storage utilization. The impact of using FlexClone to clone a volume-centric storage layout to implement template-based provisioning can be described schematically. At clone creation, Data ONTAP creates metadata for the new instance of the data; it allocates space for storing changes to the cloned copy or new data on request. Thus, the overcommitment of the aggregate containing the cloned data increases when creating the clone. However, this does not affect the space used in the aggregate. When data in the clone is changed and new data is added by the application, the aggregate use grows.
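The accounting at clone creation can be illustrated with a toy model. This is our own simplification for illustration only; real Data ONTAP accounting also tracks metadata, reserves, and Snapshot copies.

```python
class Aggregate:
    """Simplified aggregate accounting for FlexClone of a volume (toy model)."""

    def __init__(self, capacity):
        self.capacity = capacity    # physical blocks
        self.committed = 0          # logical size presented to applications
        self.used = 0               # physically allocated blocks

    def provision_volume(self, size, used):
        self.committed += size
        self.used += used

    def clone_volume(self, size):
        # Clone creation adds commitment (the clone's full logical size)
        # but consumes almost no physical space until data changes.
        self.committed += size

    def write(self, nblocks):
        # Changed or new data in a clone allocates physical blocks.
        self.used += nblocks

agg = Aggregate(capacity=100)
agg.provision_volume(size=40, used=10)   # template volume
agg.clone_volume(size=40)                # overcommitment rises ...
print(agg.committed, agg.used)           # 80 10  ... physical use does not
agg.write(5)                             # application writes into the clone
print(agg.used)                          # 15
```

The model shows the key point of the paragraph above: commitment jumps at clone creation, while physical use grows only as the clone diverges from its parent.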

Best Practice

A volume-centric layout implicitly implements a consistency group. It is preferable to align in it all application data that should be recovered at a certain point in time. Cloning can achieve significant savings when a FlexClone volume is created to provision data for a new service instance.

Client-side data realignment, such as disk defragmentation or database table space reorganization, has a counterproductive effect on the FlexClone savings. This realignment also has a temporarily counterproductive effect on the deduplication savings until the deduplication process is run again. If possible, the following actions on client data should be avoided:

• Reorganizing data, for example, database reorganization of table spaces or defragmentation of virtual disks provisioned through cloning

• Preformatting data


DEDUPE-CENTRIC STORAGE LAYOUT

In a dedupe-centric storage layout, the goal is to achieve high storage efficiency returns from the deduplication feature. In contrast to the volume-centric storage layout, data of different application instances is grouped to achieve storage efficiency returns across a set of application instances. Figure 13 shows a sample dedupe-centric storage layout. Data of application instances is organized horizontally; the individual data of each application is grouped vertically in a volume to implement deduplication.

This layout makes sense in virtualization scenarios where the images of the guest machines can be grouped easily. Grouping partitions containing boot images and commonly used programs is very effective, because they share much of the same data. To implement template-based provisioning with such a layout, cloning of template data must be performed with the file/LUN FlexClone operation. File/LUN FlexClone allows storage objects to be cloned within a volume, providing finer granularity.

This storage layout provides the following storage efficiency advantages in a short- and long-term perspective:

• Very high long-term storage efficiency savings. Long-term savings are achieved due to the deduplication-centric storage layout and deduplication returns.

• Short-term storage efficiency savings. Instant savings are provided when cloning an application instance through a file/LUN FlexClone operation, for example, template application data.

In contrast to the volume-centric storage layout, application instances are bundled together in a matrix style because of their participation in a volume. This implies that the applications share major operational tasks and are managed as a bundle. From an SLA perspective, a diversification of service levels within the application instances cannot be implemented as easily as with a volume-centric layout. Achieving application-consistent Snapshot copies requires the iterative application of file/LUN FlexClone functionality to all storage objects of the instance. This is slightly more difficult than cloning with a volume FlexClone operation.

TR-3505, "NetApp Deduplication for FAS and V-Series Deployment and Implementation Guide," provides a deeper understanding of NetApp deduplication and its deployment.

Figure 13 illustrates dedupe-centric storage provisioning. Volumes are shared among several application instances to achieve cross-dedupe returns. Note that this construct is created within an aggregate; volumes can be assigned to different aggregates.


Figure 13) Dedupe-centric storage provisioning. Application instances are aligned horizontally; volumes are aligned vertically. Each FlexVol volume groups one LUN/qtree from the template and from each application instance, and deduplication block sharing operates within each FlexVol volume.

Impact on commitment and aggregate usage. When creating the FlexVol volumes for this layout, their individual sizes contribute to the commitment rate. The aggregate use grows with the provisioning and object use within the FlexVol volumes. Provisioning a new instance in this layout through a file/LUN FlexClone operation has no effect on the overcommitment rate; it affects the deduplication savings of the volume itself. Thus, NetApp recommends using the zero fat configuration for the volume and enabling autogrow.
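The difference from the volume-centric case can be shown with the same kind of toy accounting: cloning a LUN or file inside a volume changes neither the volume size (and hence the aggregate commitment) nor the physical use at creation time. Field names below are illustrative assumptions, not a NetApp data model.

```python
def lun_flexclone(volume, parent, clone):
    """Toy model of a file/LUN FlexClone inside one volume.

    The clone initially shares all blocks with its parent, so no new
    physical blocks are allocated and the volume size is unchanged.
    """
    volume["objects"][clone] = {"parent": parent, "unique_blocks": 0}

vol = {"size": 100, "used": 30,
       "objects": {"template.lun": {"parent": None, "unique_blocks": 30}}}

before = (vol["size"], vol["used"])
lun_flexclone(vol, "template.lun", "instance1.lun")
print((vol["size"], vol["used"]) == before)  # True: commitment and use unchanged
print(sorted(vol["objects"]))  # ['instance1.lun', 'template.lun']
```

Only as the clone accumulates unique blocks does the volume's use, and with autogrow possibly its size, increase.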

Best Practice

This layout is very attractive for applications using multiple but similar storage objects among service instances (for example, virtual disks in virtual machine hypervisors). These usually contain similar operating systems and applications in dedicated virtual disks, so grouping these storage objects leads to a very high degree of consolidation due to deduplication.

Quickly changing data, such as pages and swap files, should not be considered for inclusion in deduplicated volumes on primary storage. Deduplication savings are limited due to their high change rate and do not justify running the deduplication process. NetApp recommends that this type of data not be placed in the same volume as data that dedupes well.

We further recommend not performing client data realignments such as Windows disk defragmentation or database table space reorganizations. Because of the way that NetApp storage controllers work, fragmented client data is served with no performance penalty.


3.3 SETTLED/NOMAD PROVISIONING FOR NETAPP DATA MOTION

Settled/nomad provisioning is a provisioning pattern that helps increase the utilization of NetApp shared storage. When the online migration capabilities of storage are exploited, response times to mitigate data growth scenarios are independent of application-specific planned downtime windows. Storage is classified into potential migration candidates and can be migrated away from a tight aggregate at one storage controller to another while assuring its accessibility. Thus, it is an elegant technology to relax the use of an aggregate outside the planned downtime windows of the served applications. NetApp MultiStore® technology implements this feature using the vFiler® abstraction, which NetApp recommends you consider in the provisioning process. Secure multi-tenancy environments implemented using MultiStore technology harmonize well with this provisioning approach.

The settled/nomad provisioning pattern is a perfect metaphor for reacting to data growth in an aggregate. Figure 14 illustrates the concept of settled/nomad provisioning in the aggregates of the storage controllers and the migration of a nomad out of its aggregate. The settled part describes data that does not move during its lifetime; it might use vFiler units to simplify operation and hardware maintenance of the storage controller, but there is no direct need. The nomad parts are considered moving parts and thus must make use of vFiler units. The ratio between the sizes of the settled and nomad parts depends on the growth rate and lifetime of the data in the settled part. Assuming that the aggregate size is constant over this period, the aggregate is filled with settled and nomad data. Over the data lifetime, more and more nomads are migrated away; at the end of the lifetime, only the settled data is left.

It is irrelevant whether the data growth happens in the settled or the nomad part: when a nomad is migrated away, the resource situation on the aggregate is relaxed. It is preferable to provision several nomads of different sizes. This allows you to:

• React to different growth scenarios of the data
• Quickly migrate smaller nomads when time or an inter-storage controller network is considered to be a limited resource
• Operate the aggregate in its operational sweet spot corridor over a long time frame; by slicing the migratable entities in the right way, you can be sure that the aggregate operates in a predefined use interval
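A possible selection policy for reacting to aggregate tightness can be sketched as follows. The greedy smallest-first order reflects the point about quickly migrating smaller nomads; all names and numbers are illustrative assumptions.

```python
def nomads_to_migrate(aggregate_used, corridor_high, nomads):
    """Greedy choice of nomads to migrate until use re-enters the corridor.

    `nomads` maps a nomad name to its block usage in the aggregate.
    Smaller nomads are tried first, since they migrate faster over an
    inter-storage controller network.  Purely illustrative.
    """
    selected = []
    for name, size in sorted(nomads.items(), key=lambda kv: kv[1]):
        if aggregate_used <= corridor_high:
            break
        selected.append(name)
        aggregate_used -= size
    return selected, aggregate_used

moves, new_use = nomads_to_migrate(
    aggregate_used=92, corridor_high=80,
    nomads={"nomad-a": 5, "nomad-b": 10, "nomad-c": 25})
print(moves, new_use)  # ['nomad-a', 'nomad-b'] 77
```

A real deployment would additionally weigh migration duration, SLA stickiness, and the load on both controllers before triggering NetApp Data Motion.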

Figure 14) Settled/nomad provisioning into an aggregate. In case of aggregate tightness, a nomad is migrated to a separate aggregate. The aggregate holds one settled part and several nomads.

To summarize, the settled/nomad provisioning pattern is an elegant method to adjust the block use of an aggregate. The use of an aggregate can be controlled and kept in a desired corridor.


SLA-BASED ASSESSMENT FOR SETTLED/NOMAD

The goals of an SLA-based assessment are to optimize SLA fulfillment and to avoid or minimize penalty costs. The accessibility of the applications, described by their individual service levels, is used for an assessment into settled and nomad instances.

We use the introduced SLA metric of service disruption and map it to the stickiness of the settled/nomad instances. The vFiler entities allow online migration of NFS- and iSCSI-attached nomad instances without any changes on the client side. Fibre Channel-attached storage cannot be migrated online at the time of writing. Refer to TR-3881 for an understanding of DataMotion in an Oracle database and Microsoft Exchange environment.

Alignment by technical impact. For data belonging to applications with SLAs that fit perfectly into what is provided, a direct assignment can be made. For example, application instances with the lowest acceptable service disruption should be the last candidates to be migrated (settled), while applications with the highest acceptable service disruption should be considered nomads. However, there might be data of application instances that will likely be migrated during the application lifetime. You must take into account the business impact of migrating these instances.

Figure 15) Alignment by technical impact (sorted by negative impact in descending order). Instances with high negative impact or outside the SLA (for example, all FC-attached instances) are settled; instances with medium to low impact inside the SLA are nomads.
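The alignment by technical impact can be sketched as a simple classification rule. The field names and the three-level disruption scale are our assumptions, not a NetApp tool's schema.

```python
def classify(instance):
    """Settled/nomad assessment by technical impact (illustrative).

    `instance` is a dict with 'protocol' and 'acceptable_disruption'
    ('low', 'medium', or 'high'); both field names are assumptions.
    """
    if instance["protocol"] == "FC":
        return "settled"   # FC-attached storage cannot be migrated online
    if instance["acceptable_disruption"] == "low":
        return "settled"   # last candidates to be migrated
    return "nomad"         # tolerates the short DataMotion impact

apps = [
    {"name": "oltp-db",   "protocol": "FC",    "acceptable_disruption": "low"},
    {"name": "fileshare", "protocol": "NFS",   "acceptable_disruption": "high"},
    {"name": "exchange",  "protocol": "iSCSI", "acceptable_disruption": "medium"},
]
print({a["name"]: classify(a) for a in apps})
# {'oltp-db': 'settled', 'fileshare': 'nomad', 'exchange': 'nomad'}
```

The remaining ambiguous instances would then go through the business-impact assessment described next.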

Alignment by business impact. An assessment of penalty costs is made for the data of the remaining applications. For vFiler migration, a very short negative impact on the performance of the service level must be taken into account during the migration. Thus, application data with the highest negative impact is considered to be the stickiest.

Figure 16) Alignment by business impact (sorted by negative impact in descending order). Instances with the highest penalty costs remain settled; instances with lower penalty costs are classified as semi-settled or nomad.

PERFORMANCE AND THROUGHPUT IMPACT OF MIGRATION

Migration of a nomad might be triggered by heavy storage consumption in an aggregate. It might also be triggered by performance limitations of the corresponding storage controller. Because the migration itself consumes additional resources on the network and the participating storage controllers, this consumption must be taken into account to avoid further intensifying the situation. Refer to TR-3881 for a quantitative evaluation of DataMotion.

NetApp recommends that the use of storage controllers in a high-availability configuration be adjusted in such a way that the remaining controller can master the load in the case of a failover. Doing so also leaves enough resources to perform migrations.

Migrating vFiler entities relies mainly on SnapMirror® and MultiStore technology; thus, it requires these licenses on all participating storage controllers. TR-3814, "NetApp Data Motion," provides a thorough presentation of migration using the NetApp Data Motion solution. It focuses on implementing and triggering the migration of vFiler entities using NetApp Provisioning Manager. Furthermore, vFiler units can be managed manually to allow for a handy offline migration with a very short interruption of storage accessibility.

ENABLE SETTLED/NOMAD FOR ALREADY-PROVISIONED STORAGE

While NetApp recommends that you consider the settled/nomad setting from the beginning, taking the sizing and lifetime of storage into account, it is possible to implement it later in a planned downtime window. If NFS-attached storage is to be migrated, existing volumes can be adopted by a vFiler entity. Because the vFiler entity has its own IP address, the clients attaching the storage need to remount it.

SETTLED/NOMAD-LIKE SETTING WITH SHORT-/LONG-TERM DATA PAIRING

In the previous section, the settled/nomad pattern was described as a means to mitigate organic data growth. The same effect can be achieved when storage is identified as belonging to instances that are going to be deprovisioned at their end of life. Taking into account the expected lifetime of provisioned storage allows you to plan deprovisioning situations in advance. This relaxes the dependence on aggregates outside planned downtime windows, without the technical requirements of a settled/nomad setting.

ONLINE MIGRATION IN VIRTUALIZED ENVIRONMENTS

Online migration features in a virtualization hypervisor provide a further alternative for implementing a responsive scheme to react to data growth scenarios. For example, VMware® Storage VMotion™ is capable of transferring a virtual machine, including its storage, when it is attached using a datastore. Storage of virtual machines served by a NetApp datastore can be migrated to another NetApp-served datastore by migrating each virtual machine. In such cases, a nomad can simply be implemented by a NAS- or SAN-attached datastore.

In contrast to a data transfer based on SnapMirror directly between NetApp storage controllers, the migration traffic flows through the hypervisor. This might have consequences for the execution of the virtual machines. Also, the NetApp storage efficiency savings cannot be exploited during the transfer; deduplication savings are gained back by executing the deduplication process on the destination storage controller.


4 OPERATION This section focuses on the operation and management of overcommitted storage. The goal of management is to fulfill the SLAs of the stored data while achieving a high level of efficiency. It addresses questions of how to detect situations that need manual assistance, how to raise the awareness of the operational staff, and how to resolve situations that arise.

We first consider situations that put SLA fulfillment at risk. Then we focus on actions that can be taken to avoid further aggravation by making the situation evident and presenting mitigation alternatives.

Consider the operational process as a loop that monitors and evaluates the current situation and triggers the transition of a storage resource among phases:

• Provision storage.
• Leave room for organic growth. It might be desirable to still allow extending the storage of previously provisioned applications.
• Reduce storage use with mitigation alternatives such as deletion, data motion, and so on.

These transitions must occur within a specified time frame to optimize operational flexibility and to prevent endangering the SLAs. The point is to detect situations that will violate the SLAs in the future.

SITUATIONS PUTTING SLA FULFILLMENT AT RISK

Over time, more and more data is stored and processed by the provided applications. NetApp storage efficiency technologies compensate for this growth. To prevent running out of physical resources, usage must be managed within safe boundaries. This makes sure that the operations team has enough time to react with the appropriate mitigation strategy.

The following list summarizes situations that are critical for service delivery:

• Running out of time. Some mitigation alternatives must be triggered in advance, and a passage of time might be needed for their effect to become evident. This time determines the number of mitigation alternatives that can be considered at a certain point.

• Running out of mitigation alternatives. Several mitigation alternatives exist to control usage. However, some alternatives are one-time activities, and some must be performed within a certain time frame. Depending on the situation, not all alternatives might be available for use.

• Running too tight on storage. Over time, applications use more and more of the blocks that were committed to them. This forces Data ONTAP to allocate from a pool of free blocks. Assuming data growth, the size of the free block pool directly translates into available time to react.

• Running out of storage completely. This must be prevented, because it has a high impact on the availability of the service; furthermore, data integrity can be at risk. Consider the following scenarios:
− An application wants to write to committed storage but fails (NAS/SAN). For applications, this looks like a storage failure and implies service disruption. Data integrity can be at risk.
− An application wants to allocate new storage but fails (NAS). The application is confronted with a "No space left on device" exception. Verify the application behavior on this exception; most applications can deal with this situation, and data integrity is not at risk.

Two cases need to be differentiated when mitigation is necessary to resolve a situation of tight storage. Storage for an object such as a LUN or a share can be tight because of:
− Insufficient free space within the volume in which the storage object is contained
− Insufficient free space within the aggregate in which the storage object and its volume are contained


The following sections focus on how to detect that a change is necessary and that a storage resource should be transitioned to another phase. We describe the different phases, how monitoring can support their detection, and how this information can be made known to operational groups.

4.1 PHASES AND TRANSITIONS

This section outlines the phases of a storage resource. Starting with an empty aggregate, storage is provisioned to certain thresholds. After that, storage is left for organic growth. After certain thresholds are exceeded, further inspection or activities must be performed to mitigate storage tightness.

• Provisioning storage. When certain thresholds are within a defined range, storage is provisioned to the aggregates. Monitoring should support making a decision to transition to the next phase.

• Leave storage for organic growth. When certain thresholds are exceeded, provisioned storage is left for organic growth. Depending on the environment, storage of existing applications might still be extended, and a second threshold might signal that extensions are no longer possible. Monitoring should support making a decision to transition to the next or prior phase.

• Mitigate storage use. When certain thresholds are exceeded, this phase must make sure that committed storage can be delivered to store application data. The effect of a mitigation activity should be to put the storage resource back into the preferred operational corridor. Monitoring should support making a decision to transition back to the organic growth phase.

4.2 MONITORING

NetApp Operations Manager delivers comprehensive monitoring and management for NetApp shared storage. It provides alerts, reports, performance monitoring, and configuration tools to keep the NetApp storage infrastructure in line with business requirements for maximum availability and efficiency. NetApp Operations Manager provides a single human interface and an application programming interface (API) for integration with third-party management and orchestration software vendors.

Operations Manager monitors the NetApp shared storage infrastructure and is able to raise awareness of certain situations. Events can be set to trigger an action when operational parameters are within a certain range and indicate a relevant situation. When the event triggers, an alarm notification can be sent by e-mail, pager, Simple Network Management Protocol (SNMP), or customized scripts. To raise awareness about a certain situation, the event must be characterized using the metrics provided by Operations Manager. To communicate the event, an alarm must be set.

THRESHOLDS

Operations Manager monitors relevant parameters that indicate the presence of specified situations. Thresholds can be set to trigger actions, for example, to notify the operational team that an alarm situation exists. The thresholds can be set to notify in advance. Operations Manager also performs trending on operational parameters to express the urgency of a certain situation. This supports decision making on how to react to a certain situation.

Within your Operations Manager instance, the thresholds can be verified and set by navigating to the Default Thresholds page via Setup → Options → Default Thresholds or via the link http://opsmgrserver:port/dfm/edit/options. Figure 17 shows a sample configuration page.


Figure 17) Operations Manager screen to configure thresholds on operational metrics.

For aggregates, Operations Manager provides the set of thresholds described in the following list. They represent absolute limits. Operations Manager alarms can be used to notify operational staff and raise awareness of a specific situation.

Monitoring the aggregates is very important. They are the physical containers of preallocated and growable storage objects that host application data. If an aggregate of a storage controller runs at uncontrolled usage, it could have direct consequences for the applications for which it is providing data.

The concrete settings for these thresholds depend on the time necessary to relax aggregate block usage. When no mitigation alternatives should or can be taken over the lifetime of the data, mitigation actions must be performed in scheduled downtime windows. In that case, threshold settings and actions tend to be more conservative to avoid SLA-endangering situations.

• Aggregate full threshold. This threshold on the aggregate block use metric allows triggering an alarm that notifies a person in charge.

• Aggregate nearly full threshold. This threshold is the counterpart of the aggregate full threshold but provides an earlier notification.

• Aggregate overcommitted threshold. This threshold on the committed storage metric allows triggering an alarm that notifies a person in charge. This metric refers to the amount of storage that is assigned to applications. It represents the level of consolidation as well as the width and increase of the block use corridor.

• Aggregate nearly overcommitted threshold. This threshold is the counterpart of the aggregate overcommitted threshold but provides an earlier notification.

Operations Manager also provides thresholds that can be used to alert operational staff when volumes are in a certain state:

• Volume full threshold. This event notifies a person in charge that the preset threshold on the volume use metric has been reached.

• Volume almost full threshold. This event is the counterpart of the volume full threshold but provides an earlier notification.

• Volume autosized. This event notifies a person in charge when a volume was extended using the autogrow functionality.
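A minimal sketch of how such threshold pairs translate metrics into events follows. The percentage values are placeholders of our own choosing; the actual defaults are configured in Operations Manager.

```python
def aggregate_events(used_pct, committed_pct, t):
    """Map aggregate metrics to Operations Manager-style events.

    `t` holds the four aggregate thresholds in percent; the values used
    below are illustrative, not Operations Manager defaults.
    """
    events = []
    if used_pct >= t["full"]:
        events.append("aggregate full")
    elif used_pct >= t["nearly_full"]:
        events.append("aggregate nearly full")
    if committed_pct >= t["overcommitted"]:
        events.append("aggregate overcommitted")
    elif committed_pct >= t["nearly_overcommitted"]:
        events.append("aggregate nearly overcommitted")
    return events

thresholds = {"full": 90, "nearly_full": 80,
              "overcommitted": 100, "nearly_overcommitted": 95}
print(aggregate_events(85, 120, thresholds))
# ['aggregate nearly full', 'aggregate overcommitted']
```

The "nearly" variants fire first, giving the operations team the advance notice the thresholds are designed to provide.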


TRENDING

Operations Manager 40 supports a variety of trending features for certain storage objects This is an important feature for all storage objects with a fixed size It allows you to estimate when the time frame within a certain situation needs to be mitigated The trend is calculated as a linear regression of up to 90 days in the past For aggregates Operations Manager calculates a trend on the daily growth rate In your Operations Manager instance use the link httpopsmgrserverportdfmreportviewaggregates-growth-rates for trending of aggregate growth rates and the estimated remaining time until the storage object is full Each aggregate can be drilled down and you can select trending based on an interval of one day one week one month three months or one year To see the effect of recent data activities set the interval of a trend calculation to enclose this activity Investigate if growth rates calculated over different intervals deviate significantly

Figure 18) Trending of data growth and days-to-full prediction in Operations Manager.

Note: The calculation basis of time to full is the usable aggregate capacity. This value is not calculated based on the aggregate full threshold setting.
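The days-to-full mechanics can be sketched in a few lines: a least-squares slope over historical capacity-used samples, projected against 100% of the usable capacity. This is an illustrative reconstruction, not the exact formula Operations Manager implements, and the sample history is invented:

```python
def growth_trend(samples):
    """Least-squares slope (GB/day) of (day, used_gb) samples."""
    n = len(samples)
    mean_x = sum(d for d, _ in samples) / n
    mean_y = sum(u for _, u in samples) / n
    cov = sum((d - mean_x) * (u - mean_y) for d, u in samples)
    var = sum((d - mean_x) ** 2 for d, _ in samples)
    return cov / var

def days_to_full(samples, usable_gb):
    """Days until the aggregate is full, projected against 100% of the
    usable capacity (not against the aggregate full threshold)."""
    slope = growth_trend(samples)
    if slope <= 0:
        return None  # flat or shrinking use: no projected fill date
    current_used = samples[-1][1]
    return (usable_gb - current_used) / slope

# 30 days of history growing 10 GB/day on a 1000 GB aggregate
history = [(day, 500 + 10 * day) for day in range(30)]
print(days_to_full(history, usable_gb=1000))  # 21.0 days remaining
```

Comparing the slope over, say, the last week against the 90-day slope is one way to spot the deviating growth rates mentioned above.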

The trending on the volume level is analogous to the trending on the aggregate level. In your Operations Manager instance, access the link http://opsmgrserver:port/dfm/report/view/volumes-growth-rates for trending of volume growth rates. NetApp recommends that you order the view by growth rate (descending) or time to full (increasing) in order to focus on the relevant candidates.

On the volume level, you can set an alarm to fire when the volume growth is outside the usual boundary:

• Abnormal volume growth: This event notifies when the growth rate of a volume exceeds a preset limit. It is helpful for signaling unusual behavior concerning storage consumption and pointing the operational staff to the right storage object.


INDIVIDUAL THRESHOLDS PER AGGREGATE OR VOLUME

For each aggregate or volume, the general default settings can be overridden and made more specific. To do so, select the aggregate or volume of choice, for example by using the links already provided in this technical report. A selected aggregate can be configured using the Edit Settings link and dialog; a selected volume's configuration can be adapted using the Edit Quota Settings link and dialog.

MONITORING STORAGE EFFICIENCY RETURNS

NetApp Operations Manager provides a dashboard to visualize storage efficiency returns in the NetApp shared storage infrastructure. This report lists important parameters, drilled down by utilization capacity, unused reserve capacity, storage efficiency, and efficiency return breakdown. It allows you to judge the effectiveness of the NetApp storage efficiency technologies. Figure 19 provides a sample screenshot of the storage efficiency dashboard in NetApp Operations Manager. Consult the NetApp Operations Manager Efficiency Dashboard Installation and User Guide for further details on the information provided by this dashboard.

Figure 19) Storage efficiency dashboard in Operations Manager.


4.3 NOTIFICATION

Operational staff must be notified when situations occur that require a transition of phases, especially situations with negative consequences for SLA fulfillment. Operations Manager provides alarms for notification. Alarms are bound to the metrics and thresholds explained in section 4.2 and notify operational staff, storage administrators, or storage capacity planners. Alarms are the instrument used to keep the management effort of the NetApp storage infrastructure low.

After being notified, the responsible person can evaluate the situation and decide which actions to take. Furthermore, the trends on operational parameters provided by Operations Manager simplify the decision-making process.

Depending on the organizational structure, the responsibilities to operate, plan, and administer the storage infrastructure can be separated into different groups, persons, or roles. Thus, we characterize the mitigation activities by required skill set and time to act. This allows easy alignment to a given organizational structure.

Operations Manager supports different methods of sending a notification. The notification methods can be used in combination; for example, a notification can be sent by both e-mail and SNMP.

NOTIFY BY E-MAIL

An alarm can be sent to multiple destinations by e-mail, and repeated notifications can be sent while the situation is not resolved. To set an alarm, access the alarm configuration page by following Setup → Alarms from the default Operations Manager dashboard. Clicking Advanced Version accesses an advanced version of this page; the direct link for the advanced version is http://opsmgrserver:port/dfm/edit/alarms-advanced. Figure 20 shows how to configure an alarm. Adjust the threshold as described in section 4.2.

NetApp recommends using distribution lists or aliases with meaningful names rather than the addresses of individual persons. If you follow this recommendation, changing responsibilities and roles does not require you to make corresponding changes to Operations Manager.

NOTIFY BY SNMP

Operations Manager supports the notification of alarms using SNMP, a widely used standard that is supported by most orchestration frameworks and ticketing systems. Using SNMP, Operations Manager can be integrated into existing ticketing systems. Figure 20 shows setting up an alarm that fires based on the aggregate almost full threshold. The SNMP trap host is configured using a hostname or IP address and the port on which the SNMP agent is listening. The alarm can be saved and tested.


Figure 20) Configuring an alarm based on the threshold aggregate almost full.

Note: The SNMP event must be routed to the responsible groups or persons in the ticketing system. Thus, the mapping between the detected situation and the responsible operational group must be implemented there.

NOTIFY BY SCRIPT

Operations Manager supports notifications in highly customized integration scenarios. A user-defined adapter can be executed that delivers the information to the infrastructure or system of choice. A script can be used to implement such an adapter and act as the glue between Operations Manager and the customer infrastructure. To set an alarm on the event aggregate almost full that starts a script, instrument Operations Manager on the command line:

dfm alarm create -s script_to_execute -h aggregate-almost-full
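As an illustration of such an adapter, the sketch below maps an event to a ticket payload for a ticketing system. The event names follow the dfm notation above, but the severity mapping, the queue name, and the way the event details reach the script are assumptions for this example (check the Operations Manager documentation for the exact script interface):

```python
import json

# Assumed event-to-severity mapping for the ticketing system.
SEVERITY_BY_EVENT = {
    "aggregate-almost-full": "warning",
    "aggregate-full": "critical",
}

def build_ticket(event_name, source_object):
    """Map an Operations Manager event to a ticket payload."""
    return {
        "title": f"{event_name} on {source_object}",
        "severity": SEVERITY_BY_EVENT.get(event_name, "info"),
        "queue": "storage-operations",  # invented queue name
    }

# The adapter would receive the event details from Operations Manager
# and hand the payload to the ticketing system, e.g., as JSON:
print(json.dumps(build_ticket("aggregate-almost-full", "aggr1")))
```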


4.4 MITIGATE STORAGE USE

Uncontrolled storage use can limit operational flexibility and might put SLA fulfillment at risk. This section focuses on mitigation activities that preserve flexibility by controlling use within its defined corridor. The effect of a mitigation activity should return the usage to its defined corridor.

Storage tightness might occur in aggregates or volumes, depending on their configuration. When all volumes in an aggregate are thin provisioned with the zero fat configuration, they use the shared pool of free blocks of the aggregate to deal with data growth. To resolve this situation, a mitigation activity on the aggregate level is necessary.

When storage objects in a fixed-size volume cannot grow to the committed space, a mitigation activity on the volume level is necessary to resolve upcoming volume tightness.

MITIGATION ACTIVITIES FOR AGGREGATES

Aggregates are the coarsest storage objects within a NetApp storage controller. Finer-grained storage objects, such as FlexVol volumes and their contents, are usually thin provisioned using the zero fat configuration. They might grow on demand; however, because they live within an aggregate of physically limited size, the growth of the storage object itself is also limited. As described in the following list, providing usable space in the aggregate automatically allows contained storage objects to grow.

1. Increase the aggregate. You can add drives to aggregates during operation, and you can repeat this mitigation activity. The maximum aggregate size depends on the Data ONTAP version, the type of aggregate, and the type of storage controller; 64-bit aggregates, supported with Data ONTAP 8, have very high limits. Additional drives can be used immediately; however, their procurement time needs to be taken into account. Rebalancing data between existing and new drives results in a uniformly distributed use of the drives.

2. Decrease the aggregate Snapshot copy reserve. This reserve is needed in MetroCluster and SyncMirror® configurations. In other configurations, you can decrease this reserve or set it to zero.

3. Shrink preallocated volumes. Volumes with preallocated space reserve available aggregate free space. When possible, these volumes can be shrunk, returning the freed space to the aggregate to allow others to make use of the preallocated space.

4. Enable deduplication and shrink the volume.

5. If available, migrate a nomad online to a different storage controller. Doing this on the NetApp storage controller level requires storage provisioning based on vFiler units and MultiStore and SnapMirror licenses. Adequate free space on the aggregates of the target storage controller is required. This mitigation activity is not limited in its applicability.

6. A volume can be migrated from one aggregate to another, within the same or a different storage controller. SnapMirror replicates the data while it is still being served. To switch over to the replicated data, the client needs to detach from the source and reattach to the replica; after completion, the replica is considered the new source. This operation has an impact on client downtime. Typically, inter-data center bandwidth allows you to synchronize the source and the replica within the range of a few minutes.

7. If none of the listed activities can be used, the application must be stopped to achieve a consistent state before the data is migrated offline.

The mitigation activities for aggregate tightness are summarized in Table 8. Note that Provisioning Manager performs mitigation alternatives 3 to 6 for secondary storage online.


Table 8) Mitigation alternatives to control use within aggregates.

| No. | Mitigation Activity | Repeatability | SLA Impact | Preparation Time | Time to Show Effect |
|-----|---------------------|---------------|------------|------------------|---------------------|
| 1 | Increase aggregate capacity by adding disks | Data ONTAP 7.x: low limits; Data ONTAP 8: high limits | None | HW procurement | Immediate (+ rebalancing) |
| 2 | Decrease the aggregate's Snapshot copy reserve, if possible | One time | None | None | Immediate |
| 3 | Shrink other volumes in the aggregate, if they have enough free space | One time | Low | None | Immediate |
| 4 | Run deduplication and shrink volumes | Repeatable | Low | Time to execute dedupe | Immediate |
| 5 | Migrate nomads (online) | Repeatable | Low | None | Minutes (vFiler migration time) |
| 6 | Migrate volumes to a different aggregate (offline) | Repeatable | Med–high | Next planned downtime window | Minutes (volume switch-over time) |
| 7 | Prevent application data loss and stop the application, then migrate (offline) | Repeatable | Low–high | Coordinate with app owner | Minutes (migration time) |
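The attributes in Table 8 lend themselves to simple tooling. The hypothetical sketch below filters the alternatives by whether a downtime window is acceptable right now; the records paraphrase Table 8 and are not a NetApp API:

```python
# Simplified records mirroring Table 8 (activity names paraphrased).
MITIGATIONS = [
    {"no": 1, "activity": "increase aggregate capacity by adding disks", "needs_downtime": False},
    {"no": 2, "activity": "decrease aggregate Snapshot copy reserve", "needs_downtime": False},
    {"no": 3, "activity": "shrink other volumes in the aggregate", "needs_downtime": False},
    {"no": 4, "activity": "run deduplication and shrink volumes", "needs_downtime": False},
    {"no": 5, "activity": "migrate nomads (online)", "needs_downtime": False},
    {"no": 6, "activity": "migrate volumes to a different aggregate (offline)", "needs_downtime": True},
    {"no": 7, "activity": "stop the application, then migrate (offline)", "needs_downtime": True},
]

def applicable(downtime_possible):
    """Activities usable right now, given whether downtime is acceptable."""
    return [m["activity"] for m in MITIGATIONS
            if downtime_possible or not m["needs_downtime"]]

print(applicable(downtime_possible=False))  # online alternatives only
```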

MITIGATION ACTIVITIES FOR VOLUME TIGHTNESS

Mitigation activities for volume tightness are relevant for volumes that are at risk because their storage objects cannot grow to the committed size. When it is not possible to enable growth for storage objects contained in volumes, you need to perform an aggregate-level mitigation activity.

Note: Some of these mitigation alternatives depend on, and affect, the used capacity in the aggregate.

Table 9) Mitigation activities for resource tightness within volumes.

| No. | Mitigation Activity | Repeatability | SLA Impact | Preparation Time | Time to Show Effect |
|-----|---------------------|---------------|------------|------------------|---------------------|
| 1 | Reduce the volume's Snapshot copy reserve (if configured and not used) | One time | Low | None | Immediate |
| 2 | Increase the volume if there is free space in the aggregate (see Table 8) | One time | Low | None | Immediate |
| 3 | Delete Snapshot copies that are not needed or were skipped by the AutoDelete function | Limited | Low | None | Immediate |
| 4 | Activate FAS deduplication for the volume (requires proper space guarantees) | One time | Low; possible performance impact | Wait for schedule | Hours |
| 5 | If the volume contains more than a single LUN, migrate those objects to another volume or aggregate | Repeatable | High | Next planned downtime window | Minutes (volume migration time) |
| 6 | Stop application and migrate data | Repeatable | High | Coordinate with app owner | Minutes (migration time) |


5 REAL-LIFE SETTINGS

This section summarizes two different operational settings. The first one makes use of neither online data migration nor the settled/nomad provisioning pattern; the second one implements the settled/nomad provisioning pattern to maintain the flexibility for online data migrations.

The concrete threshold settings and approaches might be very customer and application specific. To exploit NetApp storage efficiency features in your own data center, NetApp recommends that you start conservatively. After you are familiar with the process, work toward the customer-specific optimum.

5.1 SAMPLE SETTING 1: REAL-LIFE SETTING

This section describes a real-life setting that a customer started with. It makes use of a limited set of mitigation alternatives, which is especially beneficial when the installed storage capacity should remain constant over a long time frame or when physical systems are already fully equipped. A settled/nomad setting is not considered. Thus the thresholds that signal a transition of the phases are set lower and more conservatively for this customer. Because online data migration and aggregate extension are not available as mitigation alternatives, sufficient available space is required to safely reach the next planned downtime window, as shown in Figure 21. In practice, refer to the aggregate days-to-full trend value to get an idea of the available days to full based on past data growth.

• All storage is provisioned using the zero fat option with growable FlexVol volumes. Only aggregate monitoring is used.
• Aggregate extension is not a mitigation alternative.
• Online migration is not a mitigation alternative.

Figure 21) Storage to enable organic data growth between planned downtime windows. (The figure plots data growth over time in months, with sufficient reserved space to bridge consecutive planned downtime windows.)

Note: Several months might fall between planned downtime windows in which to perform major mitigation alternatives.

The primary concern is preventing the critical situation in which aggregates reach a utilization level that is too high to enable organic growth during the period between agreed planned downtime windows. To prevent this situation, sufficient space must be reserved to enable data growth. Second, the level of data consolidation is monitored to manage accumulated growth rates safely.

Provisioning new data is stopped when one or both thresholds on the first and second metrics are reached. The operational teams are notified using alarms on the Operations Manager events aggregate nearly full threshold (configured to fire when the metric exceeds 50%) and aggregate nearly overcommitted threshold (configured to fire when the metric exceeds 110%). These alarms stop the responsible entities from provisioning new storage; the aggregate is left for organic growth.

An assessment of the storage situation might then be performed. Depending on experience and knowledge of the application growth rates seen in the past, the thresholds may be adapted. After the upper threshold of the operational sweet spot corridor is crossed, an alarm based on the aggregate full threshold (set initially to 65%) is sent to the storage administrators so that they can decide to migrate data in the next planned downtime window. In the meantime, organic growth can take place in the yellow-marked area shown in Figure 22. The metrics used are:

• First metric: aggregate capacity used
• Second metric: aggregate space committed

Because all storage is provisioned using the zero fat option, no artificially limited storage container exists. Thus there is no need to consider a volume-based metric. Figure 22 shows the behavior depending on the metrics aggregate capacity used and aggregate committed space.
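The phase logic of this setting can be sketched as follows, using the thresholds quoted above (50%/65% capacity used, 110%/120% space committed). These values are this customer's conservative choices, not general defaults:

```python
def phase(capacity_used_pct, space_committed_pct):
    """Operational phase of an aggregate in sample setting 1."""
    # Beyond the corridor on either metric: plan a migration for the
    # next planned downtime window.
    if capacity_used_pct > 65 or space_committed_pct > 120:
        return "mitigate"
    # Inside the attention area: stop provisioning, reassess.
    if capacity_used_pct > 50 or space_committed_pct > 110:
        return "assess capacity, adapt thresholds, stop provisioning"
    return "provision new storage"

print(phase(40, 90))    # provision new storage
print(phase(55, 100))   # assess capacity, adapt thresholds, stop provisioning
print(phase(70, 100))   # mitigate
```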

Figure 22) Transition of phases depending on the metrics aggregate capacity used and aggregate committed space. (The figure shows the operational sweet spot corridor: new storage is provisioned at 0–50% capacity used and 0–110% space committed; beyond those values, capacity is assessed and thresholds are adapted; above 65% used or 120% committed, mitigation takes place.)


5.2 SAMPLE SETTING 2: SETTLED/NOMAD

This section describes a setting that takes the settled/nomad provisioning pattern into account and allows migrating nomad data flexibly and in a timely manner thanks to vFiler technology. This setting requires storage space at alternative locations to which nomads might be migrated. It is seen more often in larger environments with an emphasis on NFS-attached storage. It allows operating the NetApp storage infrastructure at very high use and within narrower operational sweet spot corridors. Figure 23 visualizes the effect of a mitigation alternative that can be performed online.

Figure 23) Narrower corridors due to the ability to perform mitigation alternatives in hours instead of months. (The figure plots settled and nomad data over time in hours, from detecting the need to act to the effect of mitigation, e.g., migration.)

In this sample setting, as in sample setting 1, the critical situation to prevent is aggregates becoming too full. However, the flexibility gained with online data migration removes the need to take a further metric into account, for example, storage overcommitment.

• All storage is provisioned using the zero fat option with growable FlexVol volumes. Only aggregate monitoring is used.
• Storage is provisioned using the settled/nomad pattern, with the ability to perform online migration.
• Days-to-full aggregate trending was more than 200 days on average. Note that this value depends on the individual situation and is calculated against 100% capacity.

The sole metric in this setting is aggregate capacity used. Table 10 contains the thresholds describing the transition of phases.

Table 10) Phase transitions with the settled/nomad provisioning pattern and the online migration mitigation alternative.

| Detection Threshold | Notify | Mitigation |
|---------------------|--------|------------|
| > 70% | Storage operations | Stop provisioning of new storage |
| > 85% | Storage operations | Stop extending provisioned storage |
| > 90% | Storage operations | Relax the resource situation and migrate a nomad |


Figure 24) Visualization of phase transitions depending on the metric aggregate capacity used. (The figure shows the operational sweet spot corridor: new storage is provisioned at 0–70% capacity used; already provisioned storage may still be extended between 70% and 85%; above 90%, utilization is relaxed by migrating a nomad with NetApp Data Motion.)

You can achieve very high data consolidation in this setting with NetApp storage controllers: the amount of logical data served exceeds the physically usable capacity several times over.


6 STORAGE EFFICIENCY COOKBOOK

To increase consolidation, we propose the following steps to exploit the advantages of NetApp storage efficiency technologies.

Figure 25) Typical picture of aggregate capacity metrics while turning to zero fat configurations and dedupe. (The figure plots committed capacity and capacity used over elapsed time, with the overall trend and the last-3-month trend, across a one-month and a three-month phase.)

As a general rule, we don't introduce artificially limited container types. They increase the monitoring effort and might prevent pooling of unused space. For an existing landscape, proceed as follows:

1. Install and configure Operations Manager; the earlier, the better. From day one, Operations Manager collects data, and the more information it collects, the better the predictions and trending. The diagrams provided by Operations Manager give a good idea of growth rates and their steadiness. Make sure all NetApp storage controllers are monitored, and wait for one month. Define which mitigation alternatives your operational team is comfortable with: check the boxes accompanying the provided list and identify the time your team needs to perform the actions. If you can perform online migrations for nomads, define the time needed to negotiate and approve a migration. For all other data, define the time to the next planned downtime window.

2. Change all volumes to the zero fat configuration with the autogrow feature set to on. Because there is no artificial space limitation for an autogrow volume, monitoring is restricted to aggregate monitoring. When using deduplication, set the volume to autogrow. During this period, the capacity used diminishes, as shown in Figure 25; usually each change in the volume configuration can be detected. So far only metadata has changed, and unused space in the volumes is now available from a common shared pool. The aggregated free space is available to the same applications storing the data. We recommend monitoring for three months to understand the growth rate of your environment.

3. Derive the growth trend of the aggregates. Note that the overall trend might still be negative. Use Operations Manager to help determine the trend. Make sure that it excludes the time frame when the volume configuration was changed to zero fat and that it includes relevant operations of your applications, such as month-end and year-end closing of business applications or regular software maintenance updates (for example, in virtualized environments).


Work backward to determine the thresholds of the phases:
a. Define the aggregate use at a level where your operational team is comfortable. At first, do not exceed 80%. Add an attention area (yellow) depending on the mitigation alternatives and their time to show effect.
b. Determine the maximum distance between the planned downtimes or the time to perform the intended mitigation alternatives.
c. Determine the growth rate. Operations Manager provides help in determining the trend of data growth.
d. Determine the minimum space required to comfortably allow organic growth in the period between agreed planned downtimes of the services provided. Operations Manager helps you to understand the growth rate of the past.
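Steps (a) through (d) amount to a small backward calculation. The sketch below derives a mitigation threshold from an assumed growth rate and downtime interval; all numbers are invented for illustration:

```python
def mitigation_threshold_pct(usable_gb, growth_gb_per_day,
                             days_between_downtimes, ceiling_pct=80.0):
    """Highest 'aggregate full' threshold that still leaves room for
    organic growth until the next planned downtime window, capped at
    the comfort level from step (a)."""
    # Minimum reserve: expected organic growth between downtime windows.
    reserve_gb = growth_gb_per_day * days_between_downtimes
    threshold = 100.0 * (usable_gb - reserve_gb) / usable_gb
    return min(threshold, ceiling_pct)  # never exceed the comfort level

# 10 TB aggregate, 20 GB/day growth, downtime windows 90 days apart:
print(mitigation_threshold_pct(10000, 20, 90))  # capped at 80.0
# Faster growth (40 GB/day) forces a lower threshold:
print(mitigation_threshold_pct(10000, 40, 90))  # 64.0
```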

To provision storage, follow these steps:

1. Create big aggregates to enable shared storage in your data center. We recommend sizing them in such a way that the aggregate can be extended once for an eventual aggregate mitigation. Very few situations exist where a silo-centric approach with dedicated aggregates for applications makes sense. Free space and performance in an aggregate can be shared, and a few big aggregates reduce the monitoring effort. Also, build aggregates in a limited number of standardized configurations and sizes.

2. Create volumes in the zero fat configuration with the autogrow feature set to on. Because there is no artificial space limitation for an autogrow volume, monitoring is restricted to aggregate monitoring. When using deduplication, set the volume to autogrow. Whenever possible, use Provisioning Manager for convenience and for repeating configurations.
a. Classify your data and provision for flexibility. Give NFS a preference and make use of vFiler entities.
b. Turn on deduplication. Even in situations where deduplication rates are expected to be low, there is sometimes a big surprise. If you prefer to try deduplication on the storage controller first, create a clone of the intended volume and deduplicate it to estimate the effect. Use Performance Advisor to identify a repeating time frame of low activity to schedule the deduplication job, or use deduplication scheduled by change rate. Mind the maximum volume sizes, which depend on the storage controller.
c. Initially size volumes to the expected size of the data you are going to store. Thus the aggregate overcommitment metric in Operations Manager represents the data consolidation more precisely.
d. Trim existing volumes provisioned in full or low fat to the zero fat configuration. Use the following commands on the console of the storage controller to configure zero fat without Snapshot autodelete for NAS environments:

vol options <volume> guarantee none
vol options <volume> try_first volume_grow
vol autosize <volume> -m <maximum size> -i <increment size> on
snap autodelete <volume> off

Use the following command sequence to configure zero fat with Snapshot autodelete for NAS environments:

vol options <volume> guarantee none
vol options <volume> try_first volume_grow
vol autosize <volume> -m <maximum size> -i <increment size> on
snap autodelete <volume> trigger volume
snap autodelete <volume> delete_order oldest_first
snap autodelete <volume> on


Use the following commands to configure zero fat without Snapshot autodelete for SAN environments:

vol options <volume> guarantee none
vol options <volume> try_first volume_grow
vol autosize <volume> -m <maximum size> -i <increment size> on
snap reserve -V <volume> 0
snap autodelete <volume> off
lun set reservation <lun> disable

Use the following command sequence to configure zero fat for SAN environments with autodelete set to on:

vol options <volume> guarantee none
vol options <volume> try_first volume_grow
vol autosize <volume> -m <maximum size> -i <increment size> on
snap reserve -V <volume> 0
snap autodelete <volume> trigger volume
snap autodelete <volume> delete_order oldest_first
snap autodelete <volume> on
lun set reservation <lun> disable
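The four sequences differ only in the SAN-specific and autodelete-specific lines. As a sketch, a small hypothetical helper (not a NetApp tool) can render the appropriate sequence for a volume, for example to feed into an SSH session to the controller:

```python
def zero_fat_commands(volume, max_size, increment,
                      san=False, autodelete=False, lun=None):
    """Render the zero fat command sequence for one volume as strings."""
    cmds = [
        f"vol options {volume} guarantee none",
        f"vol options {volume} try_first volume_grow",
        f"vol autosize {volume} -m {max_size} -i {increment} on",
    ]
    if san:  # SAN volumes additionally drop the Snapshot reserve
        cmds.append(f"snap reserve -V {volume} 0")
    if autodelete:
        cmds += [
            f"snap autodelete {volume} trigger volume",
            f"snap autodelete {volume} delete_order oldest_first",
            f"snap autodelete {volume} on",
        ]
    else:
        cmds.append(f"snap autodelete {volume} off")
    if san:  # SAN volumes additionally disable the LUN reservation
        cmds.append(f"lun set reservation {lun} disable")
    return cmds

for cmd in zero_fat_commands("vol1", "100g", "1g"):
    print(cmd)
```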

e. Identify storage holding inactive data. Storage keeping inactive data is most often perfectly suited to act as a nomad candidate that can be migrated.

f. Identify storage that is close to deprovisioning. Deprovisioning storage relaxes use and can act as a mitigation alternative.

g. Turn already provisioned volumes into the zero fat configuration.
3. Let Operations Manager monitor the landscape. Use the aggregate daily growth rates and days-to-full trending reported by Operations Manager to adapt the thresholds. Remember that days-to-full trending reports against 100% capacity used of the aggregate.


7 REFERENCES

• TR-3505: "NetApp Deduplication for FAS and V-Series Deployment and Implementation Guide," www.netapp.com/us/library/technical-reports/tr-3505.html
• TR-3563: "NetApp Thin Provisioning: Improving Storage Utilization and Reducing TCO," www.netapp.com/us/library/technical-reports/tr-3563.html
• TR-3710: "Operations Manager, Provisioning Manager, and Protection Manager Best Practices Guide," www.netapp.com/us/library/technical-reports/tr-3710.html
• TR-3786: "A Thorough Introduction to 64-Bit Aggregates," www.netapp.com/us/library/technical-reports/tr-3786.html
• TR-3814: "NetApp Data Motion," www.netapp.com/us/library/technical-reports/tr-3814.html
• TR-3827: "If You're Doing This, Then Your Storage Could Be Underutilized," www.netapp.com/us/library/technical-reports/tr-3827.html
• TR-3881: "DataMotion for Volumes for Enterprise Applications," www.netapp.com/us/library/technical-reports/tr-3881.html
• NetApp Operations Manager Efficiency Dashboard Installation and User Guide, http://now.netapp.com/NOW/download/tools/omsed_plugin/InstallUserGuide.pdf


8 ACKNOWLEDGMENTS

This report was developed in concert with the Field Centers for Innovation and covers field best practices and product group expertise. It would not have been possible without the input of many experts. Significant contributions were made by Matthew Agoni, Carlos Alvarez, Jeff Berks, Manfred Buchmann, Hans Deuerlein, Erik Dybwad, Niels Reker, Oliver Dziuba, Larry Freeman, Gary Garcia, Pretoom Goswami, Naveen Harsani, George John, Nigel Maddock, Andreas Martinovsky, Holger Niermann, Cesar Orosco, Christian Ott, Shiva Raja, Michael Reusch, Maurice Skubski, John Tyrrell, Oliver Walsdorf, and Allen Wang.

NetApp provides no representations or warranties regarding the accuracy, reliability, or serviceability of any information or recommendations provided in this publication, or with respect to any results that may be obtained by the use of the information or observance of any recommendations provided herein. The information in this document is distributed AS IS, and the use of this information or the implementation of any recommendations or techniques herein is a customer's responsibility and depends on the customer's ability to evaluate and integrate them into the customer's operational environment. This document and the information contained herein may be used solely in connection with the NetApp products discussed in this document.

© Copyright 2010 NetApp, Inc. All rights reserved. No portions of this document may be reproduced without prior written consent of NetApp, Inc. Specifications are subject to change without notice. NetApp, the NetApp logo, Go further, faster, Data ONTAP, FlexClone, FlexVol, MultiStore, RAID-DP, SnapDrive, SnapMirror, Snapshot, SyncMirror, and vFiler are trademarks or registered trademarks of NetApp, Inc. in the United States and/or other countries. Windows is a registered trademark of Microsoft Corporation. Oracle is a registered trademark of Oracle Corporation. VMware is a registered trademark and VMotion is a trademark of VMware, Inc. All other brands or products are trademarks or registered trademarks of their respective holders and should be treated as such. RA-0007-1010

Page 23: Lun Provision


CONSEQUENCES FOR MONITORING

When using one of the following layouts, very high data consolidation can be achieved. Because this effect depends on the usage characteristics of the corresponding applications, monitoring the aggregate is key. If a low fat volume acts as a template that is cloned while preserving the original space guarantees, monitoring is necessary for the cloned volumes as well.

VOLUME-CENTRIC STORAGE LAYOUT

In a volume-centric storage layout, an application instance is organized into one or a few volumes to benefit from Data ONTAP volume-centric management and maintenance operations, such as instant cloning and volume-consistent Snapshot copies.

In addition to the convenient ways to manage volumes, volume-centric storage layouts have storage efficiency advantages in two dimensions:

• High instant storage efficiency savings: High instant savings when cloning the data of an application instance with FlexClone; the savings might deteriorate over time.

• Long-term storage efficiency savings: Medium long-term savings when deduplicating application data.

A volume-centric layout makes it easy to provision storage for another instance of an application by cloning a consistent volume representing the template of the intended application and attaching it to an instance where it is processed. This approach works for both NAS and SAN.

Figure 12 shows the data alignment of an application instance and its volume. An application instance organizes its data in one or more dedicated volumes. Note that the entire construct is created within one aggregate.

Because deduplication is performed at the volume level, long-term savings depend on the block-sharing rate within one instance of an application. Volume-centric layouts are preferred in the following cases:

• Simplicity of data management using volumes
• Individual control over the SLA of each application instance
• Application instances with a short duration
• No consideration of deduplication
• Management tools that require volume-centric layouts

24 Storage Efficiency Every Day How to Achieve and Manage Best-in-Class Storage Use

Figure 12) Volume-centric storage provisioning. Application instances are aligned horizontally with their volumes.

[Figure: a template FlexVol and the FlexVols of Instance 1 through Instance n, each containing several LUNs/qtrees; deduplication block sharing operates within each FlexVol, and FlexClone block sharing links the instance volumes to the template.]

Impact on commitment and storage utilization. The impact of using FlexClone to clone a volume-centric storage layout to implement template-based provisioning can be described schematically. At clone creation, Data ONTAP creates metadata for the new instance of the data and allocates space for storing changes to the cloned copy or new data on request. Thus, the overcommitment of the aggregate containing the cloned data increases when the clone is created; however, this does not affect the space used in the aggregate. When data in the clone is changed and new data is added by the application, aggregate use grows.
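The commitment arithmetic described above can be sketched with a small calculation. All figures and the helper function below are illustrative, not Data ONTAP output:

```python
# Hypothetical sketch: cloning a volume with FlexClone raises aggregate
# overcommitment but not aggregate usage. All figures are illustrative (GB).

def overcommitment(committed_gb, aggregate_size_gb):
    """Committed storage relative to physical aggregate size."""
    return committed_gb / aggregate_size_gb

aggregate_size = 1000          # physical capacity of the aggregate
committed = 800                # space committed to existing volumes
used = 300                     # blocks actually consumed

# Cloning a 200 GB template volume adds commitment (the clone presents
# 200 GB to its application) but shares all blocks with the template,
# so used space stays almost unchanged at clone creation.
clone_size = 200
committed_after = committed + clone_size
used_after = used              # only clone metadata is allocated

print(overcommitment(committed, aggregate_size))        # 0.8
print(overcommitment(committed_after, aggregate_size))  # 1.0
print(used_after - used)                                # 0
```

The sketch shows why monitoring must track both metrics: the commitment rate jumps at clone creation, while block use grows only later as the application writes.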

Best Practice

A volume-centric layout implicitly implements a consistency group. It is preferable to align in it all application data that should be recovered at a certain point. Cloning can achieve significant savings when a FlexClone volume is created to provision data for a new service instance.

Client-side data realignment, such as disk defragmentation or database table space reorganization, has a counterproductive effect on the FlexClone savings. This realignment also has a temporarily counterproductive effect on the deduplication savings, requiring the deduplication process to be executed again. If possible, the following actions on client data should be avoided:

• Reorganizing data, for example, database reorganization of table spaces or defragmentation of virtual disks provisioned through cloning

• Preformatting data


DEDUPE-CENTRIC STORAGE LAYOUT

In a dedupe-centric storage layout, the goal is to achieve high storage efficiency returns from the deduplication feature. In contrast to the volume-centric storage layout, data of different application instances is grouped to achieve storage efficiency returns across a set of application instances. Figure 13 shows a sample dedupe-centric storage layout. Data of application instances is organized horizontally; the individual data of each application is grouped vertically in a volume to implement deduplication.

This layout makes sense in virtualization scenarios where the images of the guest machines can be grouped easily. Grouping partitions containing boot images and commonly used programs is very effective because they share much of the same data. To implement template-based provisioning with such a layout, cloning of template data must be performed with the file/LUN FlexClone operation. File/LUN FlexClone allows storage objects to be cloned within a volume, providing finer granularity.

This storage layout provides the following storage efficiency advantages from a short- and long-term perspective:

• Very high long-term storage efficiency savings. Long-term savings are achieved because of the deduplication-centric layout and the resulting deduplication returns.

• Short-term storage efficiency savings. Instant savings are provided when cloning an application instance, for example template application data, through a file/LUN FlexClone operation.

In contrast to the volume-centric storage layout, application instances are bundled together in a matrix style because of their participation in a volume. This implies that the applications share major operational tasks and are managed as a bundle. From an SLA perspective, a diversification of service levels within the application instances cannot be implemented as easily as with a volume-centric layout. Achieving application-consistent Snapshot copies requires the iterative application of file/LUN FlexClone functionality to all storage objects of the instance, which is slightly more difficult than cloning with a volume FlexClone operation.

TR-3505, NetApp Deduplication for FAS and V-Series Deployment and Implementation Guide, provides a deeper understanding of NetApp deduplication and its deployment.

Figure 13 illustrates dedupe-centric storage provisioning. Volumes are shared among several application instances to achieve cross-instance deduplication returns. Note that this construct is created within an aggregate; volumes can be assigned to different aggregates.


Figure 13) Dedupe-centric storage provisioning. Application instances are aligned horizontally; volumes are aligned vertically.

[Figure: rows for the template, Instance 1, and Instance 2; several FlexVols, each grouping the corresponding LUNs/qtrees of all instances, with deduplication block sharing within each FlexVol.]

Impact on commitment and aggregate usage. When the FlexVol volumes for this layout are created, their individual sizes contribute to the commitment rate. Aggregate use grows with the provisioning and use of objects within the FlexVol volumes. Provisioning a new instance in this layout through a file/LUN FlexClone operation has no effect on the overcommitment rate; it affects the deduplication savings of the volume itself. Thus, NetApp recommends using the zero fat configuration for the volume so that autogrow is enabled.

Best Practice

This layout is very attractive for applications using multiple but similar storage objects among service instances (for example, virtual disks in virtual machine hypervisors). These usually use similar operating systems and applications in dedicated virtual disks. Thus, grouping these storage objects leads to a very high degree of consolidation due to deduplication.

Quickly changing data, such as paging and swap files, should not be included in deduplicated volumes on primary storage. Deduplication savings on such data are limited because of its high change rate and do not justify running the deduplication process. NetApp recommends not placing this type of data in the same volume as data that dedupes well.

We further recommend not performing client data realignments such as Windows disk defragmentation or database table space reorganizations. Because of the way NetApp storage controllers work, fragmented client data is served without performance penalties.


3.3 SETTLED/NOMAD PROVISIONING FOR NETAPP DATA MOTION

Settled/nomad provisioning is a provisioning pattern that helps increase the utilization of NetApp shared storage. When the online migratability features of storage are exploited, response times to mitigate data growth scenarios become independent of application-specific planned downtime windows. Storage is classified into potential migration candidates that can be migrated away from a tight aggregate on one storage controller to another while remaining accessible. Thus, it is an elegant technology to relax the use of an aggregate outside the planned downtime windows of the served applications. NetApp MultiStore technology implements this feature using the vFiler abstraction, which NetApp recommends you consider in the provisioning process. Secure multi-tenancy environments implemented using MultiStore technology harmonize well with this provisioning approach.

The settled/nomad provisioning pattern is a perfect metaphor for reacting to data growth in an aggregate. Figure 14 illustrates the concept of settled/nomad provisioning in the aggregates of the storage controllers and the migration of a nomad out of its aggregate. The settled part describes data that does not move during its lifetime; it might use vFiler units to simplify operation and hardware maintenance of the storage controller, but there is no direct need. The nomad parts are considered moving parts and thus must use vFiler units. The ratio between the sizes of the settled and nomad parts depends on the growth rate and lifetime of the data in the settled part. Assuming that the aggregate size is constant over this period, the aggregate is filled with settled and nomad data. Over the data lifetime, more and more nomads are migrated away; at the end of the lifetime, only the settled data is left.

It is irrelevant whether the data growth happens in the settled or the nomad part: when a nomad is migrated away, the resource situation on the aggregate is relaxed. It is preferable to provision several nomads of different sizes. This allows you to:

• React to different growth scenarios of the data
• Quickly migrate smaller nomads when time or the inter-storage-controller network is considered a limited resource
• Operate the aggregate in its operational sweet spot corridor over a long time frame; by slicing the migratable entities in the right way, you can be sure that the aggregate operates in a predefined use interval
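The effect of migrating nomads of different sizes can be sketched as follows. The aggregate and nomad sizes and the 80% target corridor are invented for illustration, not a sizing recommendation:

```python
# Hypothetical sketch: relaxing aggregate use by migrating nomads away.
# Sizes in GB and the 0.8 corridor target are illustrative only.

aggregate_size = 10_000
settled_gb = 4_000
nomads_gb = [2_000, 1_500, 1_000, 500]   # several nomads of different sizes

def utilization(settled, nomads, size):
    """Fraction of the aggregate occupied by settled and nomad data."""
    return (settled + sum(nomads)) / size

def migrate_smallest(nomads):
    """Migrating a small nomad first is quickest when time or the
    inter-storage-controller network is a limited resource."""
    nomads = sorted(nomads)
    return nomads[0], nomads[1:]

# Migrate nomads until the aggregate is back in its use corridor.
while utilization(settled_gb, nomads_gb, aggregate_size) > 0.8:
    moved, nomads_gb = migrate_smallest(nomads_gb)
    print(f"migrated nomad of {moved} GB")

print(utilization(settled_gb, nomads_gb, aggregate_size))  # 0.75
```

Starting at 90% use, the sketch migrates the 500 GB and 1,000 GB nomads and stops once the aggregate is back below the corridor limit.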

Figure 14) Settled/nomad provisioning into an aggregate. In case of aggregate tightness, a nomad is migrated to a separate aggregate.

[Figure: an aggregate containing a settled part and two nomads.]

To summarize, the settled/nomad provisioning pattern is an elegant method to adjust the block use of an aggregate. The use of an aggregate can be controlled and kept in a desired corridor.


SLA-BASED ASSESSMENT FOR SETTLED/NOMAD

The goals of an SLA-based assessment are to optimize SLA fulfillment and to avoid or minimize penalty costs. The accessibility of the applications, described by their individual service levels, is used to assess instances as settled or nomad.

We use the introduced SLA metric of service disruption and map it to the stickiness of the settled/nomad instances. The vFiler entities allow online migration of NFS- and iSCSI-attached nomad instances without any changes on the client side. Fibre Channel-attached storage cannot be migrated online at the time of writing. Refer to TR-3881 for an understanding of DataMotion in an Oracle database and Microsoft Exchange environment.

Alignment by technical impact. For data belonging to applications with SLAs that fit perfectly into what is provided, a direct assignment can be made. For example, application instances with the lowest acceptable service disruption should be the last candidates to be migrated (settled); applications with the highest acceptable service disruption should be considered nomads. However, there might be data of application instances that will likely be migrated during the application lifetime. You must take into account the business impact of migrating these instances.

Figure 15) Alignment by technical impact (sorted by negative impact in descending order)

[Figure: instances Inst1 through InstN sorted by negative impact, from high (outside SLA, for example all FC-attached storage, settled) to medium and low (inside SLA, nomad).]

Alignment by business impact. An assessment of penalty costs is made for the data of the remaining applications. For vFiler migration, a very short negative impact on the performance of the service must be taken into account during the migration. Thus, application data with the highest negative impact is considered the stickiest.

Figure 16) Alignment by business impact (sorted by negative impact in descending order)

[Figure: instances sorted by penalty cost from $$ to $; the costliest instances are settled, followed by semi-settled instances and nomads.]

PERFORMANCE AND THROUGHPUT IMPACT OF MIGRATION

Migration of a nomad might be triggered by heavy storage consumption in an aggregate. It might also be triggered by performance limitations of the corresponding storage controller. Because the migration consumes additional resources on the network and on the participating storage controllers, this consumption must be taken into account to avoid further intensifying the situation. Refer to TR-3881 for a quantitative evaluation of DataMotion.

NetApp recommends that the use of storage controllers be adjusted in a high-availability configuration in such a way that the remaining controller can master the load in the case of a failover. Doing so should leave enough resources to perform migrations.

Migrating vFiler entities relies mainly on SnapMirror® and MultiStore technology; thus, it requires these licenses on all participating storage controllers. TR-3814, NetApp Data Motion, provides a thorough presentation of migration using the NetApp Data Motion solution. It focuses on implementing and triggering the migration of vFiler entities using NetApp Provisioning Manager. Furthermore, vFiler units can be managed manually to allow for handy offline migration with a very short interruption of storage accessibility.

ENABLE SETTLED/NOMAD FOR ALREADY-PROVISIONED STORAGE

While NetApp recommends that you consider the settled/nomad setting initially, taking the sizing and lifetime of storage into account, it is possible to implement it in a planned downtime window. If NFS-attached storage is to be migrated, existing volumes can be adopted by a vFiler entity. Because the vFiler entity has its own IP address, the clients attaching the storage need to remount it.

SETTLED/NOMAD-LIKE SETTING WITH SHORT/LONG-TERM DATA PAIRING

In the previous section, the settled/nomad pattern was described as a way to mitigate organic data growth. The same effect can be achieved when storage is identified as belonging to instances that are going to be deprovisioned at their end of life. Taking into account the expected lifetime of provisioned storage allows you to plan deprovisioning situations in advance. This relaxes the dependence on aggregates outside planned downtime windows, without the technical requirements of a settled/nomad setting.

ONLINE MIGRATION IN VIRTUALIZED ENVIRONMENTS

Online migration features in a virtualization hypervisor provide a further alternative for implementing a responsive scheme to react to data growth scenarios. For example, VMware® Storage VMotion™ is capable of transferring a virtual machine, including its storage, when it is attached using a datastore. Storage of virtual machines served by a NetApp datastore can be migrated to another NetApp-served datastore by migrating each virtual machine. In such cases, a nomad can simply be implemented by a NAS- or SAN-attached datastore.

In contrast to a data transfer based on SnapMirror directly between NetApp storage controllers, the migration traffic flows through the hypervisor. This might have consequences for the execution of the virtual machines. Also, the NetApp storage efficiency savings cannot be exploited during the transfer; deduplication savings are regained by executing the deduplication process on the destination storage controller.


4 OPERATION

This section focuses on the operation and management of overcommitted storage. The goal of management is to fulfill the SLAs of the stored data while achieving a high level of efficiency. It addresses how to detect situations that need manual assistance, how to raise the awareness of the operational staff, and how to resolve situations that arise.

We first consider situations that put SLA fulfillment at risk. Then we focus on actions that can be taken to avoid further aggravation by making the situation evident and presenting mitigation alternatives.

Consider the operational process as a loop that monitors and evaluates the current situation and triggers the transition of a storage resource among the following phases:

• Provision storage
• Leave room for organic growth; it might be desirable to still allow extending the storage of previously provisioned applications
• Reduce storage use with mitigation alternatives such as deletion, data motion, and so on

These transitions must occur within a specified time frame to optimize operational flexibility and to prevent endangering the SLAs. The point is to detect situations that would violate the SLAs in the future.

SITUATIONS PUTTING SLA FULFILLMENT AT RISK

Over time, more and more data is stored and processed by the provided applications. NetApp storage efficiency technologies compensate for this growth. To prevent running out of physical resources, usage must be managed within safe boundaries. This makes sure the operations team has enough time to react with the appropriate mitigation strategy.

The following list summarizes situations that are critical for service delivery:

• Running out of time. Some mitigation alternatives must be triggered in advance, and a passage of time might be needed for their effect to become evident. This time determines the number of mitigation alternatives that can be considered at a certain point.

• Running out of mitigation alternatives. Several mitigation alternatives exist to control usage. However, some alternatives are one-time activities, and some must be performed within a certain time frame. Depending on the situation, not all alternatives might be available for use.

• Running too tight on storage. Over time, applications use more and more of the blocks that were committed to them. This forces Data ONTAP to allocate from the pool of free blocks. Assuming data growth, the size of the free block pool directly translates into available time to react.

• Running out of storage completely. This must be prevented because it has a high impact on the availability of the service. Furthermore, data integrity can be at risk. Consider the following scenarios:

− The application wants to write to committed storage but fails (NAS/SAN). To the application, this looks like a storage failure and implies service disruption. Data integrity can be at risk.

− The application wants to allocate new storage but fails (NAS). The application is confronted with a "No space left on device" exception. Verify the application's behavior on this exception; most applications can deal with this situation, and data integrity is not at risk.

Two cases need to be differentiated when mitigation is necessary to solve a situation of tight storage. Storage for an object such as a LUN or a share can be tight because of:

− Insufficient space within the volume in which the storage object is contained

− Insufficient free space within the aggregate in which the storage object and its volume are contained


The following sections focus on how to detect that a change is necessary and that a storage resource should be transitioned to another phase. We describe the different phases, how monitoring can support their detection, and how this information can be made known to operational groups.

4.1 PHASES AND TRANSITIONS

This section outlines the phases of a storage resource. Starting with an empty aggregate, storage is provisioned up to certain thresholds. After that, storage is left for organic growth. After certain thresholds are exceeded, further inspection or activities must be performed to mitigate storage tightness.

• Provisioning storage. When certain thresholds are within a defined range, storage is provisioned to the aggregates. Monitoring should support making the decision to transition to the next phase.

• Leave storage for organic growth. When certain thresholds are exceeded, provisioned storage is left for organic growth. Depending on the environment, storage of existing applications might still be extended, and a second threshold might signal that extensions are no longer possible. Monitoring should support making the decision to transition to the next or prior phase.

• Mitigate storage use. When certain thresholds are exceeded, this phase must make sure that committed storage can be delivered to store application data. The effect of a mitigation activity should be to put the storage resource back in the preferred operational corridor. Monitoring should support making the decision to transition back to the organic growth phase.
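The three phases above can be sketched as a simple threshold-driven decision. The threshold values are illustrative examples, not NetApp defaults:

```python
# Hypothetical sketch of the phase decision for an aggregate, driven by
# block use. The limit values are illustrative, not NetApp defaults.

PROVISION_LIMIT = 0.70   # below this, new storage may still be provisioned
MITIGATE_LIMIT = 0.85    # above this, mitigation activities are required

def phase(aggregate_use):
    """Map aggregate block use (0.0-1.0) to an operational phase."""
    if aggregate_use < PROVISION_LIMIT:
        return "provision"
    if aggregate_use < MITIGATE_LIMIT:
        return "organic growth"
    return "mitigate"

for use in (0.55, 0.78, 0.91):
    print(use, phase(use))
```

A successful mitigation activity moves the aggregate's use back below MITIGATE_LIMIT, returning the resource to the organic growth phase.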

4.2 MONITORING

NetApp Operations Manager delivers comprehensive monitoring and management for NetApp shared storage. It provides alerts, reports, performance monitoring, and configuration tools to keep the NetApp storage infrastructure in line with business requirements for maximum availability and efficiency. NetApp Operations Manager provides a single human interface and an application programming interface (API) for integration with third-party management and orchestration software vendors.

Operations Manager monitors the NetApp shared storage infrastructure and is able to raise awareness of certain situations. Events can be set to trigger an action when operational parameters are within a certain range and indicate a relevant situation. When the event triggers, an alarm notification can be sent by e-mail, pager, Simple Network Management Protocol (SNMP), or customized scripts. To raise awareness of a certain situation, the event must be characterized using the metrics provided by Operations Manager. To communicate the event, an alarm must be set.

THRESHOLDS

Operations Manager monitors relevant parameters that indicate the presence of specified situations. Thresholds can be set to trigger actions, for example, to notify the operational team that an alarm situation exists. The thresholds can be set to notify in advance. Operations Manager also performs trending on operational parameters to express the urgency of a certain situation. This supports decision making on how to react to a certain situation.

Within your Operations Manager instance, the thresholds can be verified and set by navigating to the Default Thresholds page, following Setup → Options → Default Thresholds, or by using the link http://opsmgrserver:port/dfm/edit/options. Figure 17 shows a sample configuration page.


Figure 17) Operations Manager screen to configure thresholds on operational metrics

For aggregates, Operations Manager provides the set of thresholds described in the following list. They represent absolute limits. Operations Manager alarms can be used to notify operational staff and raise awareness of a specific situation.

Monitoring the aggregates is very important. They are the physical containers of preallocated and growable storage objects that host application data. If an aggregate of a storage controller runs at uncontrolled usage, there could be direct consequences for the applications for which it provides data.

The concrete settings for these thresholds depend on the time necessary to relax aggregate block usage. When no mitigation alternatives should or can be taken during the lifetime of the data, mitigation actions must be performed in scheduled downtime windows. Thus, threshold settings and actions tend to be more conservative to avoid SLA-endangering situations.

• Aggregate full threshold. This threshold on the metric aggregate block use allows triggering an alarm that notifies a person in charge.

• Aggregate nearly full threshold. This threshold is the counterpart of the aggregate full threshold but provides an earlier notification.

• Aggregate overcommitted threshold. This threshold on the metric of committed storage allows triggering an alarm that notifies a person in charge. This metric refers to the amount of storage that is committed to applications. It represents the level of consolidation, as well as the width and growth of the block use corridor.

• Aggregate nearly overcommitted threshold. This threshold is the counterpart of the aggregate overcommitted threshold but provides an earlier notification.

Operations Manager also provides thresholds that can be used to alert operational staff when volumes are in a certain state:

• Volume full threshold. This event notifies a person in charge that the preset threshold on the volume use metric has been reached.

• Volume almost full threshold. This event is the counterpart of the volume full threshold but provides an earlier notification.

• Volume autosized. This event notifies a person in charge that a volume was extended using the autogrow functionality.
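The threshold evaluation described above can be mimicked with a small checker. The threshold values and the metrics dictionary are hypothetical, not Operations Manager defaults or API output:

```python
# Hypothetical sketch of threshold evaluation for an aggregate. The
# threshold values and metric names are illustrative; the real defaults
# are configured on the Operations Manager Default Thresholds page.

THRESHOLDS = {
    "aggregate nearly full": ("block_use", 0.80),
    "aggregate full": ("block_use", 0.90),
    "aggregate nearly overcommitted": ("commitment", 0.95),
    "aggregate overcommitted": ("commitment", 1.00),
}

def events(metrics):
    """Return the names of all thresholds exceeded by the given metrics."""
    return [name for name, (metric, limit) in THRESHOLDS.items()
            if metrics.get(metric, 0.0) >= limit]

# Invented sample: block use inside the corridor, commitment above 100%.
sample = {"block_use": 0.83, "commitment": 1.10}
print(events(sample))
```

Each returned event name would then be bound to an alarm (e-mail, SNMP, or script), as described in section 4.3.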


TRENDING

Operations Manager 4.0 supports a variety of trending features for certain storage objects. This is an important feature for all storage objects with a fixed size. It allows you to estimate the time frame within which a certain situation needs to be mitigated. The trend is calculated as a linear regression over up to 90 days in the past. For aggregates, Operations Manager calculates a trend on the daily growth rate. In your Operations Manager instance, use the link http://opsmgrserver:port/dfm/report/view/aggregates-growth-rates for trending of aggregate growth rates and the estimated remaining time until the storage object is full. Each aggregate can be drilled down, and you can select trending based on an interval of one day, one week, one month, three months, or one year. To see the effect of recent data activities, set the interval of a trend calculation to enclose this activity. Investigate whether growth rates calculated over different intervals deviate significantly.

Figure 18) Trending of data growth and days-to-full prediction in Operations Manager

Note: The calculation basis of time to full is the usable aggregate capacity. This value is not calculated based on the aggregate full threshold setting.

The trending on the volume level is analogous to the trending on the aggregate level. In your Operations Manager instance, access the link http://opsmgrserver:port/dfm/report/view/volumes-growth-rates for trending of volume growth rates. NetApp recommends ordering the view by growth rate descending or by time to full ascending in order to focus on the relevant candidates.
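The days-to-full estimate can be sketched as a least-squares fit on daily usage samples, extrapolated to the usable capacity (the calculation basis noted above). The sample series is invented for illustration:

```python
# Hypothetical sketch of the days-to-full trend: fit a linear regression
# to daily used-capacity samples and extrapolate to usable capacity.
# The sample series below is invented for illustration.

def days_to_full(daily_used_gb, usable_capacity_gb):
    """Least-squares slope of use per day, extrapolated to capacity."""
    n = len(daily_used_gb)
    xs = range(n)
    mean_x = sum(xs) / n
    mean_y = sum(daily_used_gb) / n
    slope = (sum((x - mean_x) * (y - mean_y)
                 for x, y in zip(xs, daily_used_gb))
             / sum((x - mean_x) ** 2 for x in xs))
    if slope <= 0:
        return None                      # not growing; no prediction
    return (usable_capacity_gb - daily_used_gb[-1]) / slope

samples = [500, 510, 519, 531, 540]      # used GB over five days
print(days_to_full(samples, 1000))       # roughly 46 days at ~10 GB/day
```

As the surrounding text suggests, growth rates fitted over different intervals can deviate significantly, so the prediction should be cross-checked over several window lengths.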

On the volume level, you can set an alarm to fire when volume growth is outside the usual boundary:

Abnormal volume growth. This event notifies when the growth rate of a volume exceeds a preset limit. It is helpful to signal unusual behavior concerning storage consumption and to point the operational staff to the right storage object.


INDIVIDUAL THRESHOLDS PER AGGREGATE OR VOLUME

For each aggregate or volume, the general default settings can be overwritten and made more specific. To do so, select the aggregate or volume of choice, for example, by using the links already provided in this technical report. When a concrete aggregate is selected, it can be configured using the Edit Settings link and dialog. When a concrete volume is selected, its configuration can be adapted using the Edit Quota Settings link and dialog.

MONITORING STORAGE EFFICIENCY RETURNS

NetApp Operations Manager provides a dashboard to visualize storage efficiency returns in the NetApp shared storage infrastructure. This report lists important parameters, drilled down by utilization, capacity, unused reserve capacity, storage efficiency, and efficiency return breakdown. It allows you to judge the effectiveness of the NetApp storage efficiency technologies. Figure 19 provides a sample screenshot of the storage efficiency dashboard in NetApp Operations Manager. Consult the NetApp Operations Manager Efficiency Dashboard Installation and User Guide for further information on this dashboard.

Figure 19) Storage efficiency dashboard in Operations Manager


4.3 NOTIFICATION

Operational staff must be notified when situations occur that require a transition between phases, especially situations with negative consequences for SLA fulfillment. Operations Manager provides alarms for notification. Alarms are bound to the metrics and thresholds explained in section 4.2 and notify operational staff, storage administrators, or storage capacity planners. Alarms are the instrument used to keep the management effort of the NetApp storage infrastructure low.

After being notified, the responsible person can evaluate the situation and decide which actions to take. Furthermore, the trends on operational parameters provided by Operations Manager simplify the decision-making process.

Depending on the organizational structure, the responsibilities to operate, plan, and administer the storage infrastructure can be separated into different groups, persons, or roles. Thus, we characterize the mitigation activities by required skill set and time to act. This allows an easy alignment to a given organizational structure.

Operations Manager supports different methods to send a notification. The notification methods can be used in combination; for example, a notification can be sent by both e-mail and SNMP.

NOTIFY BY E-MAIL

An alarm can be sent to multiple destinations by e-mail. Repeated notifications can be sent while the situation is not resolved. To set an alarm, access the alarm configuration page by following Setup → Alarms from the default Operations Manager dashboard. Clicking Advanced Version accesses an advanced version of this page; the direct link for the advanced version is http://opsmgrserver:port/dfm/edit/alarms-advanced. Figure 20 shows how to configure an alarm. Adjust the threshold as described in section 4.2.

NetApp recommends using distribution lists or aliases with meaningful names rather than the addresses of individual persons. If you follow this recommendation, changing responsibilities and roles does not require you to make corresponding changes to Operations Manager.

NOTIFY BY SNMP

Operations Manager supports the notification of alarms using SNMP, a widely used standard that is supported by most orchestration frameworks and ticketing systems. Using SNMP, Operations Manager can be integrated into existing ticketing systems. Figure 20 shows setting up an alarm that fires based on the aggregate almost full threshold. The SNMP trap host is configured using a hostname or IP address and the port on which the SNMP agent is listening. The alarm can be saved and tested.

36 Storage Efficiency Every Day How to Achieve and Manage Best-in-Class Storage Use

Figure 20) Configuring an alarm based on the threshold aggregate almost full

Note: The SNMP event must be routed to the responsible groups or persons in the ticketing system. Thus, the mapping between the detected situation and the responsible operational group must be implemented there.

NOTIFY BY SCRIPT

Operations Manager supports notifications in highly customized integration scenarios. A user-defined adapter can be executed that delivers the information to the infrastructure or system of choice. A script can be used to implement such an adapter and act as the glue between Operations Manager and the customer infrastructure. To set an alarm on the event aggregate almost full that starts a script, instrument Operations Manager on the command line:

dfm alarm create -s script_to_execute -h aggregate-almost-full
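A minimal adapter invoked by this command could look like the following sketch. How Operations Manager hands over event details to the script (arguments, environment) depends on the version, and the ticketing endpoint is hypothetical, so the parsing and URL below are assumptions to adapt:

```python
#!/usr/bin/env python
# Hypothetical notification adapter started by 'dfm alarm create -s ...'.
# The invocation interface (argv fields) and the ticketing endpoint are
# assumptions for illustration, not a documented Operations Manager API.

import json
import sys
import urllib.request

def build_payload(event_name, source):
    """Serialize the event for a ticketing system (assumed JSON API)."""
    return json.dumps({"event": event_name, "source": source})

def forward_event(event_name, source, ticket_url):
    """POST the event to a (hypothetical) ticketing system endpoint."""
    req = urllib.request.Request(
        ticket_url,
        data=build_payload(event_name, source).encode(),
        headers={"Content-Type": "application/json"})
    with urllib.request.urlopen(req) as resp:
        return resp.status

if __name__ == "__main__":
    # Assumed invocation: adapter.py <event-name> <source-object>
    name = sys.argv[1] if len(sys.argv) > 1 else "aggregate-almost-full"
    source = sys.argv[2] if len(sys.argv) > 2 else "unknown"
    print(build_payload(name, source))
    # forward_event(name, source, "http://ticketsystem.example/api/events")
```

The mapping of the event to the responsible operational group is then implemented in the receiving system, as with the SNMP integration.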


4.4 MITIGATE STORAGE USE

Uncontrolled storage use can limit operational flexibility and might put SLA fulfillment at risk. This section focuses on mitigation activities that preserve flexibility by controlling use within its defined corridor. The effect of a mitigation activity should be to return usage to that corridor.

Storage tightness might occur in aggregates or volumes, depending on their configuration. When all volumes in an aggregate are thin provisioned with the zero fat configuration, they use the shared pool of free blocks of the aggregate to deal with data growth. To solve this situation, a mitigation activity on the aggregate level is necessary.

When storage objects in a fixed-size volume cannot grow to the committed space, a mitigation activity on the volume level is necessary to solve upcoming volume tightness.

MITIGATION ACTIVITIES FOR AGGREGATES

Aggregates are the coarsest storage objects within a NetApp storage controller. Finer-grained storage objects, such as FlexVol volumes and their contents, are usually thin provisioned using the zero fat configuration. They can grow on demand; however, because they live within an aggregate of physically limited size, their growth is also limited. As described in the following list, providing usable space in the aggregate automatically allows the contained storage objects to grow.

1. Increase the aggregate. You can add drives to aggregates during operation, and you can repeat this mitigation activity. The maximum aggregate size depends on the Data ONTAP version, the type of aggregate, and the type of storage controller; 64-bit aggregates, supported with Data ONTAP 8, have very high limits. Additional drives can be used immediately; however, their procurement time needs to be taken into account. Rebalancing data between existing and new drives results in uniformly distributed use of the drives.

2. Decrease the aggregate Snapshot copy reserve. This reserve is needed in MetroCluster and SyncMirror® configurations. In other configurations, you can decrease this reserve or set it to zero.

3. Shrink preallocated volumes. Volumes with preallocated space reserve available aggregate free space. Where possible, these volumes can be shrunk, returning the freed space to the aggregate so that others can make use of the previously preallocated space.

4. Enable deduplication and shrink the volume.

5. If available, migrate a nomad online to a different storage controller. Doing this at the NetApp storage controller level requires storage provisioning based on vFiler units and MultiStore and SnapMirror licenses. Adequate free space on the aggregates of the target storage controller is required. This mitigation activity is not limited in its applicability.

6. Migrate a volume from one aggregate to another, within the same or another storage controller. SnapMirror replicates the data while it is still being served. To switch over to the replicated data, the client needs to detach from the source and reattach to the replica. After completion, the replica is considered the new source. This operation has an impact on client downtime; typically, inter-data-center bandwidth allows you to synchronize the source and the replica within a few minutes.

7. If none of the listed activities can be used, the application must be stopped to achieve a consistent state before its data is migrated.

The mitigation activities for aggregate tightness are summarized in Table 8. Note that Provisioning Manager performs mitigation alternatives 3 to 6 online for secondary storage.


Table 8) Mitigation alternatives to control use within aggregates.

No. | Mitigation Activity | Repeatability | SLA Impact | Preparation Time | Time to Show Effect
1 | Increase aggregate capacity by adding disks | Repeatable (Data ONTAP 7.x: low limits; Data ONTAP 8: high limits) | None | HW procurement | Immediate (plus rebalancing)
2 | Decrease the aggregate's Snapshot copy reserve, if possible | One time | None | None | Immediate
3 | Shrink other volumes in the aggregate, if they have enough free space | One time | Low | None | Immediate
4 | Run deduplication and shrink volumes | Repeatable | Low | Time to execute dedupe | Immediate
5 | Migrate nomads (online) | Repeatable | Low | None | Minutes (vFiler migration time)
6 | Migrate volumes to a different aggregate (offline) | Repeatable | Med-high | Next planned downtime window | Minutes (volume switch-over time)
7 | Prevent application data loss and stop the application, then migrate (offline) | Repeatable | Low-high | Coordinate with application owner | Minutes (migration time)

MITIGATION ACTIVITIES FOR VOLUME TIGHTNESS

Mitigation activities for volume tightness are relevant for volumes that are at risk because their storage objects cannot grow to the committed size. When it is not possible to enable growth for storage objects contained in volumes, you need to perform an aggregate-level mitigation activity.

Note: Some of these mitigation alternatives depend on, and affect, the used capacity in the aggregate.

Table 9) Mitigation activities for resource tightness within volumes.

No. | Mitigation Activity | Repeatability | SLA Impact | Preparation Time | Time to Show Effect
1 | Reduce the volume's Snapshot copy reserve (if configured and not used) | One time | Low | None | Immediate
2 | Increase the volume if there is free space in the aggregate (see Table 8) | One time | Low | None | Immediate
3 | Delete Snapshot copies that are not needed or were skipped by the autodelete function | Limited | Low | None | Immediate
4 | Activate FAS deduplication for the volume (requires proper space guarantees) | One time | Low (possible performance impact) | Wait for schedule | Hours
5 | If the volume contains more than a single LUN, migrate those objects to another volume or aggregate | Repeatable | High | Next planned downtime window | Minutes (volume migration time)
6 | Stop the application and migrate the data | Repeatable | High | Coordinate with application owner | Minutes (migration time)


5 REAL-LIFE SETTINGS

This section summarizes two different operational settings. The first does not make use of online data migration or the settled/nomad provisioning pattern; the second implements a settled/nomad provisioning pattern to maintain the flexibility for online data migrations.

The concrete threshold settings and approaches might be very customer and application specific. To exploit NetApp storage efficiency features in your own data center, NetApp recommends that you start conservatively. After you are familiar with the process, work toward the customer-specific optimum.

5.1 SAMPLE SETTING 1: REAL-LIFE SETTING

This section describes a real-life setting that a customer started with. It makes use of a limited set of mitigation alternatives. This is especially beneficial when the installed storage capacity should remain constant over a long time frame or the physical systems are already fully equipped. A settled/nomad setting is not considered. Thus, the thresholds that signal a transition between phases are set lower and more conservatively for this customer. Because online data migration and aggregate extension are not available as mitigation alternatives, sufficient available space is required to safely reach the next planned downtime window, as shown in Figure 21. In practice, refer to the aggregate days-to-full trend value to get an idea of the available days to full based on past data growth.

• All storage is provisioned using the zero fat option with growable FlexVol volumes. Only aggregate monitoring is used.

• Aggregate extension is not a mitigation alternative.

• Online migration is not a mitigation alternative.

Figure 21) Storage to enable organic data growth between planned downtime windows.


Note: Several months might pass between the planned downtime windows in which major mitigation alternatives can be performed.

The primary concern is preventing the critical situation in which aggregates reach a utilization level that is too high to enable organic growth during the period between agreed planned downtime windows. To prevent this situation, sufficient space must be reserved to enable data growth. Second, the level of data consolidation is monitored to manage accumulated growth rates safely.

Provisioning new data is stopped when one or both thresholds on the first and second metrics are reached. The operational teams are notified using alarms on the Operations Manager events aggregate nearly full (configured to fire when the metric exceeds 50%) and aggregate nearly overcommitted (configured to fire when the metric exceeds 110%). These alarms stop the responsible entities from provisioning new storage; the aggregate is left for organic growth.

An assessment of the storage situation might then be performed. Depending on experience and on the application growth rates seen in the past, the thresholds may be adapted. After the upper threshold of the operational sweet spot corridor is crossed, an alarm based on the aggregate full threshold (set initially to 65%) is sent to the storage administrators, who decide whether to migrate data in the next planned downtime window. In the meantime, organic growth can take place in the yellow-marked area shown in Figure 22. The metrics used are:

• First metric: aggregate capacity used

• Second metric: aggregate space committed

Because all storage is provisioned using the zero fat option, no artificially limited storage container exists. Thus, there is no need to consider a volume-based metric. Figure 22 shows the behavior depending on the metrics aggregate capacity used and aggregate space committed.

Figure 22) Transition of phases depending on the metrics aggregate capacity used and aggregate space committed.

(Figure content: while aggregate capacity used is within 0-50% and aggregate space committed is within 0-110%, new storage is provisioned; once either metric leaves the operational sweet spot corridor, provisioning stops and capacity is assessed and thresholds are adapted; when capacity used exceeds 65% or space committed exceeds 120%, mitigation takes place.)
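The phase transitions of this sample setting can be expressed as a small decision function. The thresholds are those quoted above; the function name and its return strings are illustrative sketches, not part of any NetApp product.

```python
def setting1_action(capacity_used_pct, space_committed_pct):
    """Decision logic of sample setting 1 (illustrative): aggregate
    capacity used thresholds of 50% (nearly full) and 65% (full);
    aggregate space committed thresholds of 110% (nearly
    overcommitted) and 120%."""
    if capacity_used_pct > 65 or space_committed_pct > 120:
        return "mitigate in next planned downtime window"
    if capacity_used_pct > 50 or space_committed_pct > 110:
        return "stop provisioning; assess capacity and adapt thresholds"
    return "provision new storage"
```

In the green corridor, new storage is provisioned; crossing either first threshold triggers the alarms described above; crossing the upper thresholds triggers the migration decision.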


5.2 SAMPLE SETTING 2: SETTLED/NOMAD

This section describes a setting that takes the settled/nomad provisioning pattern into account and allows migrating nomad data flexibly and in a timely manner thanks to vFiler technology. This setting requires storage space at alternative locations to which nomads can be migrated. It is seen more often in larger environments with an emphasis on NFS-attached storage. It allows operating the NetApp storage infrastructure at very high use and in narrower operational sweet spot corridors. Figure 23 visualizes the effect of a mitigation alternative that can be performed online.

Figure 23) Narrower corridors due to the ability to perform mitigation alternatives in hours instead of months.

(Figure content: settled data and several nomads (N) share an aggregate; between detecting the need to act and the effect of mitigation, for example a migration, only hours pass.)

In this sample setting, as in sample setting 1, the critical situation to prevent is aggregates becoming too full. However, the flexibility gained with online data migration removes the need to take a further metric, such as storage overcommitment, into account.

• All storage is provisioned using the zero fat option with growable FlexVol volumes. Only aggregate monitoring is used.

• Storage is provisioned using the settled/nomad pattern, with the ability to perform online migration.

• The days-to-full aggregate trend was more than 200 days on average. Note that this value depends on the individual situation and is calculated against 100% capacity used.

The sole metric in this setting is aggregate capacity used. Table 10 contains the thresholds that describe the transition of phases.

Table 10) Phase transitions with the settled/nomad provisioning pattern and the online migration mitigation alternative.

Detection Threshold | Notify | Mitigation
> 70% | Storage operations | Stop provisioning of new storage
> 85% | Storage operations | Stop extending provisioned storage
> 90% | Storage operations | Relax the resource situation and migrate a nomad


Figure 24) Visualization of phase transitions depending on the metric aggregate capacity used.

(Figure content: at 0-70% aggregate capacity used, new storage is provisioned and already provisioned storage may be extended; between 70% and 85%, provisioning of new storage stops; above 85%, extending already provisioned storage also stops; above 90%, utilization is relaxed by migrating a nomad with NetApp Data Motion.)

You can achieve very high data consolidation in this setting by using NetApp storage controllers. The amount of logical data served can exceed the physically usable capacity several times over.


6 STORAGE EFFICIENCY COOKBOOK

To increase consolidation, we propose the following steps to exploit the advantages of NetApp storage efficiency technologies.

Figure 25) Typical picture of aggregate capacity metrics while turning to zero fat configurations and dedupe.

(Figure content: committed capacity and capacity used plotted over elapsed time; capacity used drops during the first month while volumes are converted to zero fat and deduplicated, and the trend over the following three months, in contrast to the overall trend, reveals the organic growth rate; markers 1, 2, and 3 correspond to the steps below.)

As a general rule, we don't introduce artificially limited container types. They increase the monitoring effort and might prevent pooling of unused space. For an existing landscape, proceed as follows:

1. Install and configure Operations Manager, the earlier the better. From day one, Operations Manager collects data; the more information it collects, the better the predictions and trending become. The diagrams provided by Operations Manager give a good idea of growth rates and their steadiness. Make sure all NetApp storage controllers are monitored. Wait for one month. Define which mitigation alternatives your operational team is comfortable with. Check the boxes accompanying the provided list and identify the time your team needs to perform the actions. If you can perform online migrations for nomads, define the time to negotiate and approve a migration. For all other data, define the time to the next planned downtime window.

2. Change all volumes to the zero fat configuration with the autogrow feature set to on. Because there is no artificial space limitation for an autogrow volume, monitoring is restricted to aggregate monitoring. When using deduplication, set the volume to autogrow. During this period, the capacity used diminishes, as shown in Figure 25. Usually, each change in the volume configuration can be detected. So far, only metadata has changed, and unused space in the volumes is now available from a common shared pool. The aggregated free space is available to the same applications storing the data. We recommend monitoring for three months to understand the growth rate of your environment.

3. Derive the growth trend of the aggregates. Note that the overall trend might still be negative. Use Operations Manager to help determine the trend. Make sure that it excludes the time frame when the volume configuration was changed to zero fat and that it includes relevant operations of your applications, such as month-end and year-end closing of business applications or regular software maintenance updates (for example, in virtualized environments).
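As an illustration of this trending step, the following sketch estimates a days-to-full value from (day, used-capacity) samples with a least-squares growth trend, excluding the zero fat conversion window. This is a simplified stand-in for the trending that Operations Manager reports, not its actual algorithm.

```python
def days_to_full(samples, total_capacity, exclude_before=0):
    """Estimate days until used capacity reaches 100% of total_capacity
    from (day, used) samples via a least-squares growth trend.
    Samples before `exclude_before` (for example, the zero fat
    conversion window, when used capacity drops) are ignored.
    Returns None for a flat or negative trend."""
    pts = [(d, u) for d, u in samples if d >= exclude_before]
    mean_d = sum(d for d, _ in pts) / len(pts)
    mean_u = sum(u for _, u in pts) / len(pts)
    slope = (sum((d - mean_d) * (u - mean_u) for d, u in pts)
             / sum((d - mean_d) ** 2 for d, _ in pts))
    if slope <= 0:
        return None  # shrinking or flat: no meaningful days-to-full
    _, last_used = pts[-1]
    return (total_capacity - last_used) / slope
```

Like the Operations Manager figure, the value is calculated against 100% capacity used of the aggregate.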


Work backward to determine the thresholds of the phases:

a. Define the aggregate use level at which your operational team is comfortable. At first, do not exceed 80%. Add an attention area (yellow) whose width depends on the mitigation alternatives and their time to show effect.

b. Determine the maximum distance between planned downtimes, or the time to perform the intended mitigation alternatives.

c. Determine the growth rate. Operations Manager provides help in determining the trend of data growth.

d. Determine the minimum space required to comfortably allow organic growth in the period between agreed planned downtimes of the services provided. Operations Manager helps you to understand the growth rate of the past.

To provision storage, follow these steps:

1. Create big aggregates to enable shared storage in your data center. We recommend sizing them in such a way that the aggregate can be extended once for an eventual aggregate mitigation. Very few situations exist where a silo-centric approach with dedicated aggregates for applications makes sense. Free space and performance in an aggregate can be shared. A few big aggregates reduce the monitoring effort. Also, build aggregates in a limited number of standardized configurations and sizes.

2. Create volumes in the zero fat configuration with the autogrow feature set to on. Because there is no artificial space limitation for an autogrow volume, monitoring is restricted to aggregate monitoring. When using deduplication, set the volume to autogrow. Whenever possible, use Provisioning Manager for convenience and for repeating configurations.

a. Classify your data and provision for flexibility. Give NFS a preference and make use of vFiler entities.

b. Turn on deduplication. Even in situations where deduplication rates are expected to be low, there is sometimes a big surprise. If you prefer to try deduplication on the storage controller first, create a clone of the intended volume and deduplicate it to estimate the effect. Use Performance Advisor to identify a repeating time frame of low activity in which to schedule the deduplication job. Alternatively, schedule deduplication by change rate. Mind the maximum volume sizes, which depend on the storage controller.

c. Initially size volumes to the expected size of the data you are going to store. That way, the aggregate overcommitment metric in Operations Manager represents the data consolidation more precisely.

d. Trim existing volumes provisioned in full or low fat to the zero fat configuration. Use the following commands on the console of the storage controller to configure zero fat without Snapshot autodelete for NAS environments:

vol options <volume> guarantee none
vol options <volume> try_first volume_grow
vol autosize <volume> -m <maximum size> -i <increment size> on
snap autodelete <volume> off

Use the following command sequence to configure zero fat with Snapshot autodelete for NAS environments:

vol options <volume> guarantee none
vol options <volume> try_first volume_grow
vol autosize <volume> -m <maximum size> -i <increment size> on
snap autodelete <volume> trigger volume
snap autodelete <volume> delete_order oldest_first
snap autodelete <volume> on


Use the following commands to configure zero fat without Snapshot autodelete for SAN environments:

vol options <volume> guarantee none
vol options <volume> try_first volume_grow
vol autosize <volume> -m <maximum size> -i <increment size> on
snap reserve -V <volume> 0
snap autodelete <volume> off
lun set reservation <lun> disable

Use the following command sequence to configure zero fat for SAN environments with autodelete set to on:

vol options <volume> guarantee none
vol options <volume> try_first volume_grow
vol autosize <volume> -m <maximum size> -i <increment size> on
snap reserve -V <volume> 0
snap autodelete <volume> trigger volume
snap autodelete <volume> delete_order oldest_first
snap autodelete <volume> on
lun set reservation <lun> disable

e. Identify storage holding inactive data. Storage keeping inactive data is most often perfectly suited to act as a nomad candidate that can be migrated.

f. Identify storage that is close to deprovisioning. Deprovisioning storage relaxes use and can act as a mitigation alternative.

g. Turn already provisioned volumes into the zero fat configuration.

3. Let Operations Manager monitor the landscape. Use the aggregate daily growth rates and days-to-full trending reported by Operations Manager to adapt the thresholds. Remember that days-to-full trending reports against 100% capacity used of the aggregate.


7 REFERENCES

• TR-3505, "NetApp Deduplication for FAS and V-Series Deployment and Implementation Guide": www.netapp.com/us/library/technical-reports/tr-3505.html

• TR-3563, "NetApp Thin Provisioning: Improving Storage Utilization and Reducing TCO": www.netapp.com/us/library/technical-reports/tr-3563.html

• TR-3710, "Operations Manager, Provisioning Manager, and Protection Manager Best Practices Guide": www.netapp.com/us/library/technical-reports/tr-3710.html

• TR-3786, "A Thorough Introduction to 64-Bit Aggregates": www.netapp.com/us/library/technical-reports/tr-3786.html

• TR-3814, "NetApp Data Motion": www.netapp.com/us/library/technical-reports/tr-3814.html

• TR-3827, "If You're Doing This, Then Your Storage Could Be Underutilized": www.netapp.com/us/library/technical-reports/tr-3827.html

• TR-3881, "DataMotion for Volumes for Enterprise Applications": www.netapp.com/us/library/technical-reports/tr-3881.html

• NetApp Operations Manager Efficiency Dashboard Installation and User Guide: now.netapp.com/NOW/download/tools/omsed_plugin/InstallUserGuide.pdf


8 ACKNOWLEDGMENTS

This report was developed in concert with the Field Centers for Innovation and covers field best practices and product group expertise. It would not have been possible without the input of many experts. Significant contributions were made by Matthew Agoni, Carlos Alvarez, Jeff Berks, Manfred Buchmann, Hans Deuerlein, Erik Dybwad, Niels Reker, Oliver Dziuba, Larry Freeman, Gary Garcia, Pretoom Goswami, Naveen Harsani, George John, Nigel Maddock, Andreas Martinovsky, Holger Niermann, Cesar Orosco, Christian Ott, Shiva Raja, Michael Reusch, Maurice Skubski, John Tyrrell, Oliver Walsdorf, and Allen Wang.

NetApp provides no representations or warranties regarding the accuracy, reliability, or serviceability of any information or recommendations provided in this publication, or with respect to any results that may be obtained by the use of the information or observance of any recommendations provided herein. The information in this document is distributed AS IS, and the use of this information or the implementation of any recommendations or techniques herein is a customer's responsibility and depends on the customer's ability to evaluate and integrate them into the customer's operational environment. This document and the information contained herein may be used solely in connection with the NetApp products discussed in this document.

© Copyright 2010 NetApp, Inc. All rights reserved. No portions of this document may be reproduced without prior written consent of NetApp, Inc. Specifications are subject to change without notice. NetApp, the NetApp logo, Go further, faster, Data ONTAP, FlexClone, FlexVol, MultiStore, RAID-DP, SnapDrive, SnapMirror, Snapshot, SyncMirror, and vFiler are trademarks or registered trademarks of NetApp, Inc. in the United States and/or other countries. Windows is a registered trademark of Microsoft Corporation. Oracle is a registered trademark of Oracle Corporation. VMware is a registered trademark and VMotion is a trademark of VMware, Inc. All other brands or products are trademarks or registered trademarks of their respective holders and should be treated as such. RA-0007-1010


Figure 12) Volume-centric storage provisioning. Application instances are aligned horizontally with their volumes.

(Figure content: a template FlexVol volume containing LUNs/qtrees is cloned with FlexClone block sharing into the FlexVol volumes of instances 1 through n; within each FlexVol volume, deduplication block sharing operates across that instance's LUNs/qtrees.)

Impact on commitment and storage utilization: The impact of using FlexClone to clone a volume-centric storage layout to implement storage-template-based provisioning is visualized schematically. At clone creation, Data ONTAP creates metadata for the new instance of the data; it allocates space for storing changes to the cloned copy or new data on request. Thus, the overcommitment of the aggregate containing the cloned data increases when the clone is created; however, this does not affect the space used in the aggregate. When data in the clone is modified and new data is added by the application, the aggregate use grows.

Best Practice

A volume-centric layout implicitly implements a consistency group. It is preferable to align in it all application data that should be recovered at a certain point in time. Cloning can achieve significant savings when a FlexClone volume is created to provision data for a new service instance.

Client-side data realignment, such as disk defragmentation or database tablespace reorganization, has a counterproductive effect on the FlexClone savings. It also temporarily reduces the deduplication savings until the deduplication process is executed again. If possible, the following actions on client data should be avoided:

• Reorganizing data, for example, database reorganization of tablespaces or defragmentation of virtual disks provisioned through cloning

• Preformatting data


DEDUPE-CENTRIC STORAGE LAYOUT

In a dedupe-centric storage layout, the goal is to achieve high storage efficiency returns from the deduplication feature. In contrast to the volume-centric storage layout, data of different application instances is grouped to achieve storage efficiency returns across a set of application instances. Figure 13 shows a sample dedupe-centric storage layout. Data of application instances is organized horizontally; the individual data of each application is grouped vertically in a volume to implement deduplication.

This layout makes sense in virtualization scenarios where the images of the guest machines can be grouped easily. Grouping partitions containing boot images and commonly used programs is very effective because they share much of the same data. To implement template-based provisioning with such a layout, cloning of template data must be performed with the file/LUN FlexClone operation. File/LUN FlexClone allows storage objects to be cloned within a volume, providing finer granularity.

This storage layout provides the following storage efficiency advantages from a short- and long-term perspective:

• Very high long-term storage efficiency savings: long-term savings are achieved due to the deduplication-centric storage layout and deduplication returns.

• Short-term storage efficiency savings: instant savings are provided when an application instance is cloned through a file/LUN FlexClone operation, for example, from template application data.

In contrast to the volume-centric storage layout, application instances are bundled together in a matrix style because of their participation in a shared volume. This implies that the applications share major operational tasks and are managed as a bundle. From an SLA perspective, a diversification of service levels within the application instances cannot be implemented as easily as with a volume-centric layout. Achieving application-consistent Snapshot copies requires the iterative application of file/LUN FlexClone functionality to all storage objects of the instance; this is slightly more difficult than cloning with a volume FlexClone operation.

TR-3505, "NetApp Deduplication for FAS and V-Series Deployment and Implementation Guide," provides a deeper understanding of NetApp deduplication and its deployment.

Figure 13 illustrates dedupe-centric storage provisioning. Volumes are shared among several application instances to achieve cross-instance dedupe returns. Note that this construct is created within an aggregate; volumes can also be assigned to different aggregates.


Figure 13) Dedupe-centric storage provisioning. Application instances are aligned horizontally; volumes are aligned vertically.

(Figure content: the template and the application instances are aligned horizontally; each FlexVol volume groups one LUN/qtree from every instance vertically, so that deduplication block sharing within the FlexVol volume works across instances.)

Impact on commitment and aggregate usage: When the FlexVol volumes for this layout are created, their individual sizes contribute to the commitment rate. The aggregate use grows with the provisioning and object use within the FlexVol volumes. Provisioning a new instance in this layout through a file/LUN FlexClone operation has no effect on the overcommitment rate; it has an effect on the overdeduplication value of the volumes themselves. Thus, NetApp recommends using the zero fat configuration for the volume and having autogrow enabled.
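The commitment arithmetic described above can be illustrated with a small sketch. The aggregate space committed metric grows with the declared volume sizes, which is why a file/LUN FlexClone operation, which creates no new volume, leaves it unchanged; the function below is an illustrative re-creation, not the Operations Manager implementation.

```python
def aggregate_space_committed_pct(volume_sizes, aggregate_capacity):
    """Aggregate space committed (overcommitment) in percent: the sum
    of the declared FlexVol sizes relative to the aggregate capacity.
    Illustrative sketch of the metric, not Operations Manager code."""
    return 100.0 * sum(volume_sizes) / aggregate_capacity
```

Cloning a LUN inside an existing volume changes neither the volume sizes nor this value, whereas creating a new FlexVol volume, as in the volume-centric layout, raises it.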

Best Practice

This layout is very attractive for applications using multiple but similar storage objects among service instances (for example, virtual disks in virtual machine hypervisors). They usually use similar operating systems and applications in dedicated virtual disks. Thus, grouping these storage objects leads to a very high degree of consolidation due to deduplication.

Quickly changing data, such as page and swap files, should not be considered for inclusion in deduplicated volumes on primary storage. Deduplication savings are limited due to their high change rate and do not justify running the deduplication process. NetApp recommends not placing this type of data together with data that dedupes well in the same volume.

We further recommend not performing client data realignments, such as Windows disk defragmentation or database tablespace reorganizations. Because of the way NetApp storage controllers work, fragmented client data is served without performance penalties, so such realignments are unnecessary.


3.3 SETTLED/NOMAD PROVISIONING FOR NETAPP DATA MOTION

Settled/nomad provisioning is a provisioning pattern that helps increase the utilization of NetApp shared storage. When the online migratability features of the storage are exploited, response times to mitigate data growth scenarios become independent of application-specific planned downtime windows. Storage is classified into potential migration candidates that can be migrated away from a tight aggregate on one storage controller to another while remaining accessible. Thus, it is an elegant technique to relax the use of an aggregate outside the planned downtime windows of the served applications. NetApp MultiStore technology implements this capability using the vFiler abstraction, which NetApp recommends you consider in the provisioning process. Secure multi-tenancy environments implemented using MultiStore technology harmonize well with this provisioning approach.

The settled/nomad provisioning pattern is a fitting metaphor for reacting to data growth in an aggregate. Figure 14 illustrates the concept of settled/nomad provisioning in the aggregates of the storage controllers and the migration of a nomad out of its aggregate. The settled part describes data that does not move during its lifetime. It might use vFiler units to simplify operation and hardware maintenance of the storage controller, but there is no direct need. The nomad parts are considered moving parts and thus must make use of vFiler units. The ratio between the sizes of the settled and nomad parts depends on the growth rate and lifetime of the data in the settled part. Assuming that the aggregate size is constant over this period, the aggregate is filled with settled and nomad data. Over the data lifetime, more and more nomads are migrated away; at the end of the lifetime, only the settled data is left.

It is irrelevant whether the data growth happens in the settled or the nomad part: when a nomad is migrated away, the resource situation on the aggregate is relaxed. It is preferable to provision several nomads of different sizes. This allows you to:

• React to different growth scenarios of the data
• Quickly migrate smaller nomads when time or the interstorage controller network is considered a limited resource
• Operate the aggregate in its operational sweet spot corridor over a long time frame; by slicing the migratable entities in the right way, you can be sure that the aggregate operates in a predefined use interval
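The slicing idea above can be illustrated with a minimal sketch. The halving strategy, the function name, and the minimum unit size are assumptions for illustration, not a NetApp sizing rule:

```python
def slice_nomads(nomad_total_gb, smallest_gb=100):
    """Split the total nomad capacity into units of decreasing size
    (half, quarter, ...) so that both large and small growth scenarios
    can be mitigated by migrating a suitably sized nomad."""
    slices = []
    remaining = nomad_total_gb
    size = nomad_total_gb / 2
    while size >= smallest_gb and remaining > 0:
        size = min(size, remaining)
        slices.append(size)
        remaining -= size
        size /= 2
    if remaining > 0:
        slices.append(remaining)  # leftover becomes the smallest nomad
    return slices
```

For example, 1,000 GB of nomad capacity would be sliced into units of 500, 250, 125, and 125 GB, giving a coarse unit for large growth events and fine units for quick, low-bandwidth migrations.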

Figure 14) Settled/nomad provisioning into an aggregate. In case of aggregate tightness, a nomad is migrated to a separate aggregate.


To summarize, the settled/nomad provisioning pattern is an elegant method to adjust the block use of an aggregate. The use of an aggregate can be controlled and kept in a desired corridor.


SLA-BASED ASSESSMENT FOR SETTLED/NOMAD

The goals of an SLA-based assessment are to optimize SLA fulfillment and to avoid or minimize penalty costs. The accessibility of the applications, described by their individual service levels, is used to assess them into settled and nomad instances.

We use the introduced SLA metric of service disruption and map it to the stickiness of the settled/nomad instances. The vFiler entities allow online migration of NFS- and iSCSI-attached nomad instances without any changes on the client side. Fibre Channel-attached storage cannot be migrated online at the time of writing. Refer to TR-3881 for an understanding of DataMotion in an Oracle database and Microsoft Exchange environment.

Alignment by technical impact. For data belonging to applications with SLAs that fit perfectly into what is provided, a direct assignment can be made. For example, application instances with the lowest acceptable service disruption should be the last candidates to be migrated (settled); applications with the highest acceptable service disruptions should be considered nomads. However, there might be data of application instances that will likely be migrated during the application lifetime. You must take into account the business impact of migrating these instances.
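The alignment by technical impact can be sketched as a simple ranking. The instance names, the disruption values, and the fixed nomad fraction below are hypothetical, chosen only to illustrate the assessment:

```python
def classify_instances(instances, nomad_fraction=0.3):
    """Assess application instances into settled and nomad sets.
    `instances` maps instance name -> acceptable service disruption
    (e.g., minutes per year). Instances that tolerate the most
    disruption become nomads; the rest stay settled."""
    ranked = sorted(instances, key=instances.get)  # least tolerant first
    n_nomads = int(len(ranked) * nomad_fraction)
    settled = ranked[:len(ranked) - n_nomads]
    nomads = ranked[len(ranked) - n_nomads:]
    return settled, nomads
```

With a nomad fraction of 0.5, a database tolerating 5 minutes of disruption stays settled while an archive share tolerating hours becomes a nomad.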

Figure 15) Alignment by technical impact (sorted by negative impact in descending order)

(Figure content: instances Inst1 to InstN sorted from high negative impact/outside SLA, e.g., all FC, to medium and low impact inside SLA; settled instances on one side, nomads on the other.)

Alignment by business impact. An assessment of penalty costs is made for the data of the remaining applications. For vFiler migration, a very short negative impact on the performance of the service level must be taken into account during the migration. Thus, application data with the highest negative impact is considered the stickiest.

Figure 16) Alignment by business impact (sorted by negative impact in descending order)

(Figure content: instances sorted by penalty cost from $$ to $, classified as settled, semi-settled, and nomad.)

PERFORMANCE AND THROUGHPUT IMPACT OF MIGRATION

Migration of a nomad might be triggered by heavy storage consumption in an aggregate. It might also be triggered by performance limitations of the corresponding storage controller. Because the migration process consumes additional resources on the network and the participating storage controllers,


this consumption must be taken into account to avoid further intensifying the situation. Refer to TR-3881 for a quantitative evaluation of DataMotion.

NetApp recommends that the use of storage controllers in a high-availability configuration be adjusted in such a way that the remaining controller can master the load in the case of a failover. Doing so should leave enough resources to perform migrations.

Migrating vFiler entities relies mainly on SnapMirror® and MultiStore technology. Thus, it requires these licenses on all participating storage controllers. TR-3814, NetApp Data Motion, provides a thorough presentation of migration using the NetApp Data Motion solution. It focuses on implementing and triggering the migration of vFiler entities using NetApp Provisioning Manager. Furthermore, vFiler units can be managed manually to allow for handy offline migration with a very short interruption of storage accessibility.

ENABLE SETTLED/NOMAD FOR ALREADY-PROVISIONED STORAGE

While NetApp recommends that you consider the settled/nomad setting initially, taking sizing and lifetime of storage into account, it is possible to implement it in a planned downtime window. If NFS-attached storage should be migrated, existing volumes can be adopted by a vFiler entity. Because the vFiler entity has its own IP address, the clients attaching the storage need to be remounted.

SETTLED/NOMAD-LIKE SETTING WITH SHORT/LONG-TERM DATA PAIRING

In the previous section, the settled/nomad pattern was described as a way to mitigate organic data growth. The same effect can be achieved when storage is identified as belonging to instances that are going to be deprovisioned at their end of life. Taking into account the expected lifetime of provisioned storage allows you to plan deprovisioning situations in advance. This relaxes the dependence on aggregates outside planned downtime windows, without the technical requirements of a settled/nomad setting.

ONLINE MIGRATION IN VIRTUALIZED ENVIRONMENTS

Online migration features in a virtualization hypervisor provide a further alternative for implementing a responsive scheme to react to data growth scenarios. For example, VMware® Storage VMotion™ is capable of transferring a virtual machine, including its storage, when it is attached using a datastore. Storage of virtual machines served by a NetApp datastore can be migrated to another NetApp-served datastore by migrating each virtual machine. In such cases, a nomad can simply be implemented by a NAS- or SAN-attached datastore.

In contrast to a data transfer based on SnapMirror directly between NetApp storage controllers, the migration traffic flows through the hypervisor. This might have consequences for the execution of the virtual machines. Also, the NetApp storage efficiency savings cannot be exploited during the transfer. Deduplication savings are regained by executing the deduplication process on the destination storage controller.


4 OPERATION

This section focuses on the operation and management of overcommitted storage. The goal of management is to fulfill the SLAs of the stored data while achieving a high level of efficiency. It addresses questions of how to detect situations that need manual assistance, how to raise the awareness of the operational staff, and how to resolve situations that arise.

We first consider situations that put the SLA fulfillment at risk. Then we focus on actions that can be taken to avoid further aggravation by making the situation evident and presenting mitigation alternatives.

Consider the operational process as a loop that monitors and evaluates the current situation and triggers the transition of a storage resource among phases:

• Provision storage.
• Leave room for organic growth. It might be desirable to still allow for extending storage of previously provisioned applications.
• Reduce storage use with mitigation alternatives such as deletion, data motion, and so on.

These transitions must occur within a specified time frame to optimize operational flexibility and to prevent endangering the SLAs. The point is to detect situations that will violate the SLAs in the future.
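One evaluation step of this loop can be sketched as follows. The threshold values are illustrative placeholders, not recommendations from this report:

```python
def evaluate_phase(used_pct, lower=50, upper=65):
    """One step of the operational loop: decide the phase of a storage
    resource from its aggregate block use percentage."""
    if used_pct < lower:
        return "provision"       # keep provisioning new storage
    if used_pct <= upper:
        return "organic_growth"  # stop provisioning; leave room to grow
    return "mitigate"            # trigger a mitigation alternative
```

Running this step periodically against monitored metrics yields the phase transitions described above; the later sections on monitoring and notification cover how the metrics are obtained and communicated.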

SITUATIONS PUTTING SLA FULFILLMENT AT RISK

Over time, more and more data is stored and processed by the provided applications. NetApp storage efficiency technologies compensate for this growth. To prevent running out of physical resources, usage must be managed within safe boundaries. This makes sure the operations team has enough time to react with the appropriate mitigation strategy.

The following list summarizes situations that are critical for service delivery:

• Running out of time. Some mitigation alternatives must be triggered in advance, and a passage of time might be needed for their effect to become evident. This time determines the number of mitigation alternatives that can be considered at a certain point.

• Running out of mitigation alternatives. Several mitigation alternatives exist to control the usage. However, some alternatives are one-time activities, and some must be performed within a certain time frame. Depending on the situation, not all alternatives might be available for use.

• Running too tight on storage. Over time, applications use more and more of the blocks from storage that were committed to them. This forces Data ONTAP to allocate from a pool of free blocks. Assuming data growth, the size of the free block pool directly translates into available time to react.

• Running out of storage completely. This must be prevented because it has a high impact on the availability of the service. Furthermore, data integrity can be at risk. Consider the following scenarios:
− An application wants to write to committed storage but fails (NAS and SAN). To the application, this looks like a storage failure and implies service disruption. Data integrity can be at risk.
− An application wants to allocate new storage but fails (NAS). The application is confronted with a "No space left on device" exception. Verify the application behavior on this exception. Most applications can deal with this situation, and data integrity is not at risk.

Two cases need to be differentiated when mitigation is necessary to solve a situation of tight storage. Storage for an object such as a LUN or a share can be tight because of:
− Insufficient space within the volume in which the storage object is contained
− Insufficient free space within the aggregate in which the storage object and its volume are contained


The following sections focus on how to detect that a change is necessary and that a storage resource should be transitioned to another phase. We describe the different phases, how monitoring can support their detection, and how this information can be made known to operational groups.

4.1 PHASES AND TRANSITIONS

This section outlines the phases of a storage resource. Starting with an empty aggregate, storage is provisioned to certain thresholds. After that, storage is left for organic growth. After certain thresholds are exceeded, further inspection or activities must be performed to mitigate storage tightness.

• Provisioning storage. When certain thresholds are within a defined range, storage is provisioned to the aggregates. Monitoring should support making the decision to transition to the next phase.

• Leave storage for organic growth. When certain thresholds are exceeded, provisioned storage is left for organic growth. Depending on the environment, storage of existing applications might still be extended, and a second threshold might signal that extensions are no longer possible. Monitoring should support making the decision to transition to the next or prior phase.

• Mitigate storage use. When certain thresholds are exceeded, this phase must make sure that committed storage can be delivered to store application data. The effect of a mitigation activity should be to put the storage resource back into the preferred operational corridor. Monitoring should support making the decision to transition back to the organic growth phase.

4.2 MONITORING

NetApp Operations Manager delivers comprehensive monitoring and management for NetApp shared storage. It provides alerts, reports, performance monitoring, and configuration tools to keep the NetApp storage infrastructure in line with business requirements for maximum availability and efficiency. NetApp Operations Manager provides a single human interface and an application programming interface (API) for integration with third-party management and orchestration software vendors.

Operations Manager monitors the NetApp shared storage infrastructure and is able to raise awareness of certain situations. Events can be set to trigger an action when operational parameters are within a certain range and indicate a relevant situation. When the event triggers, an alarm notification can be sent by e-mail, pager, Simple Network Management Protocol (SNMP), or customized scripts. To raise awareness of a certain situation, the event must be characterized using the metrics provided by Operations Manager. To communicate the event, an alarm must be set.

THRESHOLDS

Operations Manager monitors relevant parameters that indicate the presence of specified situations. Thresholds can be set to trigger actions, for example, to notify the operational team that an alarm situation exists. The thresholds can be set to notify in advance. Operations Manager also performs trending on operational parameters to express the urgency of a certain situation. This supports the decision making on how to react to a certain situation.

Within your Operations Manager instance, the thresholds can be verified and set by navigating to the Default Thresholds page, following Setup→Options→Default Thresholds, or via the link http://opsmgrserver:port/dfm/edit/options. Figure 17 shows a sample configuration page.


Figure 17) Operations Manager screen to configure thresholds on operational metrics

For aggregates, Operations Manager provides a set of thresholds, described in the following list. They represent absolute limits. Operations Manager alarms can be used to notify operational staff and raise awareness of a specific situation.

Monitoring the aggregates is very important. They are the physical containers of preallocated and growable storage objects that host application data. If an aggregate of a storage controller runs at uncontrolled usage, it could have direct consequences for the applications for which it is providing data.

The concrete settings for these thresholds depend on the time necessary to relax aggregate block usage. When no mitigation alternatives can or should be taken during the lifetime of the data, mitigation actions must be performed in scheduled downtime windows. Thus, threshold settings and actions tend to be more conservative, to avoid SLA-endangering situations.

• Aggregate full threshold. This threshold on the metric aggregate block use allows triggering an alarm that notifies a person in charge.

• Aggregate nearly full threshold. This threshold is the counterpart of the aggregate full threshold but provides an earlier notification.

• Aggregate overcommitted threshold. This threshold on the metric of committed storage allows triggering an alarm that notifies a person in charge. The metric refers to the amount of storage that is assigned to applications. It represents the level of consolidation and also the width and increase of the block use corridor.

• Aggregate nearly overcommitted threshold. This threshold is the counterpart of the aggregate overcommitted threshold but provides an earlier notification.
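A minimal sketch of how these four aggregate thresholds might be evaluated together. The event names follow the hyphenated style of the dfm command shown later in this report, and the default percentages are illustrative, not product defaults:

```python
def aggregate_events(used_pct, committed_pct,
                     nearly_full=50, full=65,
                     nearly_overcommitted=110, overcommitted=120):
    """Return the events an aggregate would raise for the given block
    use and committed storage percentages. The 'nearly' variants fire
    only when the corresponding absolute limit is not yet exceeded."""
    events = []
    if used_pct > full:
        events.append("aggregate-full")
    elif used_pct > nearly_full:
        events.append("aggregate-almost-full")
    if committed_pct > overcommitted:
        events.append("aggregate-overcommitted")
    elif committed_pct > nearly_overcommitted:
        events.append("aggregate-almost-overcommitted")
    return events
```

Note that committed storage can legitimately exceed 100% of the aggregate; that is the point of overcommitment, which is why its thresholds sit above 100.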

Operations Manager also provides thresholds that can be used to alert operational staff when volumes are in a certain state:

• Volume full threshold. This event notifies a person in charge that the preset threshold on the volume use metric has been reached.

• Volume almost full threshold. This event is the counterpart of the volume full threshold but provides an earlier notification.

• Volume autosized. This event notifies a person in charge when a volume has been extended using the autogrow functionality.


TRENDING

Operations Manager 4.0 supports a variety of trending features for certain storage objects. This is an important feature for all storage objects with a fixed size. It allows you to estimate the time frame within which a certain situation needs to be mitigated. The trend is calculated as a linear regression over up to 90 days in the past. For aggregates, Operations Manager calculates a trend on the daily growth rate. In your Operations Manager instance, use the link http://opsmgrserver:port/dfm/report/view/aggregates-growth-rates for trending of aggregate growth rates and the estimated remaining time until the storage object is full. Each aggregate can be drilled down, and you can select trending based on an interval of one day, one week, one month, three months, or one year. To see the effect of recent data activities, set the interval of the trend calculation to enclose this activity. Investigate whether growth rates calculated over different intervals deviate significantly.

Figure 18) Trending of data growth and days-to-full prediction in Operations Manager

Note: The calculation basis of time to full is the usable aggregate capacity. This value is not calculated based on the aggregate full threshold setting.
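The trend calculation can be approximated as follows. This is a sketch of a least-squares linear regression over daily used-capacity samples; Operations Manager's exact algorithm is not documented here and may differ:

```python
def days_to_full(daily_used_gb, capacity_gb):
    """Estimate days until an aggregate is full from daily used-capacity
    samples (one value per day, oldest first). The basis is the usable
    aggregate capacity, not the aggregate full threshold."""
    n = len(daily_used_gb)
    xs = range(n)
    mean_x = sum(xs) / n
    mean_y = sum(daily_used_gb) / n
    slope = (sum((x - mean_x) * (y - mean_y)
                 for x, y in zip(xs, daily_used_gb))
             / sum((x - mean_x) ** 2 for x in xs))  # GB growth per day
    if slope <= 0:
        return None  # no growth: never full at the current trend
    return (capacity_gb - daily_used_gb[-1]) / slope
```

For a 200 GB aggregate growing 10 GB per day and currently at 140 GB, the estimate is 6 days, which illustrates why the trend directly translates into available time to react.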

The trending at the volume level is analogous to the trending at the aggregate level. In your Operations Manager instance, access the link http://opsmgrserver:port/dfm/report/view/volumes-growth-rates for trending of volume growth rates. NetApp recommends that you order the view by growth rate, descending, or by time to full, increasing, in order to focus on the relevant candidates.

At the volume level, you can set an alarm to fire when the volume growth is outside the usual boundary:

• Abnormal volume growth. This event notifies when the growth rate of a volume exceeds a preset limit. It is helpful to signal unusual behavior concerning storage consumption and to point the operational staff to the right storage object.


INDIVIDUAL THRESHOLDS PER AGGREGATE OR VOLUME

For each aggregate or volume, the general default settings can be overwritten and made more specific. To do so, select your aggregate or volume of choice, for example, by using the links already provided in this technical report. When selecting a concrete aggregate, it can be configured using the Edit Settings link and dialog. When selecting a concrete volume, its configuration can be adapted using the Edit Quota Settings link and dialog.

MONITORING STORAGE EFFICIENCY RETURNS

NetApp Operations Manager provides a dashboard to visualize storage efficiency returns in the NetApp shared storage infrastructure. This report lists important parameters, drilled down by utilization capacity, unused reserve capacity, storage efficiency, and efficiency return breakdown. It allows you to judge the effectiveness of the NetApp storage efficiency technologies. Figure 19 provides a sample screenshot of the storage efficiency dashboard in NetApp Operations Manager. Consult the NetApp Operations Manager Efficiency Dashboard Installation and User Guide for further details on the information provided by this dashboard.

Figure 19) Storage efficiency dashboard in Operations Manager


4.3 NOTIFICATION

Operational staff must be notified when situations occur that require a transition of phases, especially situations with negative consequences for SLA fulfillment. Operations Manager provides alarms for notification. Alarms are bound to the metrics and thresholds explained in section 4.2 and notify operational staff, storage administrators, or storage capacity planners. Alarms are the instrument used to keep the management effort of the NetApp storage infrastructure low.

After being notified, the responsible person can evaluate the situation and decide which actions to take. Further, the trends on operational parameters provided by Operations Manager simplify the decision-making process.

Depending on the organizational structure, the responsibilities to operate, plan, and administer the storage infrastructure can be separated into different groups, persons, or roles. Thus, we characterize the mitigation activities by required skill set and time to act. This allows easy alignment with a given organizational structure.

Operations Manager supports different methods to send a notification. The notification methods can be used in combination; for example, a notification can be sent by both e-mail and SNMP.

NOTIFY BY E-MAIL

An alarm can be sent to multiple destinations by e-mail. Repeated notifications can be sent while the situation is not resolved. To set an alarm, access the alarm configuration page by following Setup→Alarms from the default Operations Manager dashboard. Clicking Advanced Version accesses an advanced version of this page. The direct link for the advanced version is http://opsmgrserver:port/dfm/edit/alarms-advanced. Figure 20 shows how to configure an alarm. Adjust the threshold as described in section 4.2.

NetApp recommends using distribution lists or aliases with meaningful names rather than the addresses of individual persons. If you follow this recommendation, changing responsibilities and roles does not require you to make corresponding changes in Operations Manager.

NOTIFY BY SNMP

Operations Manager supports the notification of alarms using SNMP, a widely used standard that is supported by most orchestration frameworks and ticketing systems. Using SNMP, Operations Manager can be integrated into existing ticketing systems. Figure 20 shows the setup of an alarm firing based on the aggregate almost full threshold. The SNMP trap host is configured using a hostname or IP address and the port on which the SNMP agent is listening. The alarm can be saved and tested.


Figure 20) Configuring an alarm based on the threshold aggregate almost full

Note: The SNMP event must be routed to the responsible groups or persons in the ticketing system. Thus, the mapping between the detected situation and the responsible operational group must be implemented there.

NOTIFY BY SCRIPT

Operations Manager supports notifications in highly customized integration scenarios. A user-defined adapter can be executed that delivers the information to the infrastructure or system of choice. A script can be used to implement such an adapter and act as the glue between Operations Manager and the customer infrastructure. To set an alarm on the event aggregate almost full that starts a script, instrument Operations Manager on the command line:

dfm alarm create -s script_to_execute -h aggregate-almost-full
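A hypothetical adapter script might look like the following sketch. The report does not describe how Operations Manager hands event details to the script, so the command-line interface, the queue name, and the severity rule below are assumptions for illustration; check the dfm documentation before relying on them:

```python
import sys

def format_ticket(event_name, source):
    """Map an Operations Manager event to a payload for a hypothetical
    in-house ticketing system."""
    if "full" in event_name and "almost" not in event_name:
        severity = "critical"  # absolute limit exceeded
    else:
        severity = "warning"   # early ("almost") notification
    return {
        "summary": "%s on %s" % (event_name, source),
        "severity": severity,
        "queue": "storage-operations",  # assumed routing target
    }

if __name__ == "__main__" and len(sys.argv) >= 3:
    # Assumption: the event name and affected object arrive as arguments.
    print(format_ticket(sys.argv[1], sys.argv[2]))
```

The script keeps the mapping between detected situation and responsible group in one place, which mirrors the routing note made for SNMP above.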


4.4 MITIGATE STORAGE USE

Uncontrolled storage use can limit operational flexibility and might put SLA fulfillment at risk. This section focuses on mitigation activities that preserve flexibility by controlling use within its defined corridor. The effect of a mitigation activity should be to return the usage to its defined corridor.

Storage tightness might occur in aggregates or volumes, depending on their configuration. When all volumes in an aggregate are thin provisioned with the zero fat configuration, they use the shared pool of free blocks of the aggregate to deal with data growth. To solve this situation, a mitigation activity at the aggregate level is necessary.

When storage objects in a fixed-size volume cannot grow to the committed space, a mitigation activity at the volume level is necessary to solve upcoming volume tightness.

MITIGATION ACTIVITIES FOR AGGREGATES

Aggregates are the coarsest storage objects within a NetApp storage controller. Finer-grained storage objects, such as FlexVol volumes and their content, are usually thin provisioned using the zero fat configuration. They might grow on demand; however, because they live within an aggregate of physically limited size, the growth of the storage object itself is also limited. As described in the following list, providing usable space in the aggregate automatically allows contained storage objects to grow.

1. Increase the aggregate. You can add drives to aggregates during operation, and you can repeat this mitigation activity. The maximum aggregate size depends on the Data ONTAP version, the type of aggregate, and the type of storage controller. 64-bit aggregates, supported with Data ONTAP 8, have very high limits. Additional drives can be used immediately; however, their procurement needs to be taken into account. Rebalancing data between existing and new drives results in a uniformly distributed use of the drives.

2. Decrease the aggregate Snapshot copy reserve. This reserve is needed in MetroCluster and SyncMirror® configurations. In other configurations, you can decrease this reserve or set it to zero.

3. Shrink preallocated volumes. Volumes with preallocated space reserve available aggregate free space. When possible, these volumes can be shrunk, returning the freed space to the aggregate to allow others to make use of the preallocated space.

4. Enable deduplication and shrink the volume.
5. If available, migrate a nomad online to a different storage controller. Doing this at the NetApp storage controller level requires storage provisioning based on vFiler units and MultiStore and SnapMirror licenses. Adequate free space on the aggregates of the target storage controller is required. This mitigation activity is not limited in its applicability.

6. A volume can be migrated from one aggregate to another, within the same or another storage controller. SnapMirror replicates the data while it is still being served. To switch over to the replicated data, the client needs to detach from the source and reattach to the replica. After completion, the replica is considered the new source. This operation has an impact on client downtime. Typically, inter-data center bandwidth allows you to synchronize the source and the replica within a few minutes.

7. If none of the listed activities can be used, the application must be stopped to achieve a consistent state before migrating.

The mitigation activities for aggregate tightness are summarized in Table 8. Note that Provisioning Manager performs mitigation alternatives 3 to 6 for secondary storage online.


Table 8) Mitigation alternatives to control use within aggregates.

No. | Mitigation Activity | Repeatability | SLA Impact | Preparation Time | Time to Show Effect
1 | Increase aggregate capacity by adding disks | Repeatable up to limits (low with Data ONTAP 7.x, high with Data ONTAP 8) | None | HW procurement | Immediate (+ rebalancing)
2 | Decrease the aggregate's Snapshot copy reserve, if possible | One time | None | None | Immediate
3 | Shrink other volumes in the aggregate if they have enough free space | One time | Low | None | Immediate
4 | Run deduplication and shrink volumes | Repeatable | Low | Time to execute dedupe | Immediate
5 | Migrate nomads (online) | Repeatable | Low | None | Minutes (vFiler migration time)
6 | Migrate volumes to a different aggregate (offline) | Repeatable | Med-high | Next planned downtime window | Minutes (volume switch-over time)
7 | Prevent application data loss and stop the application, then migrate (offline) | Repeatable | Low-high | Coordinate with app owner | Minutes (migration time)

MITIGATION ACTIVITIES FOR VOLUME TIGHTNESS

Mitigation activities for volume tightness are relevant for volumes that are at risk because their storage objects cannot grow to the committed size. When it is not possible to enable growth for storage objects contained in volumes, you need to perform an aggregate-level mitigation activity.

Note: Some of these mitigation alternatives depend on, and affect, used capacity in the aggregate.

Table 9) Mitigation activities for resource tightness within volumes.

No. | Mitigation Activity | Repeatability | SLA Impact | Preparation Time | Time to Show Effect
1 | Reduce the volume's Snapshot copy reserve (if configured and not used) | One time | Low | None | Immediate
2 | Increase the volume if there is free space in the aggregate (see Table 8) | One time | Low | None | Immediate
3 | Delete Snapshot copies that are not needed or were skipped by the autodelete function | Limited | Low | None | Immediate
4 | Activate FAS deduplication for the volume (requires proper space guarantees) | One time | Low; possible performance impact | Wait for schedule | Hours
5 | If the volume contains more than a single LUN, migrate those objects to another volume or aggregate | Repeatable | High | Next planned downtime window | Minutes (volume migration time)
6 | Stop the application and migrate data | Repeatable | High | Coordinate with app owner | Minutes (migration time)


5 REAL-LIFE SETTINGS

This section summarizes two different operational settings. The first does not make use of online data migration or the settled/nomad provisioning pattern; the second implements a settled/nomad provisioning pattern to maintain the flexibility for online data migrations.

The concrete threshold settings and approaches might be very customer and application specific. To exploit NetApp storage efficiency features in your own data center, NetApp recommends that you start conservatively. After you are familiar with the process, work toward the customer-specific optimum.

5.1 SAMPLE SETTING 1: REAL-LIFE SETTING

This section describes a real-life setting that a customer started with. It makes use of a limited set of mitigation alternatives. This is especially beneficial when the installed storage capacity should be constant over a long time frame or physical systems are already fully equipped. A settled/nomad setting is not considered. Thus, the thresholds that signal a transition of the phases are set lower and more conservatively for this customer. Because online data migration and aggregate extension are not available as mitigation alternatives, sufficient available space is required to safely reach the next planned downtime window, as shown in Figure 21. In practice, refer to the aggregate days-to-full trend value to get an idea of the available days to full based on past data growth.

• All storage is provisioned using the zero fat option with growable FlexVol volumes. Only aggregate monitoring is used.
• Aggregate extension is not a mitigation alternative.
• Online migration is not a mitigation alternative.

Figure 21) Storage to enable organic data growth between planned downtime windows

(Figure content: data growth over a time axis of months, with sufficient reserved space to bridge the gap between two planned downtime windows.)

Note: Several months might pass between the planned downtime windows in which major mitigation alternatives can be performed.

The primary concern is preventing the critical situation in which aggregates reach a utilization level that is too high to enable organic growth during the period between agreed planned downtime windows. To prevent this situation, sufficient space must be reserved to enable data growth. Second, the level of data consolidation is monitored to manage accumulated growth rates safely.

Provisioning new data is stopped when one or both thresholds on the first and second metrics are reached. The operational teams are notified using an alarm on the Operations Manager event aggregate


nearly full threshold (configured to fire when the metric exceeds 50%) and the event aggregate nearly overcommitted threshold (configured to fire when the metric exceeds 110%). These alarms stop the responsible entities from provisioning new storage; the aggregate is left for organic growth.

An assessment of the storage situation might then be performed. Depending on experience and knowledge of the application growth rates seen in the past, the thresholds may be adapted. After the upper threshold of the operational sweet spot corridor is exceeded, an alarm based on the aggregate full threshold (set initially to 65%) is sent to the storage administrators to make the decision to migrate data in the next planned downtime window. In the meantime, organic growth can take place in the yellow-marked area shown in Figure 22. The metrics used are:

• First metric: aggregate capacity used
• Second metric: aggregate space committed

Because all storage is provisioned using the zero fat option, no artificially limited storage container exists. Thus there is no need to consider a volume-based metric. Figure 22 shows the behavior depending on the metrics aggregate capacity used and aggregate space committed.
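The phase logic of this sample setting can be sketched in Python. This is an illustrative sketch only: the threshold values come from the text above, while the function and key names are invented for the example.

```python
def setting1_actions(capacity_used_pct, space_committed_pct):
    """Map sample setting 1's two aggregate metrics to operational actions.

    Thresholds follow the text: provisioning stops when capacity used
    exceeds 50% or committed space exceeds 110%; migration is planned
    for the next downtime window when capacity used exceeds 65%.
    """
    return {
        "provision_new_storage":
            capacity_used_pct <= 50 and space_committed_pct <= 110,
        "assess_capacity_and_adapt_thresholds":
            capacity_used_pct > 50 or space_committed_pct > 110,
        "plan_migration_in_next_downtime":
            capacity_used_pct > 65 or space_committed_pct > 120,
    }

# An aggregate at 58% used and 115% committed is left for organic
# growth while the capacity situation is assessed.
actions = setting1_actions(58, 115)
```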

Figure 22) Transition of phases depending on the metrics aggregate capacity used and aggregate space committed.

[Graphic: phases along growing aggregate capacity, with the operational sweet spot corridor marked.]

Aggregate capacity used              0–50%     50–65%     > 65%
Aggregate space committed            0–110%    110–120%   > 120%
Provisioning new storage             Y         N          N
Assess capacity, adapt thresholds    N         Y          Y
Mitigate                             N         N          Y


5.2 SAMPLE SETTING 2: SETTLED/NOMAD

This section describes a setting that takes the settled/nomad provisioning pattern into account and allows migrating nomad data flexibly and in a timely manner, thanks to vFiler technology. This setting requires storage space at alternative locations to which nomads might be migrated. It is seen more often in larger environments with an emphasis on NFS-attached storage. It allows operating the NetApp storage infrastructure at very high use and in narrower operational sweet spot corridors. Figure 23 visualizes the effect of a mitigation alternative that can be performed online.

Figure 23) Narrower corridors due to the ability to perform mitigation alternatives in hours instead of months.

[Graphic: settled (S) and nomad (N) capacity over time (hours); after the need to act is detected, the effect of mitigation (e.g., migration of a nomad) appears within hours.]

In this sample setting, as in sample setting 1, the critical situation to prevent is aggregates becoming too full. However, the flexibility gained with online data migration makes it unnecessary to take a further metric into account, for example, storage overcommitment.

• All storage is provisioned using the zero fat option with growable FlexVol volumes. Only aggregate monitoring is used.

• Storage is provisioned using the settled/nomad pattern with the ability to perform online migration.
• Days-to-full aggregate trending was more than 200 days on average. Note that this value depends on the individual situation and is calculated against 100% capacity used.

The sole metric in this setting is aggregate capacity used. Table 10 contains the thresholds describing the transition of phases.

Table 10) Phase transitions with the settled/nomad provisioning pattern and online migration as mitigation alternative.

Detection Threshold   Notify               Mitigation
> 70%                 Storage operations   Stop provisioning of new storage
> 85%                 Storage operations   Stop extending provisioned storage
> 90%                 Storage operations   Relax resource situation and migrate a nomad
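The single-metric phase lookup of Table 10 can be sketched as follows; this is an illustrative sketch, and the phase names are invented for the example.

```python
def setting2_phase(capacity_used_pct):
    """Return the operational phase of sample setting 2 (single metric).

    Thresholds follow Table 10; the phase names are illustrative.
    """
    if capacity_used_pct <= 70:
        return "provision new storage"
    if capacity_used_pct <= 85:
        return "organic growth only"   # stop provisioning new storage
    if capacity_used_pct <= 90:
        return "no extensions"         # stop extending provisioned storage
    return "mitigate"                  # relax the situation, migrate a nomad
```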


Figure 24) Visualization of phase transitions depending on the metric aggregate capacity used.

[Graphic: phases along growing settled-data capacity, with the operational sweet spot corridor marked.]

Aggregate capacity used                             0–70%   70–85%   > 90%
Provisioning new storage                            Y       N        N
Extending already provisioned storage               Y       Y        N
Relax utilization (NetApp Data Motion of a nomad)   N       N        Y

You can achieve very high data consolidation in this setting by using NetApp storage controllers. The amount of logical data served exceeds the physically usable capacity by factors.


6 STORAGE EFFICIENCY COOKBOOK

To increase consolidation, we propose the following steps to exploit the advantages of NetApp storage efficiency technologies.

Figure 25) Typical picture of aggregate capacity metrics while turning to zero fat configurations and dedupe.

[Graphic: committed capacity and capacity used over elapsed time. Committed capacity keeps its overall upward trend; capacity used drops during the first month of zero fat and dedupe changes (steps 1 and 2) and then follows the last 3-month growth trend (step 3).]

As a general rule, we don't introduce artificially limited container types. They increase monitoring effort and might prevent pooling unused space. For an existing landscape, proceed as follows:

1. Install and configure Operations Manager, the earlier the better. From day one, Operations Manager collects data; the more information it collects, the better the predictions and trending become. The diagrams provided by Operations Manager give a good idea of growth rates and their steadiness. Make sure all NetApp storage controllers are monitored. Wait for one month. Define which mitigation alternatives your operational team is comfortable with. Check the boxes accompanying the provided list and identify the time your team needs to perform the actions. If you can perform online migrations for nomads, define the time to negotiate and approve the migration. For all other data, define the time to the next planned downtime window.

2. Change all volumes to zero fat configuration with the autogrow feature set to on. Because there is no artificial space limitation for an autogrow volume, monitoring is restricted to aggregate monitoring. When using deduplication, set the volume to autogrow. During this period the capacity used diminishes, as shown in Figure 25. Usually each change in the volume configuration can be detected. So far, only metadata has changed, and unused space in the volumes is now available from a common shared pool. The aggregated free space is available for the same applications storing the data. We recommend monitoring for three months to understand the growth rate of your environment.

3. Derive the growth trend of the aggregates. Note that the overall trend might still be negative. Use Operations Manager to help determine the trend. Make sure that it excludes the time frame when the volume configuration was changed to zero fat and that it includes relevant operations of your applications, such as month-end and year-end closing of business applications or regular software maintenance updates (for example, in virtualized environments).
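The trend derivation in this step can be sketched as a simple least-squares fit that excludes the reconfiguration window; this is an illustrative sketch, not an Operations Manager API, and all names are invented for the example.

```python
def daily_growth_rate(samples, exclude=None):
    """Estimate a linear daily growth rate (GB/day) from (day, used_gb) pairs.

    Least-squares fit over day offsets. The optional exclude=(start, end)
    range drops samples taken while volumes were being reconfigured to
    zero fat, as recommended above.
    """
    pts = [(d, u) for d, u in samples
           if exclude is None or not (exclude[0] <= d <= exclude[1])]
    n = len(pts)
    sx = sum(d for d, _ in pts)
    sy = sum(u for _, u in pts)
    sxx = sum(d * d for d, _ in pts)
    sxy = sum(d * u for d, u in pts)
    return (n * sxy - sx * sy) / (n * sxx - sx * sx)
```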


4. Work backward to determine the thresholds of the phases:

a. Define the aggregate use at a level where your operational team is comfortable. At first, do not exceed 80%. Add an attention area (yellow) depending on the mitigation alternatives and their time to show effect.

b. Determine the maximum distance between the planned downtimes, or the time to perform the intended mitigation alternatives.

c. Determine the growth rate. Operations Manager provides help in determining the trend of data growth.

d. Determine the minimum space required to comfortably allow organic growth in the period between agreed planned downtimes of the services provided. Operations Manager helps you to understand the growth rate of the past.
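Working backward from the growth rate and the downtime interval can be sketched as the following arithmetic; the function name, units, and the 80% comfort default are illustrative assumptions.

```python
def aggregate_full_threshold(aggregate_gb, growth_gb_per_day,
                             days_between_downtimes, comfort_pct=80):
    """Work backward to an aggregate-full alarm threshold (percent).

    The space reserved for organic growth is growth rate x period; the
    threshold is the comfort level minus that reserve, expressed in
    percent of the aggregate capacity.
    """
    reserve_pct = 100.0 * growth_gb_per_day * days_between_downtimes / aggregate_gb
    return max(0.0, comfort_pct - reserve_pct)

# A 10 TB aggregate growing 5 GB/day with 120 days between downtime
# windows needs a 6% reserve, so the threshold lands at 74%.
threshold = aggregate_full_threshold(10000, 5, 120)
```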

To provision storage, follow these steps:

1. Create big aggregates to enable shared storage in your data center. We recommend sizing in such a way that the aggregate can be extended once for eventual aggregate mitigation. Very few situations exist where a silo-centric approach with dedicated aggregates for applications makes sense. Free space and performance in an aggregate can be shared. A few big aggregates reduce the monitoring effort. Also, build aggregates in a limited number of standardized configurations and sizes.

2. Create volumes in zero fat configuration with the autogrow feature set to on. Because there is no artificial space limitation for an autogrow volume, monitoring is restricted to aggregate monitoring. When using deduplication, set the volume to autogrow. Whenever possible, use Provisioning Manager for convenience and for repeating configurations.

a. Classify your data and provision for flexibility. Give NFS a preference and make use of vFiler entities.

b. Turn on deduplication. Even in situations where deduplication rates are expected to be low, there is sometimes a big surprise. If you prefer to try deduplication on the storage controller, then create a clone of the intended volume and deduplicate it to estimate the effect. Use Performance Advisor to identify a repeating time frame of low activity to schedule the deduplication job. Also, use deduplication scheduled by change rate. Mind the maximum sizes, which depend on the storage controller.

c. Initially size volumes to the expected size of the data you are going to store. Thus the aggregate overcommitment metric in Operations Manager represents the data consolidation more precisely.

d. Trim existing volumes provisioned in full/low fat to zero fat configuration. Use the following commands on the console of the storage controller to configure zero fat without Snapshot autodelete for NAS environments:

vol options <volume> guarantee none
vol options <volume> try_first volume_grow
vol autosize <volume> -m <maximum size> -i <increment size> on
snap autodelete <volume> off

Use the following command sequence to configure zero fat with Snapshot autodelete for NAS environments:

vol options <volume> guarantee none
vol options <volume> try_first volume_grow
vol autosize <volume> -m <maximum size> -i <increment size> on
snap autodelete <volume> trigger volume
snap autodelete <volume> delete_order oldest_first
snap autodelete <volume> on


Use the following commands to configure zero fat without Snapshot autodelete for SAN environments:

vol options <volume> guarantee none
vol options <volume> try_first volume_grow
vol autosize <volume> -m <maximum size> -i <increment size> on
snap reserve -V <volume> 0
snap autodelete <volume> off
lun set reservation <lun> disable

Use the following command sequence to configure zero fat for SAN environments with autodelete set to on:

vol options <volume> guarantee none
vol options <volume> try_first volume_grow
vol autosize <volume> -m <maximum size> -i <increment size> on
snap reserve -V <volume> 0
snap autodelete <volume> trigger volume
snap autodelete <volume> delete_order oldest_first
snap autodelete <volume> on
lun set reservation <lun> disable
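When many volumes need trimming, the four command sequences above can be generated programmatically; this is an illustrative helper, and the parameter names and example sizes are assumptions to adapt to the environment.

```python
def zero_fat_commands(volume, san=False, autodelete=False, lun=None,
                      max_size="1t", increment="100g"):
    """Build the console command sequence that turns a volume to zero fat.

    Mirrors the four sequences above: NAS/SAN, with or without Snapshot
    autodelete. Returns the commands as a list of strings.
    """
    cmds = [
        f"vol options {volume} guarantee none",
        f"vol options {volume} try_first volume_grow",
        f"vol autosize {volume} -m {max_size} -i {increment} on",
    ]
    if san:
        cmds.append(f"snap reserve -V {volume} 0")
    if autodelete:
        cmds += [
            f"snap autodelete {volume} trigger volume",
            f"snap autodelete {volume} delete_order oldest_first",
            f"snap autodelete {volume} on",
        ]
    else:
        cmds.append(f"snap autodelete {volume} off")
    if san and lun:
        cmds.append(f"lun set reservation {lun} disable")
    return cmds
```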

e. Identify storage of inactive data. Storage keeping inactive data is most often perfectly suited to act as a nomad candidate that can be migrated.

f. Identify storage that is close to deprovisioning. Deprovisioning of storage relaxes use and can act as a mitigation alternative.

g. Turn already provisioned volumes into zero fat configuration.

3. Let Operations Manager monitor the landscape. Use the aggregate daily growth rates and days-to-full trending reported by Operations Manager to adapt the thresholds. Remember that days-to-full trending reports against 100% capacity used of the aggregate.
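Because the reported days-to-full is calculated against 100% capacity used, it can be rescaled to the operational threshold; the following is an illustrative sketch under a linear-growth assumption.

```python
def days_to_threshold(days_to_full, used_pct, threshold_pct):
    """Rescale a reported days-to-full value (against 100%) to the days
    remaining until an operational threshold is crossed, assuming
    linear growth."""
    if used_pct >= threshold_pct:
        return 0.0
    return days_to_full * (threshold_pct - used_pct) / (100.0 - used_pct)

# 200 days to full reported for an aggregate at 60% used leaves about
# 100 days until an 80% threshold is crossed.
remaining = days_to_threshold(200, 60, 80)
```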


7 REFERENCES

• TR-3505, "NetApp Deduplication for FAS and V-Series Deployment and Implementation Guide": www.netapp.com/us/library/technical-reports/tr-3505.html
• TR-3563, "NetApp Thin Provisioning: Improving Storage Utilization and Reducing TCO": www.netapp.com/us/library/technical-reports/tr-3563.html
• TR-3710, "Operations Manager, Provisioning Manager, and Protection Manager Best Practices Guide": www.netapp.com/us/library/technical-reports/tr-3710.html
• TR-3786, "A Thorough Introduction to 64-Bit Aggregates": www.netapp.com/us/library/technical-reports/tr-3786.html
• TR-3814, "NetApp Data Motion": www.netapp.com/us/library/technical-reports/tr-3814.html
• TR-3827, "If You're Doing This, Then Your Storage Could Be Underutilized": www.netapp.com/us/library/technical-reports/tr-3827.html
• TR-3881, "DataMotion for Volumes for Enterprise Applications": www.netapp.com/us/library/technical-reports/tr-3881.html
• NetApp Operations Manager Efficiency Dashboard Installation and User Guide: now.netapp.com/NOW/download/tools/omsed_plugin/InstallUserGuide.pdf


8 ACKNOWLEDGMENTS

This report was developed in concert with the Field Centers for Innovation and covers field best practices and product group expertise. It would not have been possible without the input of many experts. Significant contributions were made by Matthew Agoni, Carlos Alvarez, Jeff Berks, Manfred Buchmann, Hans Deuerlein, Erik Dybwad, Niels Reker, Oliver Dziuba, Larry Freeman, Gary Garcia, Pretoom Goswami, Naveen Harsani, George John, Nigel Maddock, Andreas Martinovsky, Holger Niermann, Cesar Orosco, Christian Ott, Shiva Raja, Michael Reusch, Maurice Skubski, John Tyrrell, Oliver Walsdorf, and Allen Wang.

NetApp provides no representations or warranties regarding the accuracy, reliability, or serviceability of any information or recommendations provided in this publication, or with respect to any results that may be obtained by the use of the information or observance of any recommendations provided herein. The information in this document is distributed AS IS, and the use of this information or the implementation of any recommendations or techniques herein is a customer's responsibility and depends on the customer's ability to evaluate and integrate them into the customer's operational environment. This document and the information contained herein may be used solely in connection with the NetApp products discussed in this document.

© Copyright 2010 NetApp, Inc. All rights reserved. No portions of this document may be reproduced without prior written consent of NetApp, Inc. Specifications are subject to change without notice. NetApp, the NetApp logo, Go further, faster, Data ONTAP, FlexClone, FlexVol, MultiStore, RAID-DP, SnapDrive, SnapMirror, Snapshot, SyncMirror, and vFiler are trademarks or registered trademarks of NetApp, Inc. in the United States and/or other countries. Windows is a registered trademark of Microsoft Corporation. Oracle is a registered trademark of Oracle Corporation. VMware is a registered trademark and VMotion is a trademark of VMware, Inc. All other brands or products are trademarks or registered trademarks of their respective holders and should be treated as such. RA-0007-1010


DEDUPE-CENTRIC STORAGE LAYOUT

In a dedupe-centric storage layout, the goal is to achieve high storage efficiency returns from the deduplication feature. In contrast to the volume-centric storage layout, data of different application instances is grouped to achieve storage efficiency returns across a set of application instances. Figure 13 shows a sample dedupe-centric storage layout. Data of application instances is organized horizontally; individual data of each application is grouped vertically in a volume to implement deduplication.

This layout makes sense in virtualization scenarios where the images of the guest machines can be grouped easily. Grouping partitions containing boot images and commonly used programs is very effective because they share much of the same data. To implement template-based provisioning with such a layout, cloning of template data must be performed with the file/LUN FlexClone operation. File/LUN FlexClone allows storage objects to be cloned within a volume, providing finer granularity.

This storage layout provides the following storage efficiency advantages from a short- and long-term perspective:

• Very high long-term storage efficiency savings: long-term savings are achieved due to the deduplication-centric storage layout and deduplication returns.

• Short-term storage efficiency savings: instant savings are provided when cloning an application instance through a file/LUN FlexClone operation, for example, of template application data.

In contrast to the volume-centric storage layout, application instances are bundled together in a matrix style because of their participation in a volume. This implies that the applications share major operational tasks and are managed as a bundle. From an SLA perspective, a diversification of service levels within the application instances cannot be implemented as easily as with a volume-centric layout. Achieving application-consistent Snapshot copies requires the iterative application of file/LUN FlexClone functionality to all storage objects of the instance. This is slightly more difficult than cloning with a volume FlexClone operation.

TR-3505, NetApp Deduplication for FAS and V-Series Deployment and Implementation Guide, provides a deeper understanding of NetApp deduplication and its deployment.

Figure 13 illustrates dedupe-centric storage provisioning. Volumes are shared among several application instances to achieve cross-instance dedupe returns. Note that this construct is created within an aggregate; volumes can be assigned to different aggregates.


Figure 13) Dedupe-centric storage provisioning. Application instances are aligned horizontally; volumes are aligned vertically.

[Graphic: a template and application instances (Instance 1, Instance 2, …) aligned horizontally; several FlexVol volumes aligned vertically, each holding one LUN/qtree per instance and providing deduplication block sharing within the FlexVol volume.]

Impact on commitment and aggregate usage: When creating the FlexVol volumes for this layout, their individual sizes contribute to the commitment rate. The aggregate use grows with the provisioning and object use within the FlexVol volumes. Provisioning a new instance in this layout through a file/LUN FlexClone operation has no effect on the overcommitment rate; it has an effect on the overdeduplication value of the volumes themselves. Thus NetApp recommends using zero fat configuration for the volume, with autogrow enabled.
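The commitment arithmetic described above can be sketched as follows; this is an illustrative sketch, and the function name and units are assumptions.

```python
def commitment_rate_pct(flexvol_sizes_gb, aggregate_gb):
    """Commitment rate of an aggregate: the sum of the provisioned
    FlexVol sizes relative to the aggregate capacity. Cloning inside an
    existing volume (file/LUN FlexClone) adds no new FlexVol, so it
    leaves this rate unchanged."""
    return 100.0 * sum(flexvol_sizes_gb) / aggregate_gb

# Three 400 GB FlexVol volumes on a 1,000 GB aggregate commit 120%.
rate = commitment_rate_pct([400, 400, 400], 1000)
```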

Best Practice

This layout is very attractive for applications using multiple but similar storage objects among service instances (for example, virtual disks in virtual machine hypervisors). They usually use similar operating systems and applications in dedicated virtual disks. Thus, grouping these storage objects leads to a very high degree of consolidation due to deduplication.

Quickly changing data, such as pages and swap files, should not be considered for inclusion in deduplicated volumes on primary storage. Deduplication savings are limited due to their high change rate and do not justify running the deduplication process. NetApp recommends that this type of data not be placed together with data that dedupes well in the same volume.

We further recommend not performing client data realignments, such as Windows disk defragmentation or database tablespace reorganizations. Because of the way that NetApp storage controllers work, fragmented client data is served with no performance penalty.


3.3 SETTLED/NOMAD PROVISIONING FOR NETAPP DATA MOTION

Settled/nomad provisioning is a provisioning pattern that helps increase the utilization of NetApp shared storage. When the online migratability features of storage are exploited, response times to mitigate data growth scenarios are independent of application-specific planned downtime windows. Storage is classified into potential migration candidates that can be migrated away from a tight aggregate at one storage controller to another while assuring its accessibility. Thus it is an elegant technology to relax the use of an aggregate outside the planned downtime windows of the served applications. NetApp MultiStore technology implements this feature using the vFiler abstraction, which NetApp recommends you consider in the provisioning process. Secure multi-tenancy environments implemented using MultiStore technology harmonize well with this provisioning approach.

The settled/nomad provisioning pattern is a perfect metaphor for reacting to data growth in an aggregate. Figure 14 illustrates the concept of settled/nomad provisioning in the aggregates of the storage controllers and the migration of a nomad out of its aggregate. The settled part describes data that does not move during its lifetime. It might use vFiler units to simplify operation and hardware maintenance of the storage controller, but there is no direct need. The nomad parts are considered moving parts and thus must make use of vFiler units. The ratio between the sizes of the settled and nomad parts depends on the growth rate and lifetime of the data in the settled part. Assuming that the aggregate size is constant over this period, the aggregate is filled with settled and nomad data. Over the data's lifetime, more and more nomads are migrated away; at the end of the lifetime, only the settled data is left.
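The sizing relationship described above can be sketched as follows; this is an illustrative sketch under the constant-size, constant-growth assumptions in the text, and the split ratios are invented for the example.

```python
def nomad_capacity_needed(growth_gb_per_day, settled_lifetime_days):
    """Nomad share of an aggregate that must absorb the accumulated
    growth over the settled data's lifetime: growth rate x lifetime."""
    return growth_gb_per_day * settled_lifetime_days

def slice_nomads(total_gb, ratios=(0.5, 0.3, 0.2)):
    """Split the nomad capacity into several differently sized nomads,
    as recommended below; the split ratios are illustrative."""
    return [total_gb * r for r in ratios]

# 2 GB/day of growth over a one-year settled lifetime needs 730 GB of
# nomad capacity, sliced here into three nomads of different sizes.
nomads = slice_nomads(nomad_capacity_needed(2, 365))
```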

It is irrelevant whether the data growth happens in the settled or nomad part: when a nomad is migrated away, the resource situation on the aggregate is relaxed. It is preferable to provision several nomads of different sizes. This allows you to:

• React to different growth scenarios of the data.
• Quickly migrate smaller nomads when time or the inter-storage-controller network is considered to be a limited resource.
• Operate the aggregate in its operational sweet spot corridor over a long time frame. By slicing the migratable entities in the right way, you can be sure that the aggregate operates in a predefined use interval.

Figure 14) Settled/nomad provisioning into an aggregate. In case of aggregate tightness, a nomad is migrated to a separate aggregate.

[Graphic: an aggregate containing a settled part and two nomads.]

To summarize, the settled/nomad provisioning pattern is an elegant method to adjust the block use of an aggregate. The use of an aggregate can be controlled and kept in a desired corridor.


SLA-BASED ASSESSMENT FOR SETTLED/NOMAD

The goals of an SLA-based assessment are to optimize SLA fulfillment and to avoid or minimize penalty costs. The accessibility of the applications, described by their individual service levels, is used for an assessment into settled and nomad instances.

We use the introduced SLA metric of service disruption and map it to the stickiness of the settled/nomad instances. The vFiler entities allow online migration of NFS- and iSCSI-attached nomad instances without any changes at the client side. Fibre Channel-attached storage cannot be migrated online at the time of writing. Refer to TR-3881 for an understanding of DataMotion in an Oracle database and Microsoft Exchange environment.

Alignment by technical impact: For data belonging to applications with SLAs that fit perfectly into what is provided, a direct assignment can be made. For example, application instances with the lowest acceptable service disruption should be the last candidates to be migrated (settled); applications with the highest acceptable service disruptions should be considered as nomads. However, there might be data of application instances that will likely be migrated during the application lifetime. You must take into account the business impact of migrating these instances.

Figure 15) Alignment by technical impact (sorted by negative impact in descending order).

[Graphic: instances Inst1 … InstN sorted by negative impact in descending order (high/outside SLA to medium and low/inside SLA), split into settled and nomad; e.g., all FC-attached instances are settled.]

Alignment by business impact: An assessment of penalty costs is made for the data of the remaining applications. For vFiler migration, a very short negative impact on the performance of the service level must be taken into account during the migration. Thus, application data with the highest negative impact is considered to be the stickiest.

Figure 16) Alignment by business impact (sorted by negative impact in descending order).

[Graphic: instances sorted by penalty cost ($$ to $) in descending order, split into settled, semi-settled, and nomad.]

PERFORMANCE AND THROUGHPUT IMPACT OF MIGRATION

Migration of a nomad might be triggered by heavy storage consumption in an aggregate. It might also be triggered by performance limitations of the corresponding storage controller. Because the process of migration consumes additional resources on the network and the participating storage controllers, this consumption must be taken into account to avoid further intensifying the situation. Refer to TR-3881 for a quantitative evaluation of DataMotion.

NetApp recommends that the use of storage controllers be adjusted in a high-availability configuration in such a way that the remaining controller can master the load in the case of a failover. Doing so should leave enough resources to perform migrations.

Migrating vFiler entities relies mainly on SnapMirror® and MultiStore technology and thus requires these licenses on all participating storage controllers. TR-3814, NetApp Data Motion, provides a thorough presentation of migration using the NetApp Data Motion solution. It focuses on implementing and triggering the migration of vFiler entities using NetApp Provisioning Manager. Furthermore, vFiler units can be managed manually to allow for handy offline migration with a very short interruption of storage accessibility.

ENABLE SETTLED/NOMAD FOR ALREADY-PROVISIONED STORAGE

While NetApp recommends that you consider the settled/nomad setting initially, taking the sizing and lifetime of storage into account, it is also possible to implement it in a planned downtime window. If NFS-attached storage is to be migrated, existing volumes can be adopted by a vFiler entity. Because the vFiler entity has its own IP address, the clients attaching the storage need to be remounted.

SETTLED/NOMAD-LIKE SETTING WITH SHORT-/LONG-TERM DATA PAIRING

In the previous section, the settled/nomad pattern was described as a way to mitigate organic data growth. The same effect can be achieved when storage is identified as belonging to instances that are going to be deprovisioned at their end of life. Taking the expected lifetime of provisioned storage into account allows you to plan deprovisioning situations in advance. This relaxes the dependence on aggregates outside planned downtime windows, without the technical requirements of a settled/nomad setting.

ONLINE MIGRATION IN VIRTUALIZED ENVIRONMENTS

Online migration features in a virtualization hypervisor provide a further alternative for implementing a responsive scheme to react to data growth scenarios. For example, VMware® Storage VMotion™ is capable of transferring a virtual machine, including its storage, when it is attached using a datastore. Storage of virtual machines served by a NetApp datastore can be migrated to another NetApp-served datastore by migrating each virtual machine. In such cases, a nomad can simply be implemented by a NAS/SAN-attached datastore.

In contrast to a data transfer based on SnapMirror directly between NetApp storage controllers, the migration traffic flows through the hypervisor. This might have consequences for the execution of the virtual machines. Also, the NetApp storage efficiency savings cannot be exploited during the transfer; deduplication savings are regained by executing the deduplication process on the destination storage controller.


4 OPERATION

This section focuses on the operation and management of overcommitted storage. The goal of management is to fulfill the SLAs of the stored data while achieving a high level of efficiency. It addresses questions of how to detect situations that need manual assistance, how to raise the awareness of the operational staff, and how to resolve situations that arise.

We first consider situations that put SLA fulfillment at risk. Then we focus on actions that can be taken to avoid further aggravation by making the situation evident and presenting mitigation alternatives.

Consider the operational process as a loop that monitors and evaluates the current situation and triggers the transition of a storage resource among phases:

• Provision storage.
• Leave room for organic growth. It might be desirable to still allow extending storage of previously provisioned applications.
• Reduce storage use with mitigation alternatives such as deletion, data motion, and so on.

These transitions must occur within a specified time frame to optimize operational flexibility and to prevent endangering the SLAs. The point is to detect situations that will violate the SLAs in the future.

SITUATIONS PUTTING SLA FULFILLMENT AT RISK

Over time, more and more data is stored and processed by the provided applications. NetApp storage efficiency technologies compensate for this growth. To prevent running out of physical resources, usage must be managed within safe boundaries. This makes sure the operations team has enough time to react with the appropriate mitigation strategy.

The following list summarizes situations that are critical for service delivery:

• Running out of time: Some mitigation alternatives must be triggered in advance, and a passage of time might be needed for their effect to become evident. This time determines the number of mitigation alternatives that can be considered at a certain point.

• Running out of mitigation alternatives: Several mitigation alternatives exist to control the usage. However, some alternatives are one-time activities and some must be performed within a certain time frame. Depending on the situation, not all alternatives might be available for use.

• Running too tight on storage: Over time, applications use more and more of the blocks of storage that were committed to them. This forces Data ONTAP to allocate from a pool of free blocks. Assuming data growth, the size of the free block pool directly translates into available time to react.

• Running out of storage completely: This must be prevented because it has a high impact on the availability of the service. Furthermore, data integrity can be at risk. Consider the following scenarios:
  – An application wants to write to committed storage but fails (NAS/SAN). For applications, this looks like a storage failure and implies service disruption. Data integrity can be at risk.
  – An application wants to allocate new storage but fails (NAS). The application is confronted with a "No space left on device" exception. Verify the application behavior on this exception; most applications can deal with this situation, and data integrity is not at risk.

Two cases need to be differentiated when mitigation is necessary to resolve a situation of tight storage. Storage for an object such as a LUN or a share can be tight because of:
  – insufficient space within the volume in which the storage object is contained
  – insufficient free space within the aggregate in which the storage object and its volume are contained


The following sections focus on how to detect that a change is necessary and that a storage resource should be transitioned to another phase. We describe the different phases, how monitoring can support their detection, and how this information can be made known to operational groups.

4.1 PHASES AND TRANSITIONS

This section outlines the phases of a storage resource. Starting with an empty aggregate, storage is provisioned to certain thresholds. After that, storage is left for organic growth. After certain thresholds are exceeded, further inspection or activities must be performed to mitigate storage tightness.

bull Provisioning storage When certain thresholds are within a defined range storage is provisioned to the aggregates Monitoring should support making a decision to transition to the next phase

bull Leave storage for organic growth When certain thresholds are exceeded provisioned storage is left for organic growth Depending on the environment storage of existing applications might still be extended and a second threshold might signal that extensions are not possible anymore Monitoring should support making a decision to transition to the next or prior phase

bull Mitigate storage use When certain thresholds are exceeded this phase must make sure that committed storage can be delivered to store applications data The effect of a mitigation activity should be to put storage resource back in the preferred operational corridor Monitoring should support making a decision to transition back to the organic growth phase
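Under illustrative threshold values (the numbers below are hypothetical examples, not Operations Manager defaults), the three phases can be sketched as a simple classification on aggregate usage:

```python
def storage_phase(used_pct, provision_stop=50, mitigate=65):
    """Classify an aggregate into one of the three operational phases.

    used_pct: aggregate capacity used, in percent.
    provision_stop / mitigate: illustrative thresholds; tune per environment.
    """
    if used_pct < provision_stop:
        return "provisioning"      # safe to place new storage objects here
    if used_pct < mitigate:
        return "organic-growth"    # stop provisioning; let existing data grow
    return "mitigation"            # act to bring usage back into the corridor
```

Monitoring then only needs to report the current phase per aggregate and alert on transitions.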

4.2 MONITORING

NetApp Operations Manager delivers comprehensive monitoring and management for NetApp shared storage. It provides alerts, reports, performance monitoring, and configuration tools to keep the NetApp storage infrastructure in line with business requirements for maximum availability and efficiency. NetApp Operations Manager provides a single human interface and an application programming interface (API) for integration with third-party management and orchestration software.

Operations Manager monitors the NetApp shared storage infrastructure and can raise awareness of certain situations. Events can be set to trigger an action when operational parameters are within a certain range and indicate a relevant situation. When the event triggers, an alarm notification can be sent by e-mail, pager, Simple Network Management Protocol (SNMP), or customized scripts. To raise awareness of a certain situation, the event must be characterized using the metrics provided by Operations Manager; to communicate the event, an alarm must be set.

THRESHOLDS

Operations Manager monitors relevant parameters that indicate the presence of specified situations. Thresholds can be set to trigger actions, for example, to notify the operational team that an alarm situation exists. The thresholds can be set to notify in advance. Operations Manager also performs trending on operational parameters to express the urgency of a certain situation. This supports the decision making on how to react.

Within your Operations Manager instance, the thresholds can be verified and set by navigating to the Default Thresholds page, following Setup → Options → Default Thresholds, or the link http://<opsmgr-server>:<port>/dfm/edit/options. Figure 17 shows a sample configuration page.


Figure 17) Operations Manager screen to configure thresholds on operational metrics

For aggregates, Operations Manager provides the set of thresholds described in the following list. They represent absolute limits. Operations Manager alarms can be used to notify operational staff and raise awareness of a specific situation.

Monitoring the aggregates is very important. They are the physical containers of preallocated and growable storage objects that host application data. If an aggregate of a storage controller runs at uncontrolled usage, there can be direct consequences for the applications to which it provides data.

The concrete settings for these thresholds depend on the time necessary to relax aggregate block usage. When no mitigation alternatives can or should be taken during the lifetime of the data, mitigation actions must be performed in scheduled downtime windows. In that case, threshold settings and actions tend to be more conservative to avoid SLA-endangering situations.

• Aggregate full threshold. This threshold on the metric aggregate block use triggers an alarm that notifies a person in charge.

• Aggregate nearly full threshold. This threshold is the counterpart of the aggregate full threshold but provides an earlier notification.

• Aggregate overcommitted threshold. This threshold on the metric committed storage triggers an alarm that notifies a person in charge. The metric refers to the amount of storage that is assigned to applications. It represents the level of consolidation and also the width and increase of the block use corridor.

• Aggregate nearly overcommitted threshold. This threshold is the counterpart of the aggregate overcommitted threshold but provides an earlier notification.
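The two aggregate metrics behind these thresholds can be derived from three capacity figures. The following sketch (function and field names are hypothetical, for illustration only) shows how committed space can legitimately exceed 100% on a thin-provisioned aggregate:

```python
def aggregate_metrics(capacity_gb, used_gb, committed_gb):
    """Compute the two metrics the aggregate thresholds are applied to.

    'used' is physical block usage; 'committed' is the logical space
    promised to applications. On a thin-provisioned aggregate the
    committed percentage can exceed 100 -- that excess is the level
    of consolidation achieved.
    """
    return {
        "capacity_used_pct": 100.0 * used_gb / capacity_gb,
        "space_committed_pct": 100.0 * committed_gb / capacity_gb,
    }
```

For example, an aggregate with 1,000 GB capacity, 400 GB used, and 1,200 GB committed is 40% full but 120% overcommitted.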

Operations Manager also provides thresholds that can be used to alert operational staff when volumes are in a certain state:

• Volume full threshold. This event notifies a person in charge that the preset threshold on the metric volume use has been reached.

• Volume almost full threshold. This event is the counterpart of the volume full threshold but provides an earlier notification.

• Volume autosized. This event notifies a person in charge that a volume was extended using the autogrow functionality.


TRENDING

Operations Manager 4.0 supports a variety of trending features for certain storage objects. This is an important feature for all storage objects of fixed size. It allows you to estimate the time frame within which a certain situation needs to be mitigated. The trend is calculated as a linear regression over up to 90 days in the past. For aggregates, Operations Manager calculates a trend on the daily growth rate. In your Operations Manager instance, use the link http://<opsmgr-server>:<port>/dfm/report/view/aggregates-growth-rates for trending of aggregate growth rates and the estimated remaining time until the storage object is full. Each aggregate can be drilled down into, and you can select trending based on an interval of one day, one week, one month, three months, or one year. To see the effect of recent data activities, set the interval of the trend calculation to enclose this activity. Investigate whether growth rates calculated over different intervals deviate significantly.

Figure 18) Trending of data growth and days-to-full prediction in Operations Manager

Note: The calculation basis of time to full is the usable aggregate capacity. This value is not calculated based on the aggregate full threshold setting.
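The trend mechanism can be approximated as a least-squares line fitted through recent daily usage samples and extrapolated to 100% of usable capacity, matching the note above. This is a sketch of the idea, not Operations Manager's exact algorithm:

```python
def days_to_full(daily_used_gb, usable_capacity_gb):
    """Estimate days until an aggregate is full via linear regression.

    daily_used_gb: one usage sample per day, oldest first (at least 2).
    Returns None if usage is flat or shrinking.
    """
    n = len(daily_used_gb)
    xs = range(n)
    mean_x = sum(xs) / n
    mean_y = sum(daily_used_gb) / n
    # Least-squares slope = estimated daily growth rate in GB/day.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, daily_used_gb))
    var = sum((x - mean_x) ** 2 for x in xs)
    slope = cov / var
    if slope <= 0:
        return None
    return (usable_capacity_gb - daily_used_gb[-1]) / slope
```

With samples of 100, 110, 120, and 130 GB on a 530 GB aggregate, the fitted growth rate is 10 GB/day and the estimate is 40 days to full.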

Trending on the volume level is analogous to trending on the aggregate level. In your Operations Manager instance, access the link http://<opsmgr-server>:<port>/dfm/report/view/volumes-growth-rates for trending of volume growth rates. NetApp recommends ordering the view by growth rate (descending) or time to full (increasing) to focus on the relevant candidates.

On the volume level, you can set an alarm to fire when volume growth is outside the usual boundary:

• Abnormal volume growth. This event notifies when the growth rate of a volume exceeds a preset limit. It is helpful for signaling unusual behavior concerning storage consumption and pointing the operational staff to the right storage object.


INDIVIDUAL THRESHOLDS PER AGGREGATE OR VOLUME

For each aggregate or volume, the general default settings can be overridden and made more specific. To do so, select the aggregate or volume of choice, for example by using the links already provided in this technical report. When a concrete aggregate is selected, it can be configured using the Edit Settings link and dialog. When a concrete volume is selected, its configuration can be adapted using the Edit Quota Settings link and dialog.

MONITORING STORAGE EFFICIENCY RETURNS

NetApp Operations Manager provides a dashboard to visualize storage efficiency returns in the NetApp shared storage infrastructure. This report lists important parameters, drilled down by utilization capacity, unused reserve capacity, storage efficiency, and efficiency return breakdown. It allows you to judge the effectiveness of the NetApp storage efficiency technologies. Figure 19 provides a sample screenshot of the storage efficiency dashboard in NetApp Operations Manager. Consult the NetApp Operations Manager Efficiency Dashboard Installation and User Guide for further information on the data provided by this dashboard.

Figure 19) Storage efficiency dashboard in Operations Manager


4.3 NOTIFICATION

Operational staff must be notified when situations occur that require a transition of phases, especially situations with negative consequences for SLA fulfillment. Operations Manager provides alarms for notification. Alarms are bound to the metrics and thresholds explained in section 4.2 and notify operational staff, storage administrators, or storage capacity planners. Alarms are the instrument used to keep the management effort of the NetApp storage infrastructure low.

After being notified, the responsible person can evaluate the situation and decide which actions to take. Furthermore, the trends on operational parameters provided by Operations Manager simplify the decision-making process.

Depending on the organizational structure, the responsibilities to operate, plan, and administer the storage infrastructure can be separated into different groups, persons, or roles. We therefore characterize the mitigation activities by required skill set and time to act. This allows easy alignment with a given organizational structure.

Operations Manager supports different methods of sending a notification. The notification methods can be used in combination; for example, a notification can be sent by both e-mail and SNMP.

NOTIFY BY E-MAIL

An alarm can be sent to multiple destinations by e-mail, and repeated notifications can be sent while the situation is not resolved. To set an alarm, access the alarm configuration page by following Setup → Alarms from the default Operations Manager dashboard. Clicking Advanced Version accesses an advanced version of this page; the direct link is http://<opsmgr-server>:<port>/dfm/edit/alarms-advanced. Figure 20 shows how to configure an alarm. Adjust the threshold as described in section 4.2.

NetApp recommends using distribution lists or aliases with meaningful names rather than the addresses of individual persons. If you follow this recommendation, changing responsibilities and roles does not require corresponding changes in Operations Manager.

NOTIFY BY SNMP

Operations Manager supports the notification of alarms using SNMP, a widely used standard that is supported by most orchestration frameworks and ticketing systems. Using SNMP, Operations Manager can be integrated into existing ticketing systems. Figure 20 shows the setup of an alarm that fires based on the aggregate almost full threshold. The SNMP trap host is configured using a hostname or IP address and the port on which the SNMP agent is listening. The alarm can then be saved and tested.


Figure 20) Configuring an alarm based on the threshold aggregate almost full

Note: The SNMP event must be routed to the responsible groups or persons in the ticketing system. The mapping between the detected situation and the responsible operational group must therefore be implemented there.

NOTIFY BY SCRIPT

Operations Manager also supports notifications in highly customized integration scenarios. A user-defined adapter can be executed that delivers the information to the infrastructure or system of choice. A script can implement such an adapter and act as the glue between Operations Manager and the customer infrastructure. To set an alarm on the event aggregate almost full that starts a script, instrument Operations Manager on the command line:

dfm alarm create -s script_to_execute -h aggregate-almost-full
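Such a script adapter might look like the following sketch. How Operations Manager hands the event details to the script (command-line arguments versus environment variables) depends on the version, so the argument layout here is an assumption for illustration only:

```python
#!/usr/bin/env python
"""Sketch of a script adapter for an Operations Manager alarm.

Assumption: the event name and its source object arrive as the first
two command-line arguments; check your Operations Manager version for
the exact interface it provides to alarm scripts.
"""
import sys


def format_ticket(event_name, source):
    """Turn an event into a one-line message for a ticketing system."""
    return "[storage] %s on %s: please assess aggregate usage" % (event_name, source)


if __name__ == "__main__" and len(sys.argv) >= 3:
    # Replace this print with the call into your ticketing or chat system.
    print(format_ticket(sys.argv[1], sys.argv[2]))
```

The adapter keeps Operations Manager decoupled from the ticketing system: only the script needs to change when the downstream system does.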


4.4 MITIGATE STORAGE USE

Uncontrolled storage use can limit operational flexibility and might put SLA fulfillment at risk. This section focuses on mitigation activities to preserve flexibility by controlling use within its defined corridor; the effect of a mitigation activity should be to return usage to that corridor.

Storage tightness might occur in aggregates or volumes, depending on their configuration. When all volumes in an aggregate are thin provisioned with the zero fat configuration, they use the shared pool of free blocks of the aggregate to deal with data growth. To resolve tightness in this situation, a mitigation activity on the aggregate level is necessary.

When storage objects in a fixed-size volume cannot grow to their committed space, a mitigation activity on the volume level is necessary to resolve upcoming volume tightness.

MITIGATION ACTIVITIES FOR AGGREGATES

Aggregates are the coarsest storage objects within a NetApp storage controller. Finer-grained storage objects, such as FlexVol volumes and their content, are usually thin provisioned using the zero fat configuration. They can grow on demand; however, because they live within an aggregate of physically limited size, the growth of the storage object itself is also limited. As described in the following list, providing usable space in the aggregate automatically allows the contained storage objects to grow.

1. Increase the aggregate. You can add drives to aggregates during operation, and you can repeat this mitigation activity. The maximum aggregate size depends on the Data ONTAP version, the type of aggregate, and the type of storage controller. 64-bit aggregates, supported with Data ONTAP 8, have very high limits. Additional drives can be used immediately; however, their procurement time needs to be taken into account. Rebalancing data between existing and new drives results in uniformly distributed use of the drives.

2. Decrease the aggregate Snapshot copy reserve. This reserve is needed in MetroCluster and SyncMirror® configurations. In other configurations, you can decrease this reserve or set it to zero.

3. Shrink preallocated volumes. Volumes with preallocated space reserve available aggregate free space. When possible, these volumes can be shrunk, returning the freed space to the aggregate so that others can make use of it.

4. Enable deduplication and shrink the volume.

5. If available, migrate a nomad online to a different storage controller. Doing this on the NetApp storage controller level requires storage provisioning based on vFiler units as well as MultiStore and SnapMirror licenses. Adequate free space on the aggregates of the target storage controller is required. This mitigation activity is not limited in its applicability.

6. Migrate a volume from one aggregate to another aggregate within the same or a different storage controller. SnapMirror replicates the data while it is still being served. To switch over to the replicated data, the client needs to detach from the source and reattach to the replica. After completion, the replica is considered the new source. This operation has an impact on client downtime. Typically, inter-data center bandwidth allows you to synchronize the source and the replica within a few minutes.

7. If none of the listed activities can be used, the application must be stopped to achieve a consistent state; the data can then be migrated during downtime.

The mitigation activities for aggregate tightness are summarized in Table 8. Note that Provisioning Manager can perform mitigation alternatives 3 to 6 online for secondary storage.


Table 8) Mitigation alternatives to control use within aggregates.

No. | Mitigation Activity | Repeatability | SLA Impact | Preparation Time | Time to Show Effect
1 | Increase aggregate capacity by adding disks | Low limits (Data ONTAP 7.x); high limits (Data ONTAP 8) | None | HW procurement | Immediate (+ rebalancing)
2 | Decrease the aggregate's Snapshot copy reserve area, if possible | One time | None | None | Immediate
3 | Shrink other volumes in the aggregate, if they have enough free space | One time | Low | None | Immediate
4 | Run deduplication and shrink volumes | Repeatable | Low | Time to execute dedupe | Immediate
5 | Migrate nomads (online) | Repeatable | Low | None | Minutes (vFiler migration time)
6 | Migrate volumes to a different aggregate (offline) | Repeatable | Med–high | Next planned downtime window | Minutes (volume switch-over time)
7 | Prevent application data loss and stop the application, then migrate (offline) | Repeatable | Low–high | Coordinate with app owner | Minutes (migration time)

MITIGATION ACTIVITIES FOR VOLUME TIGHTNESS

Mitigation activities for volume tightness are relevant for volumes that are at risk because their storage objects cannot grow to the committed size. When it is not possible to enable growth for storage objects contained in volumes, you need to perform an aggregate-level mitigation activity.

Note: Some of these mitigation alternatives depend on, and affect, used capacity in the aggregate.

Table 9) Mitigation activities for resource tightness within volumes.

No. | Mitigation Activity | Repeatability | SLA Impact | Preparation Time | Time to Show Effect
1 | Reduce the volume's Snapshot copy reserve (if configured and not used) | One time | Low | None | Immediate
2 | Increase the volume if there is free space in the aggregate (see Table 8) | One time | Low | None | Immediate
3 | Delete Snapshot copies that are not needed or were skipped by the autodelete function | Limited | Low | None | Immediate
4 | Activate FAS deduplication for the volume (requires proper space guarantees) | One time | Low; possible performance impact | Wait for schedule | Hours
5 | If the volume contains more than a single LUN, migrate those objects to another volume or aggregate | Repeatable | High | Next planned downtime window | Minutes (volume migration time)
6 | Stop application and migrate data | Repeatable | High | Coordinate with app owner | Minutes (migration time)


5 REAL-LIFE SETTINGS

This section summarizes two different operational settings. The first does not make use of online data migration and the settled/nomad provisioning pattern; the second implements a settled/nomad provisioning pattern to maintain the flexibility for online data migrations.

The concrete threshold settings and approaches might be very customer and application specific. To exploit NetApp storage efficiency features in your own data center, NetApp recommends starting conservatively. After you are familiar with the process, work toward the customer-specific optimum.

5.1 SAMPLE SETTING 1: REAL-LIFE SETTING

This section describes a real-life setting that a customer started with. It makes use of a limited set of mitigation alternatives. This is especially beneficial when the installed storage capacity should remain constant over a long time frame or when physical systems are already fully equipped. A settled/nomad setting is not considered. Thus, the thresholds that signal a transition of phases are set lower and more conservatively for this customer. Because online data migration and aggregate extension are not available as mitigation alternatives, sufficient available space is required to safely reach the next planned downtime window, as shown in Figure 21. In practice, refer to the aggregate days-to-full trend value to get an idea of the available days to full based on past data growth.

• All storage is provisioned using the zero fat option with growable FlexVol volumes. Only aggregate monitoring is used.

• Aggregate extension is not a mitigation alternative.

• Online migration is not a mitigation alternative.

Figure 21) Storage to enable organic data growth between planned downtime windows


Note: Several months might fall between the planned downtime windows in which major mitigation alternatives can be performed.

The primary concern is preventing the critical situation in which aggregates reach a utilization level that is too high to enable organic growth during the period between agreed planned downtime windows. To prevent this situation, sufficient space must be reserved to enable data growth. Second, the level of data consolidation is monitored to manage accumulated growth rates safely.

Provisioning new data is stopped when one or both thresholds on the first and second metrics are reached. The operational teams are notified using an alarm on the Operations Manager event aggregate nearly full threshold (configured to fire when the metric exceeds 50%) and the event aggregate nearly overcommitted threshold (configured to fire when the metric exceeds 110%). These alarms stop the responsible entities from provisioning new storage; the aggregate is left for organic growth.

An assessment of the storage situation might then be performed. Depending on experience and knowledge of the application growth rates seen in the past, the thresholds may be adapted. After the upper threshold of the operational sweet spot corridor is crossed, an alarm based on the aggregate full threshold (set initially to 65%) is sent to the storage administrators, who decide whether to migrate data in the next planned downtime window. In the meantime, organic growth can take place in the yellow-marked area shown in Figure 22. The metrics used are:

• First metric: aggregate capacity used
• Second metric: aggregate space committed

Because all storage is provisioned using the zero fat option, no artificially limited storage containers exist. Thus there is no need to consider a volume-based metric. Figure 22 shows the behavior depending on the metrics aggregate capacity used and aggregate committed space.

Figure 22) Transition of changes depending on the metrics aggregate capacity used and aggregate committed space

[Figure content: new storage is provisioned while aggregate capacity used is 0–50% and aggregate space committed is 0–110% (the operational sweet spot corridor); beyond those values, capacity is assessed and thresholds are adapted; mitigation starts beyond 65% capacity used or 120% space committed.]
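With the threshold values taken from the text and Figure 22, the decision logic of this sample setting can be condensed into a small sketch (a hypothetical function for illustration, not a supported tool):

```python
def sample_setting_1_action(used_pct, committed_pct):
    """Map the two aggregate metrics of sample setting 1 onto an action.

    Thresholds from the text: provisioning stops beyond 50% used or
    110% committed; mitigation is planned beyond 65% used or 120%
    committed.
    """
    if used_pct > 65 or committed_pct > 120:
        return "plan migration in next downtime window"
    if used_pct > 50 or committed_pct > 110:
        return "stop provisioning; assess capacity and adapt thresholds"
    return "provision new storage"
```

Either metric alone can trigger the transition, which is why both must stay inside the corridor.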


5.2 SAMPLE SETTING 2: SETTLED/NOMAD

This section describes a setting that takes the settled/nomad provisioning pattern into account and allows migrating nomad data flexibly and in a timely manner, thanks to vFiler technology. This setting requires storage space at alternative locations to which nomads might be migrated. It is seen more often in larger environments with an emphasis on NFS-attached storage. It allows operating the NetApp storage infrastructure at very high use and within narrower operational sweet spot corridors. Figure 23 visualizes the effect of a mitigation alternative that can be performed online.

Figure 23) Narrower corridors due to the ability to perform mitigation alternatives in hours instead of months


In this sample setting, as in sample setting 1, the critical situation to prevent is aggregates becoming too full. However, the flexibility gained with online data migration removes the need to take a further metric, such as storage overcommitment, into account.

• All storage is provisioned using the zero fat option with growable FlexVol volumes. Only aggregate monitoring is used.

• Storage is provisioned using the settled/nomad pattern, with the ability to perform online migration.

• Days-to-full aggregate trending was more than 200 days on average. Note that this value depends on the individual situation and is calculated against 100% usable capacity.

The sole metric in this setting is aggregate capacity used. Table 10 contains the thresholds describing the transition of phases.

Table 10) Phase transitions with the settled/nomad provisioning pattern and the online migration mitigation alternative.

Detection Threshold | Notify | Mitigation
> 70% | Storage operations | Stop provisioning of storage
> 85% | Storage operations | Stop extending provisioned storage
> 90% | Storage operations | Relax resource situation and migrate a nomad
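Table 10 amounts to a single-metric escalation, which can be sketched as follows (illustrative code, not part of Operations Manager):

```python
def settled_nomad_action(used_pct):
    """Escalation per Table 10: one metric, three thresholds."""
    if used_pct > 90:
        return "relax resource situation and migrate a nomad"
    if used_pct > 85:
        return "stop extending provisioned storage"
    if used_pct > 70:
        return "stop provisioning of storage"
    return "normal operation"
```

Because nomad migration completes within hours, the corridor between 70% and 90% can be much narrower than in sample setting 1.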


Figure 24) Visualization of phase transitions depending on metric aggregate capacity used


You can achieve very high data consolidation in this setting with NetApp storage controllers. The amount of logical data served can exceed the physically usable capacity several times over.


6 STORAGE EFFICIENCY COOKBOOK

To increase consolidation, we propose the following steps to exploit the advantages of NetApp storage efficiency technologies.

Figure 25) Typical picture of aggregate capacity metrics while turning to zero fat configurations and dedupe


As a general rule, we don't introduce artificially limited container types. They increase monitoring effort and might prevent the pooling of unused space. For an existing landscape, proceed as follows:

1. Install and configure Operations Manager; the earlier, the better. From day one, Operations Manager collects data, and the more information it collects, the better the predictions and trending become. The diagrams provided by Operations Manager give a good idea of growth rates and their steadiness. Make sure all NetApp storage controllers are monitored. Wait for one month. Define which mitigation alternatives your operational team is comfortable with. Check the boxes accompanying the provided list and identify the time your team needs to perform the actions. If you can perform online migrations for nomads, define the time needed to negotiate and approve the migration. For all other data, define the time to the next planned downtime window.

2. Change all volumes to the zero fat configuration with the autogrow feature set to on. Because there is no artificial space limitation for an autogrow volume, monitoring is restricted to aggregate monitoring. When using deduplication, set the volume to autogrow. During this period, the capacity used diminishes, as shown in Figure 25, and usually each change in the volume configuration can be detected. So far, only metadata has changed, and unused space in the volumes is now available from a common shared pool. The aggregated free space is available to the same applications storing the data. We recommend monitoring for three months to understand the growth rate of your environment.

3. Derive the growth trend of the aggregates. Note that the overall trend might still be negative. Use Operations Manager to help determine the trend. Make sure that it excludes the time frame when changing the volume configuration to zero fat and that it includes relevant operations of your applications, such as month-end and year-end closing of business applications or regular software maintenance updates (for example, in virtualized environments).


4. Work backward to determine the thresholds of the phases:

a. Define the aggregate use at a level where your operational team is comfortable. At first, do not exceed 80%. Add an attention area (yellow) depending on the mitigation alternatives and their time to show effect.

b. Determine the maximum distance between planned downtimes, or the time needed to perform the intended mitigation alternatives.

c. Determine the growth rate. Operations Manager provides help in determining the trend of data growth.

d. Determine the minimum space required to comfortably allow organic growth in the period between agreed planned downtimes of the services provided. Operations Manager helps you understand the growth rate of the past.
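Steps a through d amount to a simple calculation: the aggregate-full threshold must leave enough headroom to cover the expected growth until the next planned downtime, capped at the level the team is comfortable with. A sketch with hypothetical parameter names:

```python
def full_threshold_pct(capacity_gb, daily_growth_gb, days_between_downtimes,
                       comfort_cap_pct=80):
    """Work backward from growth rate to an aggregate-full threshold.

    Headroom reserved = growth expected until the next planned downtime.
    The result is capped at the comfortable usage level (80% here,
    matching step a).
    """
    headroom_pct = 100.0 * daily_growth_gb * days_between_downtimes / capacity_gb
    return min(comfort_cap_pct, 100.0 - headroom_pct)
```

For a 10 TB aggregate growing 20 GB/day with downtime windows 180 days apart, 36% headroom is needed, so the threshold drops to 64%; with only 5 GB/day growth, the 80% comfort cap applies.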

To provision storage, follow these steps:

1. Create big aggregates to enable shared storage in your data center. We recommend sizing them so that the aggregate can be extended once for eventual aggregate mitigation. Very few situations exist in which a silo-centric approach with dedicated aggregates for applications makes sense. Free space and performance in an aggregate can be shared, and a few big aggregates reduce the monitoring effort. Also, build aggregates in a limited number of standardized configurations and sizes.

2. Create volumes in the zero fat configuration with the autogrow feature set to on. Because there is no artificial space limitation for an autogrow volume, monitoring is restricted to aggregate monitoring. When using deduplication, set the volume to autogrow. Whenever possible, use Provisioning Manager for convenience and for repeating configurations.

a. Classify your data and provision for flexibility. Give NFS a preference and make use of vFiler entities.

b. Turn on deduplication. Even in situations where deduplication rates are expected to be low, there is sometimes a big surprise. If you prefer to try deduplication on the storage controller first, create a clone of the intended volume and deduplicate it to estimate the effect. Use Performance Advisor to identify a repeating time frame of low activity to schedule the deduplication job, or use deduplication scheduled by change rate. Mind the maximum volume sizes, which depend on the storage controller.

c. Initially size volumes to the expected size of the data you are going to store. This way, the aggregate overcommitment metric in Operations Manager represents the data consolidation more precisely.

d. Trim existing volumes provisioned in full or low fat to the zero fat configuration. Use the following commands on the console of the storage controller to configure zero fat without Snapshot autodelete for NAS environments:

vol options <volume> guarantee none
vol options <volume> try_first volume_grow
vol autosize <volume> -m <maximum size> -i <increment size> on
snap autodelete <volume> off

Use the following command sequence to configure zero fat with Snapshot autodelete for NAS environments:

vol options <volume> guarantee none
vol options <volume> try_first volume_grow
vol autosize <volume> -m <maximum size> -i <increment size> on
snap autodelete <volume> trigger volume
snap autodelete <volume> delete_order oldest_first
snap autodelete <volume> on


Use the following commands to configure zero fat without Snapshot autodelete for SAN environments:

vol options <volume> guarantee none
vol options <volume> try_first volume_grow
vol autosize <volume> -m <maximum size> -i <increment size> on
snap reserve -V <volume> 0
snap autodelete <volume> off
lun set reservation <lun> disable

Use the following command sequence to configure zero fat for SAN environments with autodelete set to on:

vol options <volume> guarantee none
vol options <volume> try_first volume_grow
vol autosize <volume> -m <maximum size> -i <increment size> on
snap reserve -V <volume> 0
snap autodelete <volume> trigger volume
snap autodelete <volume> delete_order oldest_first
snap autodelete <volume> on
lun set reservation <lun> disable

e. Identify storage of inactive data. Storage keeping inactive data is most often perfectly suited to act as a nomad candidate that can be migrated.

f. Identify storage that is close to deprovisioning. Deprovisioning of storage relaxes use and can act as a mitigation alternative.

g. Turn already provisioned volumes into the zero fat configuration.

3. Let Operations Manager monitor the landscape. Use the aggregate daily growth rates and days-to-full trending reported by Operations Manager to adapt the thresholds. Remember that days-to-full trending reports against 100% capacity used of the aggregate.


7 REFERENCES

• TR-3505, "NetApp Deduplication for FAS and V-Series Deployment and Implementation Guide": www.netapp.com/us/library/technical-reports/tr-3505.html

• TR-3563, "NetApp Thin Provisioning: Improving Storage Utilization and Reducing TCO": www.netapp.com/us/library/technical-reports/tr-3563.html

• TR-3710, "Operations Manager, Provisioning Manager, and Protection Manager Best Practices Guide": www.netapp.com/us/library/technical-reports/tr-3710.html

• TR-3786, "A Thorough Introduction to 64-Bit Aggregates": www.netapp.com/us/library/technical-reports/tr-3786.html

• TR-3814, "NetApp Data Motion": www.netapp.com/us/library/technical-reports/tr-3814.html

• TR-3827, "If You're Doing This, Then Your Storage Could Be Underutilized": www.netapp.com/us/library/technical-reports/tr-3827.html

• TR-3881, "DataMotion for Volumes for Enterprise Applications": www.netapp.com/us/library/technical-reports/tr-3881.html

• NetApp Operations Manager Efficiency Dashboard Installation and User Guide: now.netapp.com/NOW/download/tools/omsed_plugin/InstallUserGuide.pdf


8 ACKNOWLEDGMENTS

This report was developed in concert with the Field Centers for Innovation and covers field best practices and product group expertise. It would not have been possible without the input of many experts. Significant contributions were made by Matthew Agoni, Carlos Alvarez, Jeff Berks, Manfred Buchmann, Hans Deuerlein, Erik Dybwad, Niels Reker, Oliver Dziuba, Larry Freeman, Gary Garcia, Pretoom Goswami, Naveen Harsani, George John, Nigel Maddock, Andreas Martinovsky, Holger Niermann, Cesar Orosco, Christian Ott, Shiva Raja, Michael Reusch, Maurice Skubski, John Tyrrell, Oliver Walsdorf, and Allen Wang.

NetApp provides no representations or warranties regarding the accuracy, reliability, or serviceability of any information or recommendations provided in this publication, or with respect to any results that may be obtained by the use of the information or observance of any recommendations provided herein. The information in this document is distributed AS IS, and the use of this information or the implementation of any recommendations or techniques herein is a customer's responsibility and depends on the customer's ability to evaluate and integrate them into the customer's operational environment. This document and the information contained herein may be used solely in connection with the NetApp products discussed in this document.

© Copyright 2010 NetApp, Inc. All rights reserved. No portions of this document may be reproduced without prior written consent of NetApp, Inc. Specifications are subject to change without notice. NetApp, the NetApp logo, Go further, faster, Data ONTAP, FlexClone, FlexVol, MultiStore, RAID-DP, SnapDrive, SnapMirror, Snapshot, SyncMirror, and vFiler are trademarks or registered trademarks of NetApp, Inc. in the United States and/or other countries. Windows is a registered trademark of Microsoft Corporation. Oracle is a registered trademark of Oracle Corporation. VMware is a registered trademark and VMotion is a trademark of VMware, Inc. All other brands or products are trademarks or registered trademarks of their respective holders and should be treated as such. RA-0007-1010


Figure 13) Dedupe-centric storage provisioning. Application instances are aligned horizontally; volumes are aligned vertically. Each FlexVol volume contains similar LUNs/qtrees of several instances so that deduplication block sharing operates within the volume.

Impact on commitment and aggregate usage. When creating the FlexVol volumes for this layout, their individual sizes contribute to the commitment rate. The aggregate use grows with the provisioning and object use within the FlexVol volumes. Provisioning a new instance in this layout through a file/LUN FlexClone operation has no effect on the overcommitment rate; it affects only the deduplication savings of the volume itself. Thus, NetApp recommends using the zero fat configuration for the volume, with autogrow enabled.
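The commitment rate described above can be made concrete with a small calculation. This is an illustrative sketch with made-up figures, not a NetApp formula:

```python
def commitment_rate(volume_sizes_gb, aggregate_usable_gb):
    """Committed space of all FlexVol volumes in an aggregate relative to
    its usable space. A value above 1.0 means the aggregate is overcommitted.
    """
    return sum(volume_sizes_gb) / aggregate_usable_gb

# Three 4TB thin-provisioned volumes in a 10TB aggregate:
# committed space exceeds physical space, so the aggregate is overcommitted.
rate = commitment_rate([4000, 4000, 4000], 10000)
```

Note that cloning a LUN inside one of these volumes changes none of the inputs, which is why a file/LUN FlexClone operation leaves the overcommitment rate untouched.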

Best Practice

This layout is very attractive for applications using multiple but similar storage objects among service instances (for example, virtual disks in virtual machine hypervisors). These usually contain similar operating systems and applications in dedicated virtual disks. Thus, grouping these storage objects leads to a very high degree of consolidation due to deduplication.

Quickly changing data, such as paging and swap files, should not be considered for inclusion in deduplicated volumes on primary storage. Deduplication savings are limited due to the high change rate and do not justify running the deduplication process. NetApp recommends that this type of data not be placed in the same volume as data that dedupes well.

We further recommend not performing client data realignments, such as Windows disk defragmentation or database table space reorganizations. Because of the way that NetApp storage controllers work, fragmented client data is served with no performance penalty.


3.3 SETTLED/NOMAD PROVISIONING FOR NETAPP DATA MOTION

Settled/nomad provisioning is a provisioning pattern that helps increase the utilization of NetApp shared storage. When the online migratability features of storage are exploited, response times to mitigate data growth scenarios are independent of application-specific planned downtime windows. Storage is classified into potential migration candidates and can be migrated away from a tight aggregate at one storage controller to another while assuring its accessibility. Thus, it is an elegant technology to relax the use of an aggregate outside the planned downtime windows of the served applications. NetApp MultiStore technology implements this feature using the vFiler abstraction, which NetApp recommends you consider in the provisioning process. Secure multi-tenancy environments implemented using MultiStore technology harmonize well with this provisioning approach.

The settled/nomad provisioning pattern is a natural way to react to data growth in an aggregate. Figure 14 illustrates the concept of settled/nomad provisioning in the aggregates of the storage controllers and the migration of a nomad out of its aggregate. The settled part describes data that does not move during its lifetime. It might use vFiler units to simplify operation and hardware maintenance of the storage controller, but there is no direct need. The nomad parts are considered moving parts and thus must make use of vFiler units. The ratio between the sizes of the settled and nomad parts depends on the growth rate and lifetime of the data in the settled part. Assuming that the aggregate size is constant over this period, the aggregate is filled with settled and nomad data. Over the data lifetime, more and more nomads are migrated away. At the end of the lifetime, only the settled data is left.

It is irrelevant whether the data growth happens in the settled or nomad part: when a nomad is migrated away, the resource situation on the aggregate is relaxed. It is preferable to provision several nomads of different sizes. This allows you to:

• React to different growth scenarios of the data
• Quickly migrate smaller nomads when time or an inter-storage controller network is considered to be a limited resource
• Operate the aggregate in its operational sweet spot corridor over a long time frame; by slicing the migratable entities in the right way, you can be sure that the aggregate operates in a predefined use interval

Figure 14) Settled/nomad provisioning into an aggregate. In case of aggregate tightness, a nomad is migrated to a separate aggregate.

To summarize, the settled/nomad provisioning pattern is an elegant method to adjust the block use of an aggregate. The use of an aggregate can be controlled and kept in a desired corridor.
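The idea of slicing nomads so that the aggregate can be brought back into its use corridor can be sketched as a simple selection problem. The greedy smallest-first strategy below is an illustration of the reasoning, not a NetApp algorithm, and the corridor bound is an assumed example value:

```python
def nomads_to_migrate(nomad_sizes_gb, used_gb, capacity_gb, target_use=0.8):
    """Pick a set of nomads whose migration brings aggregate use back
    below the target corridor bound.

    Smaller nomads are preferred because they migrate faster when time
    or inter-controller bandwidth is the limited resource.
    """
    excess = used_gb - target_use * capacity_gb
    chosen = []
    for size in sorted(nomad_sizes_gb):  # smallest first
        if excess <= 0:
            break
        chosen.append(size)
        excess -= size
    return chosen

# 9.2TB used in a 10TB aggregate with an 80% corridor bound:
# at least 1.2TB must be freed, so the two smaller nomads are chosen.
selection = nomads_to_migrate([500, 800, 2000], 9200, 10000)
```

Provisioning several differently sized nomads gives such a selection more options than a single large one.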


SLA-BASED ASSESSMENT FOR SETTLED/NOMAD

The goals of an SLA-based assessment are to optimize SLA fulfillment and to avoid or minimize penalty costs. The accessibility of the applications, described by their individual service levels, is used for an assessment into settled and nomad instances.

We use the introduced SLA metric of service disruption and map it to the stickiness of the settled/nomad instances. The vFiler entities allow online migration of NFS- and iSCSI-attached nomad instances without any changes on the client side. Fibre Channel-attached storage cannot be migrated online at the time of writing. Refer to TR-3881 for an understanding of DataMotion in an Oracle database and Microsoft Exchange environment.

Alignment by technical impact. For data belonging to applications with SLAs that fit perfectly into what is provided, a direct assignment can be made. For example, application instances with the lowest acceptable service disruption should be the last candidates to be migrated (settled); applications with the highest acceptable service disruptions should be considered as nomads. However, there might be data of application instances that likely will be migrated during the application lifetime. You must take into account the business impact of migrating these instances.

Figure 15) Alignment by technical impact (sorted by negative impact in descending order).

Alignment by business impact. An assessment of penalty costs is made for the data of the remaining applications. For vFiler migration, a very short negative impact on the performance of the service level must be taken into account during the migration. Thus, application data with the highest negative impact is considered to be the stickiest.

Figure 16) Alignment by business impact (sorted by negative impact in descending order; instances fall into settled, semi-settled, and nomad classes).

PERFORMANCE AND THROUGHPUT IMPACT OF MIGRATION

Migration of a nomad might be triggered by heavy storage consumption in an aggregate. It might also be triggered by performance limitations of the corresponding storage controller. Because the migration itself consumes additional resources on the network and the participating storage controllers, this consumption must be taken into account to avoid further intensifying the situation. Refer to TR-3881 for a quantitative evaluation of DataMotion.

NetApp recommends that the use of storage controllers in a high-availability configuration be adjusted in such a way that the remaining controller can master the load in the case of a failover. Doing so should leave enough resources to perform migrations.
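This sizing rule amounts to a quick headroom check. In the sketch below, loads are expressed as fractions of a single controller's capacity, and the 10% migration budget is an illustrative assumption, not a NetApp figure:

```python
def failover_headroom(load_a, load_b, migration_budget=0.1):
    """Check that one controller of an HA pair can carry both loads after
    a failover and still leave budget for running migrations.

    Loads and budget are fractions of a single controller's capacity.
    """
    return load_a + load_b + migration_budget <= 1.0

# Two controllers at 40% and 45% load leave room for failover plus migration;
# two controllers at 50% each do not.
ok = failover_headroom(0.40, 0.45)
too_tight = failover_headroom(0.50, 0.50)
```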

Migrating vFiler entities relies mainly on SnapMirror® and MultiStore technology; thus, it requires these licenses on all participating storage controllers. TR-3814, NetApp Data Motion, provides a thorough presentation of migration using the NetApp Data Motion solution. It focuses on implementing and triggering the migration of vFiler entities using NetApp Provisioning Manager. Furthermore, vFiler units can be managed manually to allow for a handy offline migration with a very short interruption of storage accessibility.

ENABLE SETTLED/NOMAD FOR ALREADY-PROVISIONED STORAGE

While NetApp recommends that you consider the settled/nomad setting initially, taking the sizing and lifetime of storage into account, it is possible to implement it in a planned downtime window. If NFS-attached storage should be migrated, existing volumes can be adopted by a vFiler entity. Because the vFiler entity has its own IP address, the clients attaching the storage need to be remounted.

SETTLED/NOMAD-LIKE SETTING WITH SHORT/LONG-TERM DATA PAIRING

In the previous section, the settled/nomad pattern was described to mitigate organic data growth. The same effect can be achieved when storage is identified as belonging to instances that are going to be deprovisioned at their end of life. Taking into account the expected lifetime of provisioned storage allows you to plan deprovisioning situations in advance. This relaxes the dependence on aggregates outside planned downtime windows and without the technical requirements of a settled/nomad setting.

ONLINE MIGRATION IN VIRTUALIZED ENVIRONMENTS

Online migration features in a virtualization hypervisor provide a further alternative for implementing a responsive scheme to react to data growth scenarios. For example, VMware® Storage VMotion™ is capable of transferring a virtual machine, including its storage, when it is attached using a datastore. Storage of virtual machines served by a NetApp datastore can be migrated to another NetApp-served datastore by migrating each virtual machine. In such cases, a nomad can simply be implemented by a NAS- or SAN-attached datastore.

In contrast to a data transfer based on SnapMirror directly between NetApp storage controllers, the migration traffic flows through the hypervisor. This might have consequences for the execution of the virtual machines. Also, the NetApp storage efficiency savings cannot be exploited during the transfer; deduplication savings are regained by executing the deduplication process on the destination storage controller.


4 OPERATION

This section focuses on the operation and management of overcommitted storage. The goal of management is to fulfill the SLAs of the stored data while achieving a high level of efficiency. It addresses questions of how to detect situations that need manual assistance, how to raise the awareness of the operational staff, and how to resolve situations that arise.

We first consider situations that put the SLA fulfillment at risk. Then we focus on actions that can be taken to avoid further aggravation by making the situation evident and presenting mitigation alternatives.

Consider the operational process as a loop that monitors and evaluates the current situation and triggers the transition of a storage resource among phases:

• Provision storage.
• Leave room for organic growth. It might be desirable to still allow for extending the storage of previously provisioned applications.
• Reduce storage use with mitigation alternatives such as deletion, data motion, and so on.

These transitions must occur within a specified time frame to optimize operational flexibility and to prevent endangering the SLAs. The point is to detect situations that will violate the SLAs in the future.

SITUATIONS PUTTING SLA FULFILLMENT AT RISK

Over time, more and more data is stored and processed by the provided applications. NetApp storage efficiency technologies compensate for this growth. To prevent running out of physical resources, usage must be managed within safe boundaries. This makes sure the operations team has enough time to react with the appropriate mitigation strategy.

The following list summarizes situations that are critical for service delivery:

• Running out of time. Some mitigation alternatives must be triggered in advance, and a passage of time might be needed for their effect to become evident. This time determines the number of mitigation alternatives that can be considered at a certain point in time.

• Running out of mitigation alternatives. Several mitigation alternatives exist to control the usage. However, some alternatives are one-time activities, and some must be performed within a certain time frame. Depending on the situation, not all alternatives might be available for use.

• Running too tight on storage. Over time, applications use more and more of the blocks committed to them. This forces Data ONTAP to allocate from the pool of free blocks. Assuming data growth, the size of the free block pool directly translates into available time to react.

• Running out of storage completely. This must be prevented because it has a high impact on the availability of the service. Furthermore, data integrity can be at risk. Consider the following scenarios:

− An application wants to write to committed storage but fails (NAS and SAN). For the application, this looks like a storage failure and implies service disruption. Data integrity can be at risk.

− An application wants to allocate new storage but fails (NAS). The application is confronted with a "No space left on device" exception. Verify the application behavior on this exception; most applications can deal with this situation, and data integrity is not at risk.

Two cases need to be differentiated when mitigation is necessary to solve a situation of tight storage. Storage for an object such as a LUN or a share can be tight because of:

− Insufficient space within the volume in which the storage object is contained
− Insufficient free space within the aggregate in which the storage object and its volume are contained
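The two cases can be told apart mechanically from the usage figures. The sketch below illustrates the distinction; the threshold values are illustrative assumptions, not NetApp defaults:

```python
def diagnose_tightness(vol_used_gb, vol_size_gb, aggr_free_gb,
                       vol_threshold=0.95, aggr_min_free_gb=100):
    """Classify whether mitigation is needed at the volume level,
    the aggregate level, both, or neither."""
    findings = []
    if vol_used_gb / vol_size_gb >= vol_threshold:
        findings.append("volume")      # object cannot grow inside its volume
    if aggr_free_gb <= aggr_min_free_gb:
        findings.append("aggregate")   # shared free block pool is tight
    return findings or ["ok"]

# A 96%-full volume in an aggregate with plenty of free space needs
# a volume-level mitigation; a half-full volume in a nearly full
# aggregate needs an aggregate-level one.
case_a = diagnose_tightness(960, 1000, 500)
case_b = diagnose_tightness(500, 1000, 50)
```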


The following sections focus on how to detect that a change is necessary and that a storage resource should be transitioned to another phase. We describe the different phases, how monitoring can support their detection, and how this information can be made known to operational groups.

4.1 PHASES AND TRANSITIONS

This section outlines the phases of a storage resource. Starting with an empty aggregate, storage is provisioned to certain thresholds. After that, storage is left for organic growth. After certain thresholds are exceeded, further inspection or activities must be performed to mitigate storage tightness.

• Provisioning storage. When certain thresholds are within a defined range, storage is provisioned to the aggregates. Monitoring should support making a decision to transition to the next phase.

• Leave storage for organic growth. When certain thresholds are exceeded, provisioned storage is left for organic growth. Depending on the environment, storage of existing applications might still be extended, and a second threshold might signal that extensions are no longer possible. Monitoring should support making a decision to transition to the next or prior phase.

• Mitigate storage use. When certain thresholds are exceeded, this phase must make sure that committed storage can be delivered to store application data. The effect of a mitigation activity should be to put the storage resource back in the preferred operational corridor. Monitoring should support making a decision to transition back to the organic growth phase.

4.2 MONITORING

NetApp Operations Manager delivers comprehensive monitoring and management for NetApp shared storage. It provides alerts, reports, performance monitoring, and configuration tools to keep the NetApp storage infrastructure in line with business requirements for maximum availability and efficiency. NetApp Operations Manager provides a single human interface and an application programming interface (API) for integration with third-party management and orchestration software vendors.

Operations Manager monitors the NetApp shared storage infrastructure and is able to raise awareness of certain situations. Events can be set to trigger an action when operational parameters are within a certain range and indicate a relevant situation. When the event triggers, an alarm notification can be sent by e-mail, pager, Simple Network Management Protocol (SNMP), or customized scripts. To raise awareness about a certain situation, the event must be characterized using the metrics provided by Operations Manager. To communicate the event, an alarm must be set.

THRESHOLDS

Operations Manager monitors relevant parameters that indicate the presence of specified situations. Thresholds can be set to trigger actions, for example, to notify the operational team that an alarm situation exists. The thresholds can be set to notify in advance. Operations Manager also performs trending on operational parameters to express the urgency of a certain situation. This supports the decision making on how to react to a certain situation.

Within your Operations Manager instance, the thresholds can be verified and set by navigating to the Default Thresholds page, following Setup → Options → Default Thresholds, or using the link http://opsmgrserver:port/dfm/edit/options. Figure 17 shows a sample configuration page.


Figure 17) Operations Manager screen to configure thresholds on operational metrics.

For aggregates, Operations Manager provides a set of thresholds, described in the following list. They represent absolute limits. Operations Manager alarms can be used to notify operational staff and raise awareness of a specific situation.

Monitoring the aggregates is very important. They are the physical containers of preallocated and growable storage objects that host application data. If an aggregate of a storage controller runs at uncontrolled usage, this could have direct consequences for the applications for which it provides data.

The concrete settings for these thresholds depend on the time necessary to relax aggregate block usage. When no mitigation alternatives should or can be taken over the lifetime of the data, mitigation actions must be performed in scheduled downtime windows. Thus, threshold settings and actions tend to be more conservative to avoid SLA-endangering situations.

• Aggregate full threshold. This threshold on the metric aggregate block use allows triggering an alarm that notifies a person in charge.

• Aggregate nearly full threshold. This threshold is the counterpart of the aggregate full threshold but provides an earlier notification.

• Aggregate overcommitted threshold. This threshold on the metric of committed storage allows triggering an alarm that notifies a person in charge. This metric refers to the amount of storage that is assigned to applications. It represents the level of consolidation and also the width and increase of the block use corridor.

• Aggregate nearly overcommitted threshold. This threshold is the counterpart of the aggregate overcommitted threshold but provides an earlier notification.

Operations Manager also provides thresholds that can be used to alert operational staff when volumes are in a certain state:

• Volume full threshold. This event notifies a person in charge that the preset threshold on the metric volume use has been reached.

• Volume almost full threshold. This event is the counterpart of the volume full threshold but provides an earlier notification.

• Volume autosized. This event notifies a person in charge when a volume was extended using the autogrow functionality.


TRENDING

Operations Manager 4.0 supports a variety of trending features for certain storage objects. This is an important feature for all storage objects with a fixed size. It allows you to estimate the time frame within which a certain situation needs to be mitigated. The trend is calculated as a linear regression over up to 90 days in the past. For aggregates, Operations Manager calculates a trend on the daily growth rate. In your Operations Manager instance, use the link http://opsmgrserver:port/dfm/report/view/aggregates-growth-rates for trending of aggregate growth rates and the estimated remaining time until the storage object is full. Each aggregate can be drilled down, and you can select trending based on an interval of one day, one week, one month, three months, or one year. To see the effect of recent data activities, set the interval of a trend calculation to enclose this activity. Investigate whether growth rates calculated over different intervals deviate significantly.

Figure 18) Trending of data growth and days-to-full prediction in Operations Manager.

Note: The calculation basis of time to full is the usable aggregate capacity. This value is not calculated based on the aggregate full threshold setting.
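The trending just described can be reproduced in a few lines: a least-squares regression over daily use samples yields a growth rate, and, per the note above, days to full is computed against the usable capacity rather than the full threshold. The sample data is illustrative:

```python
def days_to_full(daily_used_gb, usable_capacity_gb):
    """Least-squares growth rate (GB/day) over daily use samples and the
    estimated days until 100% of the usable capacity is consumed."""
    n = len(daily_used_gb)
    xs = range(n)  # day index
    mean_x = sum(xs) / n
    mean_y = sum(daily_used_gb) / n
    slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, daily_used_gb))
             / sum((x - mean_x) ** 2 for x in xs))
    remaining = usable_capacity_gb - daily_used_gb[-1]
    return slope, (remaining / slope if slope > 0 else float("inf"))

# Ten days of samples growing 10 GB/day toward a 10,000 GB usable capacity,
# with 9,000 GB used on the last day.
growth, days = days_to_full([8910 + 10 * d for d in range(10)], 10000)
```

This also shows why growth rates over different intervals can deviate: the slope depends entirely on which samples the regression window encloses.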

The trending on the volume level is analogous to the trending on the aggregate level. In your Operations Manager instance, access the link http://opsmgrserver:port/dfm/report/view/volumes-growth-rates for trending of volume growth rates. NetApp recommends that you order the view by growth rate (descending) or time to full (ascending) in order to focus on the relevant candidates.

On the volume level, you can set an alarm to fire when the volume growth is outside the usual boundary:

• Abnormal volume growth. This event notifies when the growth rate of a volume exceeds a preset limit. It is helpful to signal unusual behavior concerning storage consumption and to point the operational staff to the right storage object.


INDIVIDUAL THRESHOLDS PER AGGREGATE OR VOLUME

For each aggregate or volume, the general default settings can be overwritten and made more specific. To do so, select your aggregate or volume of choice; for example, you can use the links already provided in this technical report. A selected aggregate can be configured using the Edit Settings link and dialog. A selected volume's configuration can be adapted using the Edit Quota Settings link and dialog.

MONITORING STORAGE EFFICIENCY RETURNS

NetApp Operations Manager provides a dashboard to visualize storage efficiency returns in the NetApp shared storage infrastructure. This report lists important parameters, drilled down by utilization capacity, unused reserve capacity, storage efficiency, and efficiency return breakdown. It allows you to judge the effectiveness of the NetApp storage efficiency technologies. Figure 19 provides a sample screenshot of the storage efficiency dashboard in NetApp Operations Manager. Consult the NetApp Operations Manager Efficiency Dashboard Installation and User Guide for further information on the information provided by this dashboard.

Figure 19) Storage efficiency dashboard in Operations Manager.


4.3 NOTIFICATION

Operational staff must be notified when situations occur that require a transition of phases, especially situations with negative consequences for SLA fulfillment. Operations Manager provides alarms for notification. Alarms are bound to the metrics and thresholds explained in section 4.2 and notify operational staff, storage administrators, or storage capacity planners. Alarms are the instrument used to keep the management effort of the NetApp storage infrastructure low.

After being notified, the responsible person can evaluate the situation and decide which actions to take. Furthermore, the trends on operational parameters provided by Operations Manager simplify the decision-making process.

Depending on the organizational structure, the responsibilities to operate, plan, and administer the storage infrastructure can be separated into different groups, persons, or roles. Thus, we characterize the mitigation activities by required skill set and time to act. This allows an easy alignment to a given organizational structure.

Operations Manager supports different methods of sending a notification. The notification methods can be used in combination; for example, a notification can be sent by both e-mail and SNMP.

NOTIFY BY E-MAIL

An alarm can be sent to multiple destinations by e-mail. Repeated notifications can be sent while the situation is not resolved. To set an alarm, access the alarm configuration page by following Setup → Alarms from the default Operations Manager dashboard. Clicking Advanced Version accesses an advanced version of this page; the direct link for the advanced version is http://opsmgrserver:port/dfm/edit/alarms-advanced. Figure 20 shows how to configure an alarm. Adjust the threshold as described in section 4.2.

NetApp recommends using distribution lists or aliases with meaningful names rather than the addresses of individual persons. If you follow this recommendation, changing responsibilities and roles does not require you to make corresponding changes to Operations Manager.

NOTIFY BY SNMP

Operations Manager supports the notification of alarms using SNMP, a widely used standard that is supported by most orchestration frameworks and ticketing systems. Using SNMP, Operations Manager can be integrated into existing ticketing systems. Figure 20 shows setting up an alarm that fires based on the aggregate almost full threshold. The SNMP trap host is configured using a hostname or IP address and the port on which the SNMP agent is listening. The alarm can be saved and tested.


Figure 20) Configuring an alarm based on the threshold aggregate almost full.

Note: The SNMP event must be routed to the responsible groups or persons in the ticketing system. Thus, the mapping between the detected situation and the responsible operational group must be implemented there.

NOTIFY BY SCRIPT

Operations Manager supports notifications in highly customized integration scenarios. A user-defined adapter can be executed that delivers the information to the infrastructure or system of choice. A script can be used to implement such an adapter and act as the glue between Operations Manager and the customer infrastructure. To set an alarm on the event aggregate almost full that starts a script, instrument Operations Manager on the command line:

dfm alarm create -s script_to_execute -h aggregate-almost-full
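A minimal adapter sketch is shown below. How Operations Manager hands event details to the script (arguments versus environment variables) depends on the deployment, so the argument convention used here is an assumption for illustration, and forward_to_ticketing() is a placeholder for your own integration:

```python
#!/usr/bin/env python
"""Glue script between Operations Manager and a ticketing system.

Assumes the event name and source object arrive as command-line
arguments; replace forward_to_ticketing() with a real call into
your ticketing system or orchestration framework.
"""
import sys


def forward_to_ticketing(event, source):
    # Placeholder: open a ticket, call a webhook, send a message, etc.
    return f"TICKET: {event} on {source}"


def main(argv):
    event = argv[1] if len(argv) > 1 else "unknown-event"
    source = argv[2] if len(argv) > 2 else "unknown-source"
    print(forward_to_ticketing(event, source))


if __name__ == "__main__":
    main(sys.argv)
```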


4.4 MITIGATE STORAGE USE

Uncontrolled storage use can limit operational flexibility and might put SLA fulfillment at risk. This section focuses on mitigation activities that preserve flexibility by controlling use within its defined corridor; the effect of a mitigation activity should be to return the usage to that corridor.

Storage tightness might occur in aggregates or volumes, depending on their configuration. When all volumes in an aggregate are thin provisioned with the zero fat configuration, they use the shared pool of free blocks of the aggregate to deal with data growth. To solve this situation, a mitigation activity on the aggregate level is necessary.

When storage objects in a fixed-size volume cannot grow to the committed space, a mitigation activity on the volume level is necessary to solve upcoming volume tightness.

MITIGATION ACTIVITIES FOR AGGREGATES

Aggregates are the coarsest storage objects within a NetApp storage controller. Finer-grained storage objects such as FlexVol volumes and their contents are usually thin provisioned using the zero fat configuration. They might grow on demand; however, because they live within an aggregate of physically limited size, the growth of the storage object itself is also limited. As described in the following list, providing usable space in the aggregate automatically allows contained storage objects to grow.

1. Increase the aggregate. You can add drives to aggregates during operation, and you can repeat this mitigation activity. The maximum aggregate size depends on the Data ONTAP version, the type of aggregate, and the type of storage controller. 64-bit aggregates, supported with Data ONTAP 8, have very high limits. Additional drives can be used immediately; however, their procurement needs to be taken into account. Rebalancing data between existing and new drives results in uniformly distributed use of the drives.

2. Decrease the aggregate Snapshot copy reserve. This reserve is needed in MetroCluster and SyncMirror® configurations. In other configurations, you can decrease this reserve or set it to zero.

3. Shrink preallocated volumes. Volumes with preallocated space reserve available aggregate free space. When possible, these volumes can be shrunk, returning the freed space to the aggregate so that others can make use of the previously preallocated space.

4. Enable deduplication and shrink the volume.

5. If available, migrate a nomad online to a different storage controller. Doing this on the NetApp storage controller level requires storage provisioning based on vFiler units and MultiStore and SnapMirror licenses. Adequate free space on the aggregates of the target storage controller is required. This mitigation activity is not limited in its applicability.

6. Migrate a volume from one aggregate to another aggregate, within the same or another storage controller. SnapMirror replicates the data while it is still served. To switch over to the replicated data, the client needs to detach from the source and reattach to the replica. After completion, the replica is considered the new source. This operation has an impact on client downtime. Typically, inter-data center bandwidth allows you to synchronize the source and the replica within the range of a few minutes.

7. If none of the listed activities can be used, the application must be stopped to achieve a consistent state.
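For 7-mode Data ONTAP, alternatives 1 to 4 map to console commands along these lines. This is a sketch: aggr1 and vol1 are placeholder names, and the disk count and sizes are illustrative; verify the syntax against your Data ONTAP version.

```
# 1) Increase the aggregate: add four 600 GB disks to aggr1.
aggr add aggr1 4@600g

# 2) Decrease the aggregate Snapshot copy reserve to zero
#    (only when MetroCluster/SyncMirror does not require it).
snap reserve -A aggr1 0

# 3) Shrink a preallocated volume by 100 GB, returning space to aggr1.
vol size vol1 -100g

# 4) Enable deduplication on the volume and run it once over existing data.
sis on /vol/vol1
sis start -s /vol/vol1
```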

The mitigation activities for aggregate tightness are summarized in Table 8. Note that Provisioning Manager performs mitigation alternatives 3 to 6 for secondary storage online.


Table 8) Mitigation alternatives to control use within aggregates.

| No. | Mitigation Activity | Repeatability | SLA Impact | Preparation Time | Time to Show Effect |
|-----|---------------------|---------------|------------|------------------|---------------------|
| 1 | Increase aggregate capacity by adding disks | Repeatable (Data ONTAP 7.x: low limits; Data ONTAP 8: high limits) | None | HW procurement | Immediate (plus rebalancing) |
| 2 | Decrease the aggregate's Snapshot copy reserve, if possible | One time | None | None | Immediate |
| 3 | Shrink other volumes in the aggregate if they have enough free space | One time | Low | None | Immediate |
| 4 | Run deduplication and shrink volumes | Repeatable | Low | Time to execute dedupe | Immediate |
| 5 | Migrate nomads (online) | Repeatable | Low | None | Minutes (vFiler migration time) |
| 6 | Migrate volumes to a different aggregate (offline) | Repeatable | Medium to high | Next planned downtime window | Minutes (volume switch-over time) |
| 7 | Prevent application data loss and stop the application, then migrate (offline) | Repeatable | Low to high | Coordinate with app owner | Minutes (migration time) |

MITIGATION ACTIVITIES FOR VOLUME TIGHTNESS

Mitigation activities for volume tightness are relevant for volumes that are at risk because their storage objects cannot grow to the committed size. When it is not possible to enable growth for storage objects contained in volumes, you need to perform an aggregate mitigation activity.

Note: Some of these mitigation alternatives depend on, and affect, used capacity in the aggregate.

Table 9) Mitigation activities for resource tightness within volumes.

| No. | Mitigation Activity | Repeatability | SLA Impact | Prep Time | Time to Show Effect |
|-----|---------------------|---------------|------------|-----------|---------------------|
| 1 | Reduce the volume's Snapshot copy reserve (if configured and not used) | One time | Low | None | Immediate |
| 2 | Increase the volume if there is free space in the aggregate (see Table 8) | One time | Low | None | Immediate |
| 3 | Delete Snapshot copies that are not needed or were skipped by the autodelete function | Limited | Low | None | Immediate |
| 4 | Activate FAS deduplication for the volume (requires proper space guarantees) | One time | Low; possible performance impact | Wait for schedule | Hours |
| 5 | If the volume contains more than a single LUN, migrate those objects to another volume or aggregate | Repeatable | High | Next planned downtime window | Minutes (volume migration time) |
| 6 | Stop the application and migrate the data | Repeatable | High | Coordinate with app owner | Minutes (migration time) |


5 REAL-LIFE SETTINGS

This section summarizes two different operational settings. The first one does not make use of online data migration and the settled/nomad provisioning pattern; the second setting implements a settled/nomad provisioning pattern to maintain the flexibility for online data migrations.

The concrete threshold settings and approaches might be very customer and application specific. To exploit NetApp storage efficiency features in your own data center, NetApp recommends that you start conservatively. After you are familiar with the process, work toward the customer-specific optimum.

5.1 SAMPLE SETTING 1: REAL-LIFE SETTING

This section describes a real-life setting that a customer started with. It makes use of a limited set of mitigation alternatives. This is especially beneficial when the installed storage capacity should be constant over a long time frame or when physical systems are already fully equipped. A settled/nomad setting is not considered. Thus, the thresholds that signal a transition of the phases are set lower and more conservatively for this customer. Because online data migration and aggregate extension are not available as mitigation alternatives, sufficient available space is required to safely reach the next planned downtime window, as shown in Figure 21. In practice, refer to the aggregate's days-to-full trend value to get an idea of the available days to full based on past data growth.

• All storage is provisioned using the zero fat option with growable FlexVol volumes. Only aggregate monitoring is used.
• Aggregate extension is not a mitigation alternative.
• Online migration is not a mitigation alternative.

Figure 21) Storage to enable organic data growth between planned downtime windows. [Figure: data growth over a time axis of months; reserved capacity absorbs organic growth between two planned downtime windows.]

Note: Several months might fall between planned downtime windows in which major mitigation alternatives can be performed.

The primary concern is preventing the critical situation in which aggregates reach a utilization level that is too high to enable organic growth during the period between agreed planned downtime windows. To prevent this situation, sufficient space must be reserved to enable data growth. Second, the level of data consolidation is monitored to manage accumulated growth rates safely.

Provisioning new data is stopped when one or both thresholds on the first and second metrics are reached. The operational teams are notified using an alarm on the Operations Manager event "aggregate nearly full" (configured to fire when the metric exceeds 50%) and the event "aggregate nearly overcommitted" (configured to fire when the metric exceeds 110%). These alarms stop the responsible entities from provisioning new storage; the aggregate is left for organic growth.

An assessment of the storage situation might then be performed. Depending on experience and knowledge of the application growth rates seen in the past, the thresholds may be adapted. After the upper threshold of the operational sweet spot corridor is crossed, an alarm based on the "aggregate full" threshold (set initially to 65%) is sent to the storage administrators so that they can decide to migrate data in the next planned downtime window. In the meantime, organic growth can take place in the yellow-marked area shown in Figure 22. The metrics used are:
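The thresholds described above correspond to Operations Manager options; the following is a sketch of the matching CLI calls. The option names are as used by DataFabric Manager, but verify them with `dfm options list` on your version.

```
# Sweet spot corridor of sample setting 1: nearly full at 50%,
# full at 65%, nearly overcommitted at 110%.
dfm options set aggrNearlyFullThreshold=50
dfm options set aggrFullThreshold=65
dfm options set aggrNearlyOvercommittedThreshold=110
```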

• First metric: aggregate capacity used
• Second metric: aggregate space committed

Because all storage is provisioned using the zero fat option, no artificially limited storage container exists. Thus, there is no need to consider a volume-based metric. Figure 22 shows the behavior depending on the metrics aggregate capacity used and aggregate space committed.

Figure 22) Transition of phases depending on the metrics aggregate capacity used and aggregate space committed. [Figure: while aggregate capacity used is 0-50% and aggregate space committed is 0-110% (the operational sweet spot corridor), new storage is provisioned; above these thresholds, capacity is assessed and thresholds are adapted; above 65% capacity used or 120% space committed, mitigation is performed.]


5.2 SAMPLE SETTING 2: SETTLED/NOMAD

This section describes a setting that takes the settled/nomad provisioning pattern into account and allows migrating nomad data flexibly and in a timely manner thanks to vFiler technology. This setting requires storage space at alternative locations to which nomads might be migrated. It is seen more often in larger environments with an emphasis on NFS-attached storage. It allows operating the NetApp storage infrastructure at very high use and in narrower operational sweet spot corridors. Figure 23 visualizes the effect of a mitigation alternative that can be performed online.

Figure 23) Narrower corridors due to the ability to perform mitigation alternatives in hours instead of months. [Figure: an aggregate with settled and nomad data; after detecting the need to act, the effect of a mitigation (e.g., migration of a nomad) shows within hours.]

In this sample setting, as in sample setting 1, the critical situation to prevent is aggregates becoming too full. However, thanks to the flexibility gained with online data migration, no further metric, such as storage overcommitment, needs to be taken into account.

• All storage is provisioned using the zero fat option with growable FlexVol volumes. Only aggregate monitoring is used.
• Storage is provisioned using the settled/nomad pattern, with the ability to perform online migration.
• Days-to-full aggregate trending was more than 200 days on average. Note that this value depends on the individual situation and is calculated against 100% capacity used.

The sole metric in this setting is aggregate capacity used. Table 10 contains the thresholds describing the transitions between phases.

Table 10) Phase transitions with the settled/nomad provisioning pattern and the online migration mitigation alternative.

| Detection Threshold | Notify | Mitigation |
|---------------------|--------|------------|
| > 70% | Storage operations | Stop provisioning of new storage |
| > 85% | Storage operations | Stop extending provisioned storage |
| > 90% | Storage operations | Relax the resource situation and migrate a nomad |


Figure 24) Visualization of phase transitions depending on the metric aggregate capacity used. [Figure: at 0-70% capacity used, new storage is provisioned; at 70-85%, provisioning of new storage stops but already provisioned storage may still be extended; above 90%, utilization is relaxed by migrating a nomad with NetApp Data Motion.]

You can achieve very high data consolidation in this setting by using NetApp storage controllers. The served amount of logical data exceeds the physically usable capacity by a multiple.


6 STORAGE EFFICIENCY COOKBOOK

To increase consolidation, we propose the following steps to exploit the advantages of NetApp storage efficiency technologies.

Figure 25) Typical picture of aggregate capacity metrics while turning to zero fat configurations and dedupe. [Figure: committed capacity and capacity used over elapsed time; after about one month of monitoring (step 1), converting volumes to zero fat and deduplicating reduces capacity used (step 2); after three further months, the last-3-month trend is derived against the overall trend (step 3).]

As a general rule, we don't introduce artificially limited container types. They increase the monitoring effort and might prevent pooling unused space. For an existing landscape, proceed as follows:

1. Install and configure Operations Manager, the earlier the better. From day one, Operations Manager collects data; the more information it collects, the better the predictions and trending become. The diagrams provided by Operations Manager give a good idea of growth rates and their steadiness. Make sure all NetApp storage controllers are monitored, and wait for one month. Define which mitigation alternatives your operational team is comfortable with. Check the boxes accompanying the provided list and identify the time your team needs to perform the actions. If you can perform online migrations for nomads, define the time to negotiate and approve the migration. For all other data, define the time to the next planned downtime window.

2. Change all volumes to the zero fat configuration with the autogrow feature set to on. Because there is no artificial space limitation for an autogrow volume, monitoring is restricted to aggregate monitoring. When using deduplication, set the volume to autogrow. During this period, the capacity used diminishes, as shown in Figure 25. Usually, each change in the volume configuration can be detected. So far, only metadata has changed, and unused space in the volumes is now available from a common shared pool. The aggregated free space is available to the same applications storing the data. We recommend monitoring for three months to understand the growth rate of your environment.

3. Derive the growth trend of the aggregates. Note that the overall trend might still be negative. Use Operations Manager to help determine the trend. Make sure that it excludes the time frame when the volume configuration was changed to zero fat and that it includes relevant operations of your applications, such as month-end and year-end closing of business applications or regular software maintenance updates (for example, in virtualized environments).
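The controller registration of step 1 can be sketched on the Operations Manager CLI as follows (the hostnames are hypothetical):

```
# Register each storage controller so Operations Manager starts
# collecting capacity and growth data from day one.
dfm host add filer1.example.com
dfm host add filer2.example.com

# Verify that all controllers are monitored.
dfm host list
```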


Work backward to determine the thresholds of the phases:
a. Define the aggregate use at a level where your operational team is comfortable. At first, do not exceed 80%. Add an attention area (yellow) depending on the mitigation alternatives and their time to show effect.
b. Determine the maximum distance between the planned downtimes or the time to perform the intended mitigation alternatives.
c. Determine the growth rate. Operations Manager provides help in determining the trend of data growth.
d. Determine the minimum space required to comfortably allow organic growth in the period between agreed planned downtimes of the services provided. Operations Manager helps you to understand the growth rate of the past.

To provision storage, follow these steps:

1. Create big aggregates to enable shared storage in your data center. We recommend sizing them in such a way that the aggregate can be extended once for eventual aggregate mitigation. Very few situations exist where a silo-centric approach with dedicated aggregates for applications makes sense. Free space and performance in an aggregate can be shared. A few big aggregates reduce the monitoring effort. Also, build aggregates in a limited number of standardized configurations and sizes.

2. Create volumes in the zero fat configuration with the autogrow feature set to on. Because there is no artificial space limitation for an autogrow volume, monitoring is restricted to aggregate monitoring. When using deduplication, set the volume to autogrow. Whenever possible, use Provisioning Manager for convenience and for repeating configurations.
a. Classify your data and provision for flexibility. Give NFS a preference and make use of vFiler entities.
b. Turn on deduplication. Even in situations where deduplication rates are expected to be low, there is sometimes a big surprise. If you prefer to try deduplication on the storage controller first, create a clone of the intended volume and deduplicate it to estimate the effect. Use Performance Advisor to identify a repeating time frame of low activity to schedule the deduplication job. Also consider deduplication scheduled by change rate. Mind the maximum volume sizes, which depend on the storage controller.

c. Initially, size volumes to the expected size of the data you are going to store. Thus, the aggregate overcommitment metric in Operations Manager represents the data consolidation more precisely.

d. Trim existing volumes provisioned in full or low fat to the zero fat configuration. Use the following commands at the console of the storage controller to configure zero fat without Snapshot autodelete for NAS environments:

vol options <volume> guarantee none
vol options <volume> try_first volume_grow
vol autosize <volume> -m <maximum size> -i <increment size> on
snap autodelete <volume> off

Use the following command sequence to configure zero fat with Snapshot autodelete for NAS environments:

vol options <volume> guarantee none
vol options <volume> try_first volume_grow
vol autosize <volume> -m <maximum size> -i <increment size> on
snap autodelete <volume> trigger volume
snap autodelete <volume> delete_order oldest_first
snap autodelete <volume> on


Use the following commands to configure zero fat without Snapshot autodelete for SAN environments:

vol options <volume> guarantee none
vol options <volume> try_first volume_grow
vol autosize <volume> -m <maximum size> -i <increment size> on
snap reserve -V <volume> 0
snap autodelete <volume> off
lun set reservation <lun> disable

Use the following command sequence to configure zero fat for SAN environments with autodelete set to on:

vol options <volume> guarantee none
vol options <volume> try_first volume_grow
vol autosize <volume> -m <maximum size> -i <increment size> on
snap reserve -V <volume> 0
snap autodelete <volume> trigger volume
snap autodelete <volume> delete_order oldest_first
snap autodelete <volume> on
lun set reservation <lun> disable

e. Identify storage holding inactive data. Storage keeping inactive data is most often perfectly suited to act as a nomad candidate that could be migrated.

f. Identify storage that is close to deprovisioning. Deprovisioning of storage relaxes use and can act as a mitigation alternative.

g. Turn already provisioned volumes into the zero fat configuration.
3. Let Operations Manager monitor the landscape. Use the aggregate daily growth rates and days-to-full trending reported by Operations Manager to adapt the thresholds. Remember that days-to-full trending reports against 100% capacity used of the aggregate.
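The clone-based deduplication estimate mentioned in step 2b can be sketched as follows (7-mode console; the volume and Snapshot copy names are hypothetical, and a FlexClone license is required):

```
# Create a FlexClone of the intended volume from a fresh Snapshot copy.
snap create vol1 dedupe_estimate
vol clone create vol1_est -s none -b vol1 dedupe_estimate

# Deduplicate the clone, scanning its existing data.
sis on /vol/vol1_est
sis start -s /vol/vol1_est

# Inspect the savings, then remove the clone.
df -s /vol/vol1_est
vol offline vol1_est
vol destroy vol1_est
```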


7 REFERENCES

• TR-3505: "NetApp Deduplication for FAS and V-Series Deployment and Implementation Guide," www.netapp.com/us/library/technical-reports/tr-3505.html
• TR-3563: "NetApp Thin Provisioning: Improving Storage Utilization and Reducing TCO," www.netapp.com/us/library/technical-reports/tr-3563.html
• TR-3710: "Operations Manager, Provisioning Manager, and Protection Manager Best Practices Guide," www.netapp.com/us/library/technical-reports/tr-3710.html
• TR-3786: "A Thorough Introduction to 64-Bit Aggregates," www.netapp.com/us/library/technical-reports/tr-3786.html
• TR-3814: "NetApp Data Motion," www.netapp.com/us/library/technical-reports/tr-3814.html
• TR-3827: "If You're Doing This, Then Your Storage Could Be Underutilized," www.netapp.com/us/library/technical-reports/tr-3827.html
• TR-3881: "DataMotion for Volumes for Enterprise Applications," www.netapp.com/us/library/technical-reports/tr-3881.html
• NetApp Operations Manager Efficiency Dashboard Installation and User Guide, http://now.netapp.com/NOW/download/tools/omsed_plugin/InstallUserGuide.pdf


8 ACKNOWLEDGMENTS

This report was developed in concert with the Field Centers for Innovation and covers field best practices and product group expertise. It would not have been possible without the input of many experts. Significant contributions were made by Matthew Agoni, Carlos Alvarez, Jeff Berks, Manfred Buchmann, Hans Deuerlein, Erik Dybwad, Niels Reker, Oliver Dziuba, Larry Freeman, Gary Garcia, Pretoom Goswami, Naveen Harsani, George John, Nigel Maddock, Andreas Martinovsky, Holger Niermann, Cesar Orosco, Christian Ott, Shiva Raja, Michael Reusch, Maurice Skubski, John Tyrrell, Oliver Walsdorf, and Allen Wang.

NetApp provides no representations or warranties regarding the accuracy, reliability, or serviceability of any information or recommendations provided in this publication, or with respect to any results that may be obtained by the use of the information or observance of any recommendations provided herein. The information in this document is distributed AS IS, and the use of this information or the implementation of any recommendations or techniques herein is a customer's responsibility and depends on the customer's ability to evaluate and integrate them into the customer's operational environment. This document and the information contained herein may be used solely in connection with the NetApp products discussed in this document.

© Copyright 2010 NetApp, Inc. All rights reserved. No portions of this document may be reproduced without prior written consent of NetApp, Inc. Specifications are subject to change without notice. NetApp, the NetApp logo, Go further, faster, Data ONTAP, FlexClone, FlexVol, MultiStore, RAID-DP, SnapDrive, SnapMirror, Snapshot, SyncMirror, and vFiler are trademarks or registered trademarks of NetApp, Inc. in the United States and/or other countries. Windows is a registered trademark of Microsoft Corporation. Oracle is a registered trademark of Oracle Corporation. VMware is a registered trademark and VMotion is a trademark of VMware, Inc. All other brands or products are trademarks or registered trademarks of their respective holders and should be treated as such. RA-0007-1010


3.3 SETTLED/NOMAD PROVISIONING FOR NETAPP DATA MOTION

Settled/nomad provisioning is a provisioning pattern that helps increase the utilization of NetApp shared storage. When the online migratability features of storage are exploited, response times to mitigate data growth scenarios are independent of application-specific planned downtime windows. Storage is classified into potential migration candidates and can be migrated away from a tight aggregate at one storage controller to another while assuring its accessibility. Thus, it is an elegant technology to relax the use of an aggregate outside the planned downtime windows of the served applications. NetApp MultiStore technology implements this feature using the vFiler abstraction, which NetApp recommends you consider in the provisioning process. Secure multi-tenancy environments implemented using MultiStore technology harmonize well with this provisioning approach.

The settled/nomad provisioning pattern is a perfect metaphor for reacting to data growth in an aggregate. Figure 14 illustrates the concept of settled/nomad provisioning in the aggregates of the storage controllers and the migration of a nomad out of its aggregate. The settled part describes data that does not move during its lifetime. It might use vFiler units to simplify operation and hardware maintenance of the storage controller, but there is no direct need. The nomad parts are considered moving parts and thus must make use of vFiler units. The ratio between the sizes of the settled and nomad parts depends on the growth rate and lifetime of the data in the settled part. Assuming that the aggregate size is constant over this period, the aggregate is filled with settled and nomad data. Over the data lifetime, more and more nomads are migrated away. At the end of the lifetime, only the settled data is left.

It is irrelevant whether the data growth happens in the settled or the nomad part: when a nomad is migrated away, the resource situation on the aggregate is relaxed. It is preferable to provision several nomads of different sizes. This allows you to:

• React to different growth scenarios of the data.
• Quickly migrate smaller nomads when time or the inter-storage controller network is considered to be a limited resource.
• Operate the aggregate in its operational sweet spot corridor over a long time frame; by slicing the migratable entities in the right way, you can be sure that the aggregate operates in a predefined use interval.
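As a back-of-the-envelope illustration of slicing nomads (the numbers here are invented, not from the report), the nomad share of an aggregate follows from the expected growth rate and the data lifetime:

```shell
#!/bin/sh
# Illustrative sizing sketch: how much aggregate capacity should be
# provisioned as migratable nomads so that migrating them away can
# absorb the settled data's growth over its lifetime.
GROWTH_GB_PER_MONTH=150   # assumed organic growth of the settled data
LIFETIME_MONTHS=24        # assumed lifetime of the data in the aggregate

NOMAD_GB=$((GROWTH_GB_PER_MONTH * LIFETIME_MONTHS))
echo "Provision about ${NOMAD_GB} GB as nomads, e.g. several vFiler units of different sizes"
```

Splitting the resulting capacity into several vFiler units of different sizes preserves the ability to react to the different growth scenarios listed above.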

Figure 14) Settled/nomad provisioning into an aggregate. In case of aggregate tightness, a nomad is migrated to a separate aggregate. [Figure: an aggregate containing settled data and several nomads; one nomad is migrated to a different aggregate.]

To summarize, the settled/nomad provisioning pattern is an elegant method to adjust the block use of an aggregate. The use of an aggregate can be controlled and kept in a desired corridor.


SLA-BASED ASSESSMENT FOR SETTLED/NOMAD

The goals of an SLA-based assessment are to optimize SLA fulfillment and to avoid or minimize penalty costs. The accessibility of the applications, described by their individual service levels, is used for an assessment into settled and nomad instances.

We use the introduced SLA metric of service disruption and map it to the stickiness of the settled/nomad instances. The vFiler entities allow online migration of NFS- and iSCSI-attached nomad instances without any changes on the client side. Fibre Channel-attached storage cannot be migrated online at the time of writing. Refer to TR-3881 for an understanding of DataMotion in an Oracle database and Microsoft Exchange environment.

Alignment by technical impact: For data belonging to applications with SLAs that fit perfectly into what is provided, a direct assignment can be made. For example, application instances with the lowest acceptable service disruption should be the last candidates to be migrated (settled); applications with the highest acceptable service disruption should be considered as nomads. However, there might be data of application instances that will likely be migrated during the application lifetime. You must take into account the business impact of migrating these instances.

Figure 15) Alignment by technical impact (sorted by negative impact in descending order). [Figure: instances Inst1 to InstN sorted from high negative impact or outside SLA to low impact inside SLA; high-impact instances (e.g., all FC-attached) are settled, lower-impact instances are nomads.]

Alignment by business impact: An assessment of penalty costs is made for the data of the remaining applications. For vFiler migration, a very short negative impact on the performance of the service level must be taken into account during the migration. Thus, application data with the highest negative impact is considered to be the stickiest.

Figure 16) Alignment by business impact (sorted by negative impact in descending order). [Figure: instances sorted by penalty cost from $$ to $; the highest-impact data is settled, then semi-settled, then nomad.]

PERFORMANCE AND THROUGHPUT IMPACT OF MIGRATION

Migration of a nomad might be triggered due to heavy storage consumption in an aggregate. It might also be triggered due to performance limitations of the corresponding storage controller. Because the progress of migration consumes additional resources on the network and the participating storage controllers, this consumption must be taken into account to avoid further intensifying the situation. Refer to TR-3881 for a quantitative evaluation of DataMotion.

NetApp recommends that the use of storage controllers in a high-availability configuration be adjusted in such a way that the remaining controller can master the load in the case of a failover. Doing so should leave enough resources to perform migrations.

Migrating vFiler entities relies mainly on SnapMirror® and MultiStore technology. Thus, it requires these licenses on all participating storage controllers. TR-3814, "NetApp Data Motion," provides a thorough presentation of migration using the NetApp Data Motion solution. It focuses on implementing and triggering the migration of vFiler entities using NetApp Provisioning Manager. Furthermore, vFiler units can be managed manually to allow for handy offline migration with a very short interruption of storage accessibility.
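Manual migration of a vFiler unit can be sketched on the destination storage controller's console as follows (vf_nomad1 and filer_src are hypothetical names; verify the `vfiler migrate` syntax against your Data ONTAP version):

```
# Start the SnapMirror-based baseline transfer of the vFiler unit.
vfiler migrate start vf_nomad1@filer_src

# Check the transfer progress.
vfiler migrate status vf_nomad1@filer_src

# Complete the migration; the vFiler unit is stopped on the source and
# activated on the destination, causing a short accessibility gap.
vfiler migrate complete vf_nomad1@filer_src
```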

ENABLE SETTLED/NOMAD FOR ALREADY-PROVISIONED STORAGE

While NetApp recommends that you consider the settled/nomad setting initially, taking the sizing and lifetime of storage into account, it is possible to implement it later in a planned downtime window. If NFS-attached storage is to be migrated, existing volumes can be adopted by a vFiler entity. Because the vFiler entity has its own IP address, the clients attaching the storage need to remount it.

SETTLED/NOMAD-LIKE SETTING WITH SHORT/LONG-TERM DATA PAIRING

In the previous section, the settled/nomad pattern was described as a way to mitigate organic data growth. The same effect can be achieved when storage is identified as belonging to instances that are going to be deprovisioned at their end of life. Taking into account the expected lifetime of provisioned storage allows you to plan deprovisioning situations in advance. This relaxes the dependence on aggregates outside planned downtime windows, without the technical requirements of a settled/nomad setting.

ONLINE MIGRATION IN VIRTUALIZED ENVIRONMENTS

Online migration features in a virtualization hypervisor provide a further alternative for implementing a responsive scheme to react to data growth scenarios. For example, VMware® Storage VMotion™ is capable of transferring a virtual machine, including its storage, when it is attached using a datastore. Storage of virtual machines served by a NetApp datastore can be migrated to another NetApp-served datastore by migrating each virtual machine. In such cases, a nomad can simply be implemented by a NAS- or SAN-attached datastore.

In contrast to a data transfer based on SnapMirror directly between NetApp storage controllers, the migration traffic flows through the hypervisor. This might have consequences for the execution of the virtual machines. Also, the NetApp storage efficiency savings cannot be exploited during the transfer. Deduplication savings are regained by executing the deduplication process on the destination storage controller.


4 OPERATION

This section focuses on the operation and management of overcommitted storage. The goal of management is to fulfill the SLAs of the stored data while achieving a high level of efficiency. It addresses questions of how to detect situations that need manual assistance, how to raise the awareness of the operational staff, and how to resolve situations that arise.

We first consider situations that put SLA fulfillment at risk. Then we focus on actions that can be taken to avoid further aggravation by making the situation evident and presenting mitigation alternatives.

Consider the operational process as a loop that monitors and evaluates the current situation and triggers the transition of a storage resource among phases:

• Provision storage.
• Leave room for organic growth. It might be desirable to still allow for extending storage of previously provisioned applications.
• Reduce storage use with mitigation alternatives such as deletion, data motion, and so on.

These transitions must occur within a specified time frame to optimize operational flexibility and to prevent endangering the SLAs The point is to detect situations that will violate the SLAs in the future

SITUATIONS PUTTING SLA FULFILLMENT AT RISK

Over time, more and more data is stored and processed by the provided applications, and NetApp storage efficiency technologies compensate for this growth. To prevent running out of physical resources, usage must be managed within safe boundaries. This makes sure the operations team has enough time to react with the appropriate mitigation strategy.

The following list summarizes situations that are critical for service delivery:

• Running out of time. Some mitigation alternatives must be triggered in advance, and a passage of time might be needed for their effect to become evident. This time determines the number of mitigation alternatives that can be considered at a certain point.

• Running out of mitigation alternatives. Several mitigation alternatives exist to control the usage. However, some alternatives are one-time activities, and some must be performed within a certain time frame. Depending on the situation, not all alternatives might be available for use.

• Running too tight on storage. Over time, applications use more and more of the blocks from storage that were committed to them. This forces Data ONTAP to allocate from a pool of free blocks. Assuming data growth, the size of the free block pool directly translates into the available time to react.

• Running out of storage completely. This must be prevented because it has a high impact on the availability of the service. Furthermore, data integrity can be at risk. Consider the following scenarios:
− An application wants to write to committed storage but fails (NAS and SAN). For applications, this looks like a storage failure and implies service disruption. Data integrity can be at risk.
− An application wants to allocate new storage but fails (NAS). The application is confronted with a "No space left on device" exception. Verify the application's behavior on this exception; most applications can deal with this situation, and data integrity is not at risk.

Two cases need to be differentiated when mitigation is necessary to resolve a situation of tight storage. Storage for an object such as a LUN or a share can be tight because of:
− insufficient free space within the volume in which the storage object is contained
− insufficient free space within the aggregate in which the storage object and its volume are contained


The following sections focus on how to detect that a change is necessary and that a storage resource should be transitioned to another phase. We describe the different phases, how monitoring can support their detection, and how this information can be made known to operational groups.

4.1 PHASES AND TRANSITIONS

This section outlines the phases of a storage resource. Starting with an empty aggregate, storage is provisioned up to certain thresholds. After that, storage is left for organic growth. After certain thresholds are exceeded, further inspection or activities must be performed to mitigate storage tightness.

• Provisioning storage. When certain thresholds are within a defined range, storage is provisioned to the aggregates. Monitoring should support making the decision to transition to the next phase.

• Leaving storage for organic growth. When certain thresholds are exceeded, provisioned storage is left for organic growth. Depending on the environment, storage of existing applications might still be extended, and a second threshold might signal that extensions are no longer possible. Monitoring should support making the decision to transition to the next or the prior phase.

• Mitigating storage use. When certain thresholds are exceeded, this phase must make sure that committed storage can be delivered to store application data. The effect of a mitigation activity should be to put the storage resource back into the preferred operational corridor. Monitoring should support making the decision to transition back to the organic growth phase.
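As a rough sketch, the transition logic of these three phases can be expressed as a simple decision function. The threshold values below are illustrative placeholders, not NetApp defaults; concrete values are discussed in section 5.

```python
# Sketch of the phase decision loop described above.
# Threshold values are illustrative placeholders, not NetApp defaults.
PROVISION_LIMIT = 0.50   # below this, new storage may be provisioned
GROWTH_LIMIT = 0.85      # between the limits, storage is left for organic growth

def phase(aggregate_used_fraction: float) -> str:
    """Map aggregate block use to the current operational phase."""
    if aggregate_used_fraction < PROVISION_LIMIT:
        return "provision"
    if aggregate_used_fraction < GROWTH_LIMIT:
        return "organic-growth"
    return "mitigate"

print(phase(0.30))  # provision
print(phase(0.70))  # organic-growth
print(phase(0.92))  # mitigate
```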

4.2 MONITORING

NetApp Operations Manager delivers comprehensive monitoring and management for NetApp shared storage. It provides alerts, reports, performance monitoring, and configuration tools to keep the NetApp storage infrastructure in line with business requirements for maximum availability and efficiency. NetApp Operations Manager provides a single human interface and an application programming interface (API) for integration with third-party management and orchestration software.

Operations Manager monitors the NetApp shared storage infrastructure and can raise awareness of certain situations. Events can be set up to trigger an action when operational parameters are within a certain range and indicate a relevant situation. When the event triggers, an alarm notification can be sent by e-mail, pager, Simple Network Management Protocol (SNMP), or customized scripts. To raise awareness of a certain situation, the event must be characterized using the metrics provided by Operations Manager. To communicate the event, an alarm must be set.

THRESHOLDS

Operations Manager monitors relevant parameters that indicate the presence of specified situations. Thresholds can be set to trigger actions, for example, to notify the operational team that an alarm situation exists. The thresholds can be set to notify in advance. Operations Manager also performs trending on operational parameters to express the urgency of a certain situation. This supports the decision making on how to react.

Within your Operations Manager instance, the thresholds can be verified and set by navigating to the Default Thresholds page (Setup → Options → Default Thresholds, or the link http://opsmgrserver:port/dfm/edit/options). Figure 17 shows a sample configuration page.


Figure 17) Operations Manager screen to configure thresholds on operational metrics

For aggregates, Operations Manager provides the set of thresholds described in the following list. They represent absolute limits. Operations Manager alarms can be used to notify operational staff and raise awareness of a specific situation.

Monitoring the aggregates is very important. They are the physical containers of preallocated and growable storage objects that host application data. If an aggregate of a storage controller runs at uncontrolled usage, there can be direct consequences for the applications for which it provides data.

The concrete settings for these thresholds depend on the time necessary to relax aggregate block usage. When no mitigation alternatives should or can be taken over the lifetime of the data, mitigation actions must be performed in scheduled downtime windows. In that case, threshold settings and actions tend to be more conservative to avoid SLA-endangering situations.

• Aggregate full threshold. This threshold on the aggregate block use metric allows triggering an alarm that notifies a person in charge.

• Aggregate nearly full threshold. This threshold is the counterpart of the aggregate full threshold but provides an earlier notification.

• Aggregate overcommitted threshold. This threshold on the committed storage metric allows triggering an alarm that notifies a person in charge. This metric refers to the amount of storage that is assigned to applications. It represents the level of consolidation and also the width and increase of the block use corridor.

• Aggregate nearly overcommitted threshold. This threshold is the counterpart of the aggregate overcommitted threshold but provides an earlier notification.

Operations Manager also provides thresholds that can be used to alert operational staff when volumes are in a certain state:

• Volume full threshold. This event notifies a person in charge that the preset threshold on the volume use metric has been reached.

• Volume almost full threshold. This event is the counterpart of the volume full threshold but provides an earlier notification.

• Volume autosized. This event notifies a person in charge when a volume was extended using the autogrow functionality.


TRENDING

Operations Manager 4.0 supports a variety of trending features for certain storage objects. This is an important feature for all storage objects with a fixed size. It allows you to estimate the time frame within which a certain situation needs to be mitigated. The trend is calculated as a linear regression over up to 90 days in the past. For aggregates, Operations Manager calculates a trend on the daily growth rate. In your Operations Manager instance, use the link http://opsmgrserver:port/dfm/report/view/aggregates-growth-rates for trending of aggregate growth rates and the estimated remaining time until the storage object is full. Each aggregate can be drilled down, and you can select trending based on an interval of one day, one week, one month, three months, or one year. To see the effect of recent data activities, set the interval of a trend calculation to enclose this activity. Investigate whether growth rates calculated over different intervals deviate significantly.
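As an illustration of this kind of trending, the following sketch fits a linear trend through daily usage samples and extrapolates the days remaining until the usable capacity is reached. The sample data and function name are invented for illustration; Operations Manager performs the equivalent calculation for you.

```python
# Estimate days to full from daily capacity samples, analogous to the
# linear-regression trending that Operations Manager performs.
def days_to_full(used_gb, usable_capacity_gb):
    """Fit a linear trend through daily usage samples and extrapolate.

    Like Operations Manager, this extrapolates against the usable
    capacity, not against the aggregate full threshold setting.
    Returns None when no positive growth trend is present.
    """
    n = len(used_gb)
    xs = range(n)
    mean_x = sum(xs) / n
    mean_y = sum(used_gb) / n
    slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, used_gb)) \
            / sum((x - mean_x) ** 2 for x in xs)
    if slope <= 0:
        return None  # flat or shrinking: never full on this trend
    intercept = mean_y - slope * mean_x
    day_full = (usable_capacity_gb - intercept) / slope  # trend crosses capacity
    return day_full - (n - 1)  # days remaining from the last sample

# 10 days of made-up samples growing 5 GB/day toward a 1000 GB aggregate
samples = [500 + 5 * d for d in range(10)]
print(round(days_to_full(samples, 1000)))  # 91
```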

Figure 18) Trending of data growth and days-to-full prediction in Operations Manager

Note: The calculation basis of time to full is the usable aggregate capacity. This value is not calculated based on the aggregate full threshold setting.

The trending on the volume level is analogous to the trending on the aggregate level. In your Operations Manager instance, access the link http://opsmgrserver:port/dfm/report/view/volumes-growth-rates for trending of volume growth rates. NetApp recommends that you order the view by growth rate (descending) or time to full (increasing) in order to focus on the relevant candidates.

On the volume level, you can set an alarm to fire when the volume growth is outside the usual boundary:

Abnormal volume growth. This event notifies when the growth rate of a volume exceeds a preset limit. It is helpful for signaling unusual behavior concerning storage consumption and pointing the operational staff to the right storage object.
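A minimal sketch of such a check follows; the volume names, growth rates, and limit are invented for illustration.

```python
# Sketch of the abnormal-volume-growth check: flag volumes whose measured
# growth rate exceeds a preset limit. All names and numbers are made up.
GROWTH_LIMIT_GB_PER_DAY = 10.0

def abnormal_volumes(growth_rates):
    """Return the volumes growing faster than the preset limit."""
    return [vol for vol, rate in growth_rates.items()
            if rate > GROWTH_LIMIT_GB_PER_DAY]

rates = {"vol_app1": 2.5, "vol_logs": 14.0, "vol_db": 9.9}
print(abnormal_volumes(rates))  # ['vol_logs']
```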


INDIVIDUAL THRESHOLDS PER AGGREGATE OR VOLUME

For each aggregate or volume, the general default settings can be overridden and made more specific. To do so, select your aggregate or volume of choice, for example, by using the links already provided in this technical report. When selecting a concrete aggregate, it can be configured using the Edit Settings link and dialog. When selecting a concrete volume, its configuration can be adapted using the Edit Quota Settings link and dialog.

MONITORING STORAGE EFFICIENCY RETURNS

NetApp Operations Manager provides a dashboard to visualize storage efficiency returns in the NetApp shared storage infrastructure. This report lists important parameters, drilled down by utilization capacity, unused reserve capacity, storage efficiency, and efficiency return breakdown. It allows you to judge the effectiveness of the NetApp storage efficiency technologies. Figure 19 provides a sample screenshot of the storage efficiency dashboard in NetApp Operations Manager. Consult the NetApp Operations Manager Efficiency Dashboard Installation and User Guide for further information on the information provided by this dashboard.

Figure 19) Storage efficiency dashboard in Operations Manager


4.3 NOTIFICATION

Operational staff must be notified when situations occur that require a transition of phases, especially situations with negative consequences for SLA fulfillment. Operations Manager provides alarms for notification. Alarms are bound to the metrics and thresholds explained in section 4.2 and notify operational staff, storage administrators, or storage capacity planners. Alarms are the instrument used to keep the management effort of the NetApp storage infrastructure low.

After being notified, the responsible person can evaluate the situation and decide which actions to take. Furthermore, the trends on operational parameters provided by Operations Manager simplify the decision-making process.

Depending on the organizational structure, the responsibilities to operate, plan, and administer the storage infrastructure can be separated into different groups, persons, or roles. Thus, we characterize the mitigation activities by required skill set and time to act. This allows easy alignment to a given organizational structure.

Operations Manager supports different methods of sending a notification. The notification methods can be used in combination; for example, a notification can be sent by both e-mail and SNMP.

NOTIFY BY E-MAIL

An alarm can be sent to multiple destinations by e-mail. Repeated notifications can be sent while the situation is not resolved. To set an alarm, access the alarm configuration page by following Setup → Alarms from the default Operations Manager dashboard. Clicking Advanced Version accesses an advanced version of this page; the direct link for the advanced version is http://opsmgrserver:port/dfm/edit/alarms-advanced. Figure 20 shows how to configure an alarm. Adjust the threshold as described in section 4.2.

NetApp recommends using distribution lists or aliases with meaningful names rather than the addresses of individual persons. If you follow this recommendation, changing responsibilities and roles does not require corresponding changes in Operations Manager.

NOTIFY BY SNMP

Operations Manager supports the notification of alarms using SNMP, a widely used standard that is supported by most orchestration frameworks and ticketing systems. Using SNMP, Operations Manager can be integrated into existing ticketing systems. Figure 20 shows setting up an alarm that fires based on the aggregate almost full threshold. The SNMP trap host is configured using a hostname or IP address and the port on which the SNMP agent is listening. The alarm can then be saved and tested.


Figure 20) Configuring an alarm based on the threshold aggregate almost full

Note: The SNMP event must be routed to the responsible groups or persons in the ticketing system. Thus, the mapping of the detected situation to the responsible operational group must be implemented there.

NOTIFY BY SCRIPT

Operations Manager supports notifications in highly customized integration scenarios. A user-defined adapter can be executed that delivers the information to the infrastructure or system of choice. A script can be used to implement such an adapter and act as the glue between Operations Manager and the customer infrastructure. To set an alarm on the event aggregate almost full that starts a script, instrument Operations Manager on the command line:

dfm alarm create -s script_to_execute -h aggregate-almost-full


4.4 MITIGATE STORAGE USE

Uncontrolled storage use can limit operational flexibility and might put SLA fulfillment at risk. This section focuses on mitigation activities that preserve flexibility by controlling use within its defined corridor. The effect of a mitigation activity should be to return the usage to that corridor.

Storage tightness might occur in aggregates or volumes, depending on their configuration. When all volumes in an aggregate are thin provisioned with the zero fat configuration, they use the shared pool of free blocks of the aggregate to deal with data growth. To solve this situation, a mitigation activity on the aggregate level is necessary.

When storage objects in a fixed-size volume cannot grow to the committed space, a mitigation activity on the volume level is necessary to solve upcoming volume tightness.

MITIGATION ACTIVITIES FOR AGGREGATES

Aggregates are the coarsest storage objects within a NetApp storage controller. Finer-grained storage objects, such as FlexVol volumes and their content, are usually thin provisioned using the zero fat configuration. They might grow on demand; however, because they live within an aggregate of physically limited size, the growth of the storage object itself is also limited. As described in the following list, providing usable space in the aggregate automatically allows the contained storage objects to grow.

1. Increase the aggregate. You can add drives to aggregates during operation, and you can repeat this mitigation activity. The maximum aggregate size depends on the Data ONTAP version, the type of aggregate, and the type of storage controller. 64-bit aggregates, supported with Data ONTAP 8, have very high limits. Additional drives can be used immediately; however, their procurement time needs to be taken into account. Rebalancing data between existing and new drives results in uniformly distributed use of the drives.

2. Decrease the aggregate Snapshot copy reserve. This reserve is needed in MetroCluster and SyncMirror® configurations. In other configurations, you can decrease this reserve or set it to zero.

3. Shrink preallocated volumes. Volumes with preallocated space reserve available aggregate free space. When possible, these volumes can be shrunk, returning the freed space to the aggregate so that others can make use of it.

4. Enable deduplication and shrink the volume.

5. If available, migrate a nomad online to a different storage controller. Doing this on the NetApp storage controller level requires storage provisioning based on vFiler units and a MultiStore and SnapMirror license. Adequate free space on the aggregates of the target storage controller is required. This mitigation activity is not limited in its applicability.

6. Migrate a volume from one aggregate to another aggregate within the same or another storage controller. SnapMirror replicates the data while it is still served. To switch over to the replicated data, the client needs to detach from the source and reattach to the replica. After completion, the replica is considered the new source. This operation has an impact on client downtime. Typically, inter-data center bandwidth allows you to synchronize the source and the replica within a few minutes.

7. If none of the listed activities can be used, the application must be stopped to achieve a consistent state, and its data must be migrated offline.

The mitigation activities for aggregate tightness are summarized in Table 8. Note that Provisioning Manager can perform mitigation alternatives 3 to 6 online for secondary storage.


Table 8) Mitigation alternatives to control use within aggregates.

No. | Mitigation Activity | Repeatability | SLA Impact | Preparation Time | Time to Show Effect
1 | Increase aggregate capacity by adding disks | Data ONTAP 7.x: low limits; Data ONTAP 8: high limits | None | HW procurement | Immediate (plus rebalancing)
2 | Decrease the aggregate's Snapshot copy reserve, if possible | One time | None | None | Immediate
3 | Shrink other volumes in the aggregate, if they have enough free space | One time | Low | None | Immediate
4 | Run deduplication and shrink volumes | Repeatable | Low | Time to execute dedupe | Immediate
5 | Migrate nomads (online) | Repeatable | Low | None | Minutes (vFiler migration time)
6 | Migrate volumes to a different aggregate (offline) | Repeatable | Medium-high | Next planned downtime window | Minutes (volume switch-over time)
7 | Prevent application data loss and stop the application, then migrate (offline) | Repeatable | Low-high | Coordinate with app owner | Minutes (migration time)

MITIGATION ACTIVITIES FOR VOLUME TIGHTNESS

Mitigation activities for volume tightness are relevant for volumes that are at risk because their storage objects cannot grow to the committed size. When it is not possible to enable growth for the storage objects contained in a volume, you need to perform an aggregate mitigation activity.

Note: Some of these mitigation alternatives depend on, and affect, the used capacity in the aggregate.

Table 9) Mitigation activities for resource tightness within volumes.

No. | Mitigation Activity | Repeatability | SLA Impact | Preparation Time | Time to Show Effect
1 | Reduce the volume's Snapshot copy reserve (if configured and not used) | One time | Low | None | Immediate
2 | Increase the volume if there is free space in the aggregate (see Table 8) | One time | Low | None | Immediate
3 | Delete Snapshot copies not needed or those skipped by the autodelete function | Limited | Low | None | Immediate
4 | Activate FAS deduplication for the volume (requires proper space guarantees) | One time | Low; possible performance impact | Wait for schedule | Hours
5 | If the volume contains more than a single LUN, migrate those objects to another volume or aggregate | Repeatable | High | Next planned downtime window | Minutes (volume migration time)
6 | Stop application and migrate data | Repeatable | High | Coordinate with app owner | Minutes (migration time)


5 REAL-LIFE SETTINGS

This section summarizes two different operational settings. The first does not make use of online data migration or the settled/nomad provisioning pattern; the second implements a settled/nomad provisioning pattern to maintain the flexibility for online data migrations.

The concrete threshold settings and approaches might be very customer and application specific. To exploit NetApp storage efficiency features in your own data center, NetApp recommends that you start conservatively. After you are familiar with the process, work toward the customer-specific optimum.

5.1 SAMPLE SETTING 1: REAL-LIFE SETTING

This section describes a real-life setting that a customer started with. It makes use of a limited set of mitigation alternatives. This is especially beneficial when the installed storage capacity should be constant over a long time frame or when physical systems are already fully equipped. A settled/nomad setting is not considered. Thus, the thresholds that signal a transition of phases are set lower and more conservatively for this customer. Because online data migration and aggregate extension are not available as mitigation alternatives, sufficient available space is required to safely reach the next planned downtime window, as shown in Figure 21. In practice, refer to the aggregate days-to-full trend value to get an idea of the available days to full based on past data growth.

• All storage is provisioned using the zero fat option with growable FlexVol volumes. Only aggregate monitoring is used.
• Aggregate extension is not a mitigation alternative.
• Online migration is not a mitigation alternative.

Figure 21) Storage to enable organic data growth between planned downtime windows. (The figure plots data growth over the months between two planned downtime windows.)

Note: Several months might pass between the planned downtime windows in which major mitigation alternatives can be performed.

The primary concern is preventing the critical situation in which aggregates reach a utilization level that is too high to enable organic growth during the period between agreed planned downtime windows. To prevent this situation, sufficient space must be reserved to enable data growth. Second, the level of data consolidation is monitored to manage accumulated growth rates safely.

Provisioning of new data is stopped when one or both thresholds on the first and second metrics are reached. The operational teams are notified using an alarm on the Operations Manager events aggregate nearly full threshold (configured to fire when the metric exceeds 50%) and aggregate nearly overcommitted threshold (configured to fire when the metric exceeds 110%). These alarms stop the responsible entities from provisioning new storage; the aggregate is left for organic growth.

An assessment of the storage situation might then be performed. Depending on experience and knowledge of the application growth rates seen in the past, the thresholds may be adapted. After the upper threshold of the operational sweet spot corridor is exceeded, an alarm based on the aggregate full threshold (set initially to 65%) is sent to the storage administrators to make the decision about migrating data in the next planned downtime window. In the meantime, organic growth can take place in the yellow-marked area shown in Figure 22. The metrics used are:

• First metric: aggregate capacity used
• Second metric: aggregate space committed

Because all storage is provisioned using the zero fat option, no artificially limited storage container exists. Thus, there is no need to consider a volume-based metric. Figure 22 shows the behavior depending on the metrics aggregate capacity used and aggregate committed space.
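The decision logic of this setting can be sketched as follows, using the thresholds quoted above (50%/65% on capacity used, 110%/120% on committed space). The function name and action strings are invented for illustration.

```python
# Sketch of sample setting 1: map the two aggregate metrics to an action.
# Thresholds come from the text; both inputs are percentages.
def sample_setting_1_action(used_pct: float, committed_pct: float) -> str:
    if used_pct > 65 or committed_pct > 120:
        return "mitigate in next planned downtime window"
    if used_pct > 50 or committed_pct > 110:
        return "stop provisioning; assess capacity and adapt thresholds"
    return "provision new storage"

print(sample_setting_1_action(40, 90))   # within the sweet spot corridor
print(sample_setting_1_action(55, 100))  # corridor left: organic growth only
print(sample_setting_1_action(70, 100))  # aggregate full threshold exceeded
```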

Figure 22) Transition of phases depending on the metrics aggregate capacity used and aggregate committed space. (The figure shows the operational sweet spot corridor: new storage is provisioned while capacity used is in the 0-50% range and committed space is in the 0-110% range; beyond those thresholds, capacity is assessed and thresholds are adapted; when capacity used exceeds 65% or committed space exceeds 120%, mitigation is triggered.)


5.2 SAMPLE SETTING 2: SETTLED/NOMAD

This section describes a setting that takes the settled/nomad provisioning pattern into account and allows migrating nomad data flexibly and in a timely manner, thanks to vFiler technology. This setting requires storage space at alternative locations to which nomads might be migrated. It is seen more often in larger environments with an emphasis on NFS-attached storage. It allows operating the NetApp storage infrastructure at very high use and within narrower operational sweet spot corridors. Figure 23 visualizes the effect of a mitigation alternative that can be performed online.

Figure 23) Narrower corridors due to the ability to perform mitigation alternatives in hours instead of months. (The figure shows settled and nomad data in an aggregate over time; after detecting the need to act, the effect of a mitigation, e.g., a migration, shows within hours.)

In this sample setting, as in sample setting 1, the critical situation to prevent is aggregates becoming too full. However, the flexibility gained with online data migration makes it unnecessary to take a further metric, such as storage overcommitment, into account.

• All storage is provisioned using the zero fat option with growable FlexVol volumes. Only aggregate monitoring is used.
• Storage is provisioned using the settled/nomad pattern, with the ability to perform online migration.
• The days-to-full aggregate trend was more than 200 days on average. Note that this value depends on the individual situation and is calculated against 100% capacity.

The sole metric in this setting is aggregate capacity used. Table 10 contains the thresholds describing the transition of phases.

Table 10) Phase transitions with the settled/nomad provisioning pattern and the online migration mitigation alternative.

Detection Threshold | Notify | Mitigation
> 70% | Storage operations | Stop provisioning of new storage
> 85% | Storage operations | Stop extending provisioned storage
> 90% | Storage operations | Relax the resource situation and migrate a nomad
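Expressed as a sketch, the transitions of Table 10 reduce to checks against this single metric; the function name and action strings are invented for illustration.

```python
# Sketch of Table 10: actions triggered by aggregate capacity used alone.
def settled_nomad_actions(used_pct: float):
    """Return the actions triggered at the given aggregate capacity used."""
    actions = []
    if used_pct > 70:
        actions.append("stop provisioning new storage")
    if used_pct > 85:
        actions.append("stop extending provisioned storage")
    if used_pct > 90:
        actions.append("relax resource situation: migrate a nomad online")
    return actions

print(settled_nomad_actions(95))  # all three thresholds exceeded
```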


Figure 24) Visualization of phase transitions depending on the metric aggregate capacity used. (While capacity used is in the 0-70% range, new storage is provisioned; between 70% and 85%, already provisioned storage may still be extended; above 90%, utilization is relaxed by migrating a nomad with NetApp Data Motion.)

You can achieve a very high level of data consolidation in this setting using NetApp storage controllers. The served amount of logical data can exceed the physically usable capacity by factors.


6 STORAGE EFFICIENCY COOKBOOK

To increase consolidation, we propose the following steps to exploit the advantages of NetApp storage efficiency technologies.

Figure 25) Typical picture of aggregate capacity metrics while turning to zero fat configurations and dedupe. (The figure plots committed capacity and capacity used over elapsed time, together with the overall trend and the last three-month trend; capacity used diminishes during the first month after the configuration change.)

As a general rule, we don't introduce artificially limited container types; they increase the monitoring effort and might prevent pooling unused space. For an existing landscape, proceed as follows:

1. Install and configure Operations Manager, the earlier the better. From day one, Operations Manager collects data; the more information it collects, the better the predictions and trending become. The diagrams provided by Operations Manager give a good idea of growth rates and their steadiness. Make sure all NetApp storage controllers are monitored. Wait for one month. Define which mitigation alternatives your operational team is comfortable with. Check the boxes accompanying the provided list and identify the time your team needs to perform the actions. If you can perform online migrations for nomads, define the time to negotiate and approve a migration. For all other data, define the time to the next planned downtime window.

2. Change all volumes to the zero fat configuration with the autogrow feature set to on. Because there is no artificial space limitation for an autogrow volume, monitoring is restricted to aggregate monitoring. When using deduplication, set the volume to autogrow. During this period, the capacity used diminishes, as shown in Figure 25. Usually, each change in the volume configuration can be detected. So far, only metadata has changed, and unused space in the volumes is now available in a common shared pool. The aggregated free space is available to the same applications storing the data. We recommend monitoring for three months to understand the growth rate of your environment.

3. Derive the growth trend of the aggregates. Note that the overall trend might still be negative. Use Operations Manager to help determine the trend. Make sure that it excludes the time frame of changing the volume configuration to zero fat and that it includes relevant operations of your applications, such as month-end and year-end closing of business applications or regular software maintenance updates (for example, in virtualized environments).


4. Work backward to determine the thresholds of the phases:
a. Define the aggregate use at a level your operational team is comfortable with. At first, do not exceed 80%. Add an attention area (yellow) depending on the mitigation alternatives and their time to show effect.
b. Determine the maximum distance between the planned downtimes, or the time to perform the intended mitigation alternatives.
c. Determine the growth rate. Operations Manager provides help in determining the trend of data growth.
d. Determine the minimum space required to comfortably allow organic growth in the period between the agreed planned downtimes of the provided services. Operations Manager helps you to understand the growth rate of the past.
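Steps b through d can be combined into a back-of-the-envelope calculation. The 20% safety factor below is an illustrative assumption, not a NetApp rule; substitute your own margin.

```python
# Sketch: the minimum free space to keep so that organic growth is covered
# between two planned downtime windows, given the observed growth rate.
def min_reserve_gb(growth_gb_per_day: float, days_between_downtimes: int,
                   safety_factor: float = 1.2) -> float:
    """Space to keep free until the next window (20% margin is illustrative)."""
    return growth_gb_per_day * days_between_downtimes * safety_factor

# e.g., 5 GB/day of growth and 90 days between downtime windows
print(min_reserve_gb(5.0, 90))  # 540.0
```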

To provision storage, follow these steps:

1. Create big aggregates to enable shared storage in your data center. We recommend sizing them in such a way that the aggregate can be extended once for an eventual aggregate mitigation. Very few situations exist in which a silo-centric approach with dedicated aggregates for applications makes sense. Free space and performance in an aggregate can be shared. A few big aggregates reduce the monitoring effort. Also, build aggregates in a limited number of standardized configurations and sizes.

2. Create volumes in zero fat configuration with the autogrow feature set to on. Because there is no artificial space limitation for the autogrow volume, monitoring is restricted to aggregate monitoring. When using deduplication, set the volume to autogrow. Whenever possible, use Provisioning Manager for convenience and for repeating configurations.

a. Classify your data and provision for flexibility. Give NFS a preference and make use of vFiler entities.

b. Turn on deduplication. Even in situations where deduplication rates are expected to be low, there is sometimes a big surprise. If you prefer to try deduplication on the storage controller, then create a clone of the intended volume and deduplicate it to estimate the effect. Use Performance Advisor to identify a repeating time frame of low activity to schedule the deduplication job. Alternatively, schedule deduplication by change rate. Mind the maximum volume sizes, which depend on the storage controller.

c. Initially size volumes to the expected size of the data you are going to store. Thus, the aggregate overcommitment metric in Operations Manager represents the data consolidation more precisely.

d. Trim existing volumes provisioned in full fat or low fat to zero fat configuration. Use the following commands at the console of the storage controller to configure zero fat without Snapshot autodelete for NAS environments:

vol options <volume> guarantee none
vol options <volume> try_first volume_grow
vol autosize <volume> -m <maximum size> -i <increment size> on
snap autodelete <volume> off

Use the following command sequence to configure zero fat with Snapshot autodelete for NAS environments:

vol options <volume> guarantee none
vol options <volume> try_first volume_grow
vol autosize <volume> -m <maximum size> -i <increment size> on
snap autodelete <volume> trigger volume
snap autodelete <volume> delete_order oldest_first
snap autodelete <volume> on


Use the following commands to configure zero fat without Snapshot autodelete for SAN environments:

vol options <volume> guarantee none
vol options <volume> try_first volume_grow
vol autosize <volume> -m <maximum size> -i <increment size> on
snap reserve -V <volume> 0
snap autodelete <volume> off
lun set reservation <lun> disable

Use the following command sequence to configure zero fat for SAN environments with autodelete set to on:

vol options <volume> guarantee none
vol options <volume> try_first volume_grow
vol autosize <volume> -m <maximum size> -i <increment size> on
snap reserve -V <volume> 0
snap autodelete <volume> trigger volume
snap autodelete <volume> delete_order oldest_first
snap autodelete <volume> on
lun set reservation <lun> disable

e. Identify storage of inactive data. Storage keeping inactive data is most often perfectly suited to act as a nomad candidate that can be migrated.

f. Identify storage that is close to deprovisioning. Deprovisioning storage relaxes use and can act as a mitigation alternative.

g. Turn already provisioned volumes into the zero fat configuration.

3. Let Operations Manager monitor the landscape. Use the aggregate daily growth rates and days-to-full trending reported by Operations Manager to adapt the thresholds. Remember that days-to-full trending reports against 100% capacity used of the aggregate.
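The overcommitment metric referenced in step c is, at its core, the ratio of space committed to applications versus the usable physical space of the aggregate. A minimal sketch of that arithmetic (hypothetical helper, not the Operations Manager implementation):

```python
def overcommitment(committed_gb, usable_gb):
    """Overcommitment ratio of an aggregate: space promised to
    applications relative to usable physical space.

    Illustrative sketch of the metric's arithmetic only, not the
    Operations Manager implementation.
    """
    return committed_gb / usable_gb

# Three 4 TB zero fat volumes consolidated in a 10 TB aggregate:
ratio = overcommitment(3 * 4000, 10000)   # 1.2, i.e., 120% committed
```

Sizing volumes to the expected data (rather than generously oversizing them) keeps this ratio an honest measure of consolidation.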


7 REFERENCES

• TR-3505, "NetApp Deduplication for FAS and V-Series Deployment and Implementation Guide": www.netapp.com/us/library/technical-reports/tr-3505.html
• TR-3563, "NetApp Thin Provisioning: Improving Storage Utilization and Reducing TCO": www.netapp.com/us/library/technical-reports/tr-3563.html
• TR-3710, "Operations Manager, Provisioning Manager, and Protection Manager Best Practices Guide": www.netapp.com/us/library/technical-reports/tr-3710.html
• TR-3786, "A Thorough Introduction to 64-Bit Aggregates": www.netapp.com/us/library/technical-reports/tr-3786.html
• TR-3814, "NetApp Data Motion": www.netapp.com/us/library/technical-reports/tr-3814.html
• TR-3827, "If You're Doing This, Then Your Storage Could Be Underutilized": www.netapp.com/us/library/technical-reports/tr-3827.html
• TR-3881, "DataMotion for Volumes for Enterprise Applications": www.netapp.com/us/library/technical-reports/tr-3881.html
• NetApp Operations Manager Efficiency Dashboard Installation and User Guide: now.netapp.com/NOW/download/tools/omsed_plugin/InstallUserGuide.pdf


8 ACKNOWLEDGMENTS

This report was developed in concert with the Field Centers for Innovation and covers field best practices and product group expertise. It would not have been possible without the input of many experts. Significant contributions were made by Matthew Agoni, Carlos Alvarez, Jeff Berks, Manfred Buchmann, Hans Deuerlein, Erik Dybwad, Niels Reker, Oliver Dziuba, Larry Freeman, Gary Garcia, Pretoom Goswami, Naveen Harsani, George John, Nigel Maddock, Andreas Martinovsky, Holger Niermann, Cesar Orosco, Christian Ott, Shiva Raja, Michael Reusch, Maurice Skubski, John Tyrrell, Oliver Walsdorf, and Allen Wang.

NetApp provides no representations or warranties regarding the accuracy, reliability, or serviceability of any information or recommendations provided in this publication, or with respect to any results that may be obtained by the use of the information or observance of any recommendations provided herein. The information in this document is distributed AS IS, and the use of this information or the implementation of any recommendations or techniques herein is a customer's responsibility and depends on the customer's ability to evaluate and integrate them into the customer's operational environment. This document and the information contained herein may be used solely in connection with the NetApp products discussed in this document.

© Copyright 2010 NetApp, Inc. All rights reserved. No portions of this document may be reproduced without prior written consent of NetApp, Inc. Specifications are subject to change without notice. NetApp, the NetApp logo, Go further, faster, Data ONTAP, FlexClone, FlexVol, MultiStore, RAID-DP, SnapDrive, SnapMirror, Snapshot, SyncMirror, and vFiler are trademarks or registered trademarks of NetApp, Inc. in the United States and/or other countries. Windows is a registered trademark of Microsoft Corporation. Oracle is a registered trademark of Oracle Corporation. VMware is a registered trademark and VMotion is a trademark of VMware, Inc. All other brands or products are trademarks or registered trademarks of their respective holders and should be treated as such. RA-0007-1010



SLA-BASED ASSESSMENT FOR SETTLED/NOMAD

The goals of an SLA-based assessment are to optimize SLA fulfillment and to avoid or minimize penalty costs. The accessibility of the applications, described by their individual service levels, is used to assess instances as settled or nomad.

We use the introduced SLA metric of service disruption and map it to the stickiness of the settled/nomad instances. The vFiler entities allow online migration of NFS and iSCSI-attached nomad instances without any changes at the client side. Fibre Channel-attached storage cannot be migrated online at the time of writing. Refer to TR-3881 for an understanding of DataMotion in an Oracle database and Microsoft Exchange environment.

Alignment by technical impact. For data belonging to applications with SLAs that fit perfectly into what is provided, a direct assignment can be made. For example, application instances with the lowest acceptable service disruption should be the last candidates to be migrated (settled); applications with the highest acceptable service disruptions should be considered as nomads. However, there might be data of application instances that will likely be migrated during the application lifetime. You must take into account the business impact of migrating these instances.

Figure 15) Alignment by technical impact (sorted by negative impact in descending order). [Figure: instances Inst1 through InstN sorted into settled and nomad groups; FC-attached instances (for example) remain settled; negative impact ranges from high (outside SLA) to medium and low (inside SLA).]
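The assignment by technical impact can be expressed as a simple ordering rule over the instances. The sketch below assumes a per-instance disruption tolerance in minutes and a nomad cutoff; the field names, cutoff value, and sample data are illustrative assumptions, not part of the assessment as defined in the report:

```python
def classify_instances(instances, nomad_cutoff_min=5.0):
    """Split application instances into settled and nomad sets by
    their acceptable service disruption (in minutes). Instances that
    tolerate at least the cutoff become nomad candidates; FC-attached
    instances stay settled because they cannot be migrated online.

    Illustrative sketch; field names and cutoff are assumptions.
    """
    settled, nomad = [], []
    for inst in instances:
        if inst["protocol"] == "FC" or inst["tolerance_min"] < nomad_cutoff_min:
            settled.append(inst["name"])
        else:
            nomad.append(inst["name"])
    return settled, nomad

# Hypothetical application landscape:
apps = [
    {"name": "erp-db",    "protocol": "FC",    "tolerance_min": 0.5},
    {"name": "fileshare", "protocol": "NFS",   "tolerance_min": 30.0},
    {"name": "mail",      "protocol": "iSCSI", "tolerance_min": 10.0},
]
settled, nomad = classify_instances(apps)
```

Here the FC-attached database stays settled regardless of its tolerance, while the NFS and iSCSI instances with tolerances above the cutoff become nomad candidates.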

Alignment by business impact. An assessment of penalty costs is made for the data of the remaining applications. For vFiler migration, a very short negative impact on the performance of the service level must be taken into account during the migration. Thus, application data with the highest negative impact is considered to be the stickiest.

Figure 16) Alignment by business impact (sorted by negative impact in descending order). [Figure: instances sorted by penalty cost into settled, semi-settled, and nomad groups.]

PERFORMANCE AND THROUGHPUT IMPACT OF MIGRATION

Migration of a nomad might be triggered due to heavy storage consumption in an aggregate. It might also be triggered due to performance limitations of the corresponding storage controller. Because the progress of migration consumes additional resources on the network and the participating storage controllers, this consumption must be taken into account to avoid further intensifying the situation. Refer to TR-3881 for a quantitative evaluation of DataMotion.

NetApp recommends that the use of storage controllers in a high-availability configuration be adjusted in such a way that the remaining controller can master the load in the case of a failover. Doing so should leave enough resources to perform migrations.

Migrating vFiler entities relies mainly on SnapMirror® and MultiStore® technology; thus, it requires these licenses on all participating storage controllers. TR-3814, "NetApp Data Motion," provides a thorough presentation of migration using the NetApp Data Motion solution. It focuses on implementing and triggering the migration of vFiler entities using NetApp Provisioning Manager. Furthermore, vFiler units can be managed manually to allow for handy offline migration with a very short interruption of storage accessibility.

ENABLE SETTLED/NOMAD FOR ALREADY-PROVISIONED STORAGE

While NetApp recommends that you consider the settled/nomad setting initially and take sizing and lifetime of storage into account, it is possible to implement it in a planned downtime window. If NFS-attached storage should be migrated, existing volumes can be adopted by a vFiler entity. Because the vFiler entity has its own IP address, the clients attaching the storage need to remount it.

SETTLED/NOMAD-LIKE SETTING WITH SHORT/LONG-TERM DATA PAIRING

In the previous section, the settled/nomad pattern was described to mitigate organic data growth. The same effect can be achieved when storage is identified as belonging to instances that are going to be deprovisioned due to their end of life. Taking into account the expected lifetime of provisioned storage allows you to plan deprovisioning situations in advance. This relaxes the dependence on aggregates outside planned downtime windows and without the technical requirements of a settled/nomad setting.

ONLINE MIGRATION IN VIRTUALIZED ENVIRONMENTS

Online migration features in a virtualization hypervisor provide a further alternative for implementing a responsive scheme to react to data growth scenarios. For example, VMware® Storage VMotion™ is capable of transferring a virtual machine, including its storage, when it is attached using a datastore. Storage of virtual machines served by a NetApp datastore can be migrated to another NetApp-served datastore by migrating each virtual machine. In such cases, a nomad can simply be implemented by a NAS- or SAN-attached datastore.

In contrast to a data transfer based on SnapMirror directly between NetApp storage controllers, the migration traffic flows through the hypervisor. This might have consequences for the execution of the virtual machines. Also, the NetApp storage efficiency savings cannot be exploited during the transfer. Deduplication savings are regained by executing the deduplication process on the destination storage controller.


4 OPERATION

This section focuses on the operation and management of overcommitted storage. The goal of management is to fulfill the SLAs of the stored data while achieving a high level of efficiency. It addresses questions of how to detect situations that need manual assistance, how to raise the awareness of the operational staff, and how to resolve situations that arise.

We first consider situations that put SLA fulfillment at risk. Then we focus on actions that can be taken to avoid further aggravation by making the situation evident and presenting mitigation alternatives.

Consider the operational process as a loop that monitors and evaluates the current situation and triggers the transition of a storage resource among phases:

• Provision storage.
• Leave room for organic growth. It might be desirable to still allow for extending storage of previously provisioned applications.
• Reduce storage use with mitigation alternatives such as deletion, data motion, and so on.

These transitions must occur within a specified time frame to optimize operational flexibility and to prevent endangering the SLAs. The point is to detect situations that will violate the SLAs in the future.

SITUATIONS PUTTING SLA FULFILLMENT AT RISK

Over time, more and more data is stored and processed by the provided applications. NetApp storage efficiency technologies compensate for this growth. To prevent running out of physical resources, usage must be managed within safe boundaries. This makes sure the operations team has enough time to react with the appropriate mitigation strategy.

The following list summarizes situations that are critical for service delivery:

• Running out of time. Some mitigation alternatives must be triggered in advance, and a passage of time might be needed for their effect to become evident. This time determines the number of mitigation alternatives that can be considered at a certain time.

• Running out of mitigation alternatives. Several mitigation alternatives exist to control the usage. However, some alternatives are one-time activities, and some must be performed within a certain time frame. Depending on the situation, not all alternatives might be available for use.

• Running too tight on storage. Over time, applications use more and more of the storage blocks that were committed to them. This forces Data ONTAP to allocate from a pool of free blocks. Assuming data growth, the size of the free block pool directly translates into available time to react.

• Running out of storage completely. This must be prevented because it has a high impact on the availability of the service. Furthermore, data integrity can be at risk. Consider the following scenarios:
− An application wants to write to committed storage but fails (NAS and SAN). For applications, this looks like a storage failure and implies service disruption. Data integrity can be at risk.
− An application wants to allocate new storage but fails (NAS). The application is confronted with a "No space left on device" exception. Verify the application behavior on this exception. Most applications can deal with this situation, and data integrity is not at risk.

Two cases need to be differentiated when mitigation is necessary to solve a situation of tight storage. Storage for an object such as a LUN or a share can be tight because of:
− insufficient free space within the volume in which the storage object is contained
− insufficient free space within the aggregate in which the storage object and its volume are contained
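The "No space left on device" case in the second scenario can be verified at the application side. The sketch below shows one hedged way a NAS client application might distinguish an out-of-space condition from other I/O failures; it is an illustration of the recommended behavior check, not code from the report:

```python
import errno
import os
import tempfile

def append_record(path, data):
    """Append data to a file on NAS-attached storage. Return True on
    success, False when the filer reports ENOSPC ('No space left on
    device'). Other I/O errors are re-raised. Illustrative sketch."""
    try:
        with open(path, "ab") as f:
            f.write(data)
        return True
    except OSError as e:
        if e.errno == errno.ENOSPC:
            return False   # out of space: back off, alert, retry later
        raise

# On a healthy volume, the write succeeds:
tmp = tempfile.NamedTemporaryFile(delete=False)
ok = append_record(tmp.name, b"record\n")
os.unlink(tmp.name)
```

An application that handles the `False` path gracefully (queueing writes, alerting operations) keeps data integrity intact while the storage team mitigates the tightness.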


The following sections focus on how to detect that a change is necessary and that a storage resource should be transitioned to another phase. We describe the different phases, how monitoring can support their detection, and how this information can be made known to operational groups.

4.1 PHASES AND TRANSITIONS

This section outlines the phases of a storage resource. Starting with an empty aggregate, storage is provisioned to certain thresholds. After that, storage is left for organic growth. After certain thresholds are exceeded, further inspection or activities must be performed to mitigate storage tightness.

• Provisioning storage. When certain thresholds are within a defined range, storage is provisioned to the aggregates. Monitoring should support making a decision to transition to the next phase.

• Leave storage for organic growth. When certain thresholds are exceeded, provisioned storage is left for organic growth. Depending on the environment, storage of existing applications might still be extended, and a second threshold might signal that extensions are not possible anymore. Monitoring should support making a decision to transition to the next or prior phase.

• Mitigate storage use. When certain thresholds are exceeded, this phase must make sure that committed storage can be delivered to store application data. The effect of a mitigation activity should be to put the storage resource back into the preferred operational corridor. Monitoring should support making a decision to transition back to the organic growth phase.

4.2 MONITORING

NetApp Operations Manager delivers comprehensive monitoring and management for NetApp shared storage. It provides alerts, reports, performance monitoring, and configuration tools to keep the NetApp storage infrastructure in line with business requirements for maximum availability and efficiency. NetApp Operations Manager provides a single human interface and an application programming interface (API) for integration with third-party management and orchestration software vendors.

Operations Manager monitors the NetApp shared storage infrastructure and is able to raise awareness for certain situations. Events can be set to trigger an action when operational parameters are within a certain range and indicate a relevant situation. When the event triggers, an alarm notification can be sent by e-mail, pager, Simple Network Management Protocol (SNMP), or customized scripts. To raise awareness about a certain situation, the event must be characterized using the metrics provided by Operations Manager. To communicate the event, an alarm must be set.

THRESHOLDS

Operations Manager monitors relevant parameters that indicate the presence of specified situations. Thresholds can be set to trigger actions, for example, to notify the operational team that an alarm situation exists. The thresholds can be set to notify in advance. Operations Manager also performs trending on operational parameters to express the urgency of a certain situation. This supports the decision making on how to react to a certain situation.

Within your Operations Manager instance, the thresholds can be verified and set by navigating to the Default Thresholds page (Setup→Options→Default Thresholds) or by using the link http://opsmgrserver:port/dfm/edit/options. Figure 17 shows a sample configuration page.


Figure 17) Operations Manager screen to configure thresholds on operational metrics

For aggregates, Operations Manager provides a set of thresholds, described in the following list. They represent absolute limits. Operations Manager alarms can be used to notify operational staff and raise awareness of a specific situation.

Monitoring the aggregates is very important. They are the physical containers of preallocated and growable storage objects that host application data. If an aggregate of a storage controller runs at uncontrolled usage, it can have direct consequences for the applications for which it provides data.

The concrete settings for these thresholds depend on the time necessary to relax aggregate block usage. When no mitigation alternatives should or can be taken over the lifetime of the data, mitigation actions must be performed in scheduled downtime windows. Thus, threshold settings and actions tend to be more conservative to avoid SLA-endangering situations.

• Aggregate full threshold. This threshold on the metric aggregate block use allows triggering an alarm that notifies a person in charge.

• Aggregate nearly full threshold. This threshold is the counterpart of the aggregate full threshold but provides an earlier notification.

• Aggregate overcommitted threshold. This threshold on the metric of committed storage allows triggering an alarm that notifies a person in charge. This metric refers to the amount of storage that is assigned to applications. It represents the level of consolidation, and also the width and increase of the block use corridor.

• Aggregate nearly overcommitted threshold. This threshold is the counterpart of the aggregate overcommitted threshold but provides an earlier notification.

Operations Manager provides thresholds that can be used to alert operational staff when volumes are in a certain state:

• Volume full threshold. This event notifies a person in charge that the preset threshold on the volume use metric has been reached.

• Volume almost full threshold. This event is the counterpart of the volume full threshold but provides an earlier notification.

• Volume autosized. This event notifies a person in charge when a volume was extended using the autogrow functionality.


TRENDING

Operations Manager 4.0 supports a variety of trending features for certain storage objects. This is an important feature for all storage objects with a fixed size. It allows you to estimate the time frame within which a certain situation needs to be mitigated. The trend is calculated as a linear regression over up to 90 days in the past. For aggregates, Operations Manager calculates a trend on the daily growth rate. In your Operations Manager instance, use the link http://opsmgrserver:port/dfm/report/view/aggregates-growth-rates for trending of aggregate growth rates and the estimated remaining time until the storage object is full. Each aggregate can be drilled down, and you can select trending based on an interval of one day, one week, one month, three months, or one year. To see the effect of recent data activities, set the interval of a trend calculation to enclose this activity. Investigate whether growth rates calculated over different intervals deviate significantly.

Figure 18) Trending of data growth and days-to-full prediction in Operations Manager

Note: The calculation basis of time to full is the usable aggregate capacity. This value is not calculated based on the aggregate full threshold setting.
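The idea behind the trend calculation can be approximated with a short sketch: fit a least-squares line to the daily used-capacity samples and project days to full against the usable capacity. This is an approximation of the concept, not the Operations Manager algorithm, and the function name and sample data are assumptions:

```python
def days_to_full(daily_used_gb, usable_gb):
    """Estimate days until an aggregate is full from a series of
    daily used-capacity samples, using a least-squares linear fit
    over up to 90 trailing days. Approximation of the trending idea,
    not the Operations Manager implementation."""
    samples = daily_used_gb[-90:]
    n = len(samples)
    xs = range(n)
    mean_x = sum(xs) / float(n)
    mean_y = sum(samples) / float(n)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, samples))
    var = sum((x - mean_x) ** 2 for x in xs)
    slope = cov / var            # GB of growth per day
    if slope <= 0:
        return None              # shrinking or flat: never runs full
    return (usable_gb - samples[-1]) / slope

# 100 GB/day linear growth, 10 TB usable, 5 TB used today:
remaining = days_to_full([4000 + 100 * d for d in range(11)], 10000)
```

As the note above states, the projection runs against the usable capacity, not against the aggregate full threshold; comparing trends over different intervals reveals whether recent activity skews the estimate.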

The trending on the volume level is analogous to the trending on the aggregate level. In your Operations Manager instance, access the link http://opsmgrserver:port/dfm/report/view/volumes-growth-rates for trending of volume growth rates. NetApp recommends that you order the view by growth rate descending or time to full ascending in order to focus on the relevant candidates.

On the volume level, you can set an alarm to fire when the volume growth is outside the usual boundary:

• Abnormal volume growth. This event notifies you when the growth rate of a volume exceeds a preset limit. It is helpful to signal unusual behavior concerning storage consumption and to point the operational staff to the right storage object.


INDIVIDUAL THRESHOLDS PER AGGREGATE OR VOLUME

For each aggregate or volume, the general default settings can be overwritten and made more specific. To do so, select your aggregate or volume of choice, for example, by using the links already provided in this technical report. When selecting a concrete aggregate, it can be configured using the Edit Settings link and dialog. When selecting a concrete volume, its configuration can be adapted using the Edit Quota Settings link and dialog.

MONITORING STORAGE EFFICIENCY RETURNS

NetApp Operations Manager provides a dashboard to visualize storage efficiency returns in the NetApp shared storage infrastructure. This report lists important parameters, drilled down by utilization capacity, unused reserve capacity, storage efficiency, and efficiency return breakdown. It allows you to judge the effectiveness of the NetApp storage efficiency technologies. Figure 19 provides a sample screenshot of the storage efficiency dashboard in NetApp Operations Manager. Consult the NetApp Operations Manager Efficiency Dashboard Installation and User Guide for further details on the information provided by this dashboard.

Figure 19) Storage efficiency dashboard in Operations Manager


4.3 NOTIFICATION

Operational staff must be notified when situations occur that require a transition of phases, especially situations with negative consequences for SLA fulfillment. Operations Manager provides alarms for notification. Alarms are bound to the metrics and thresholds explained in section 4.2 and notify operational staff, storage administrators, or storage capacity planners. Alarms are the instrument used to keep the management effort of the NetApp storage infrastructure low.

After being notified, the responsible person can evaluate the situation and decide which actions to take. Further, the trends on operational parameters provided by Operations Manager simplify the decision-making process.

Depending on the organizational structure, the responsibilities to operate, plan, and administer the storage infrastructure can be separated into different groups, persons, or roles. Thus, we characterize the mitigation activities by required skill set and time to act. This allows easy alignment to a given organizational structure.

Operations Manager supports different methods of sending a notification. The notification methods can be used in combination; for example, a notification can be sent by both e-mail and SNMP.

NOTIFY BY E-MAIL

An alarm can be sent to multiple destinations by e-mail. Repeated notifications can be sent when the situation is not resolved. To set an alarm, access the alarm configuration page by following Setup→Alarms from the default Operations Manager dashboard. Clicking Advanced Version accesses an advanced version of this page; the direct link is http://opsmgrserver:port/dfm/edit/alarms-advanced. Figure 20 shows how to configure an alarm. Adjust the threshold as described in section 4.2.

NetApp recommends using distribution lists or aliases with meaningful names rather than the addresses of individual persons. If you follow this recommendation, changing responsibilities and roles does not require you to make corresponding changes to Operations Manager.

NOTIFY BY SNMP

Operations Manager supports the notification of alarms using SNMP, a widely used standard that is supported by most orchestration frameworks and ticketing systems. Using SNMP, Operations Manager can be integrated into existing ticketing systems. Figure 20 shows setting up an alarm that fires based on the aggregate almost full threshold. The SNMP trap host is configured using a hostname or IP address and the port on which the SNMP agent is listening. The alarm can be saved and tested.


Figure 20) Configuring an alarm based on the threshold aggregate almost full

Note: The SNMP event must be routed to the responsible groups or persons in the ticketing system. Thus, the mapping between the detected situation and the responsible operational group must be implemented there.

NOTIFY BY SCRIPT

Operations Manager supports notifications in highly customized integration scenarios. A user-defined adapter can be executed, which delivers the information to the infrastructure or system of choice. A script can be used to implement such an adapter and act as the glue between Operations Manager and the customer infrastructure. To set an alarm on the event aggregate almost full that starts a script, instrument Operations Manager on the command line:

dfm alarm create -s script_to_execute -h aggregate-almost-full
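A minimal adapter script might look like the following. Operations Manager passes event details to the script when the alarm fires; the exact interface depends on your DFM version, so the environment variable names used here (DFM_EVENT_NAME, DFM_SOURCE_NAME, DFM_EVENT_SEVERITY) and the ticket fields are assumptions for illustration — consult your DFM documentation for the actual alarm script contract:

```python
#!/usr/bin/env python
"""Glue between Operations Manager and a ticketing system.
The environment variable names are assumptions for illustration;
consult the DFM documentation for the actual alarm script contract."""
import os

def build_ticket(env):
    """Turn alarm context into a payload for a (hypothetical)
    downstream ticketing system."""
    return {
        "summary": "%s on %s" % (env.get("DFM_EVENT_NAME", "unknown-event"),
                                 env.get("DFM_SOURCE_NAME", "unknown-object")),
        "queue": "storage-ops",   # route to the storage operations group
        "severity": env.get("DFM_EVENT_SEVERITY", "warning"),
    }

if __name__ == "__main__":
    ticket = build_ticket(os.environ)
    # Hand over to the ticketing system here (REST call, CLI, e-mail, ...).
    print(ticket["summary"])
```

Keeping the routing logic (queue, severity) in the script mirrors the note above: the mapping between detected situation and responsible group lives outside Operations Manager.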


4.4 MITIGATE STORAGE USE

Uncontrolled storage use can limit operational flexibility and might put SLA fulfillment at risk. This section focuses on mitigation activities that preserve flexibility by controlling use within its defined corridor; the effect of a mitigation activity should be to return usage to that corridor.

Storage tightness might occur in aggregates or volumes, depending on their configuration. When all volumes in an aggregate are thin provisioned with the zero fat configuration, they use the shared pool of free blocks of the aggregate to deal with data growth. To solve this situation, a mitigation activity on the aggregate level is necessary.

When storage objects in a fixed-size volume cannot grow to the committed space, a mitigation activity on the volume level is necessary to solve upcoming volume tightness.

MITIGATION ACTIVITIES FOR AGGREGATES

Aggregates are the coarsest storage objects within a NetApp storage controller. Finer-grained storage objects, such as FlexVol volumes and their content, are usually thin provisioned using the zero fat configuration. They might grow on demand; however, because they live within an aggregate of physically limited size, the growth of the storage object itself is also limited. As described in the following list, providing usable space in the aggregate automatically allows contained storage objects to grow.

1. Increase the aggregate. You can add drives to aggregates during operation, and you can repeat this mitigation activity. The maximum aggregate size depends on the Data ONTAP version, the type of aggregate, and the type of storage controller. 64-bit aggregates, supported with Data ONTAP 8, have very high limits. Additional drives can be used immediately; however, their procurement needs to be taken into account. Rebalancing data between existing and new drives results in a uniformly distributed use of the drives.

2. Decrease the aggregate Snapshot copy reserve. This reserve is needed in MetroCluster and SyncMirror® configurations. In other configurations, you can decrease this reserve or set it to zero.

3. Shrink preallocated volumes. Volumes with preallocated space reserve available aggregate free space. When possible, these volumes can be shrunk, returning the freed space to the aggregate to allow others to make use of the preallocated space.

4. Enable deduplication and shrink the volume.

5. If available, migrate a nomad online to a different storage controller. Doing this on the NetApp storage controller level requires storage provisioning based on vFiler entities and MultiStore and SnapMirror licenses. Adequate free space on the aggregates of the target storage controller is required. This mitigation activity is not limited in its applicability.

6. A volume can be migrated from one aggregate to another within the same or another storage controller. SnapMirror replicates the data while it is still being served. To switch over to the replicated data, the client needs to detach from the source and reattach to the replica. After completion, the replica is considered the new source. This operation has an impact on client downtime. Typically, inter-data center bandwidth allows you to synchronize the source and the replica within the range of a few minutes.

7. If none of the listed activities can be used, the application must be stopped to achieve a consistent state.

The mitigation activities for aggregate tightness are summarized in Table 8. Note that Provisioning Manager performs mitigation alternatives 3 to 6 for secondary storage online.


Table 8) Mitigation alternatives to control use within aggregates.

| No. | Mitigation Activity | Repeatability | SLA Impact | Preparation Time | Time to Show Effect |
|-----|---------------------|---------------|------------|------------------|---------------------|
| 1 | Increase aggregate capacity by adding disks | Data ONTAP 7.x: low limits; Data ONTAP 8: high limits | None | HW procurement | Immediate (+ rebalancing) |
| 2 | Decrease the aggregate's Snapshot copy reserve area if possible | One time | None | None | Immediate |
| 3 | Shrink other volumes in the aggregate if they have enough free space | One time | Low | None | Immediate |
| 4 | Run deduplication and shrink volumes | Repeatable | Low | Time to execute dedupe | Immediate |
| 5 | Migrate nomads (online) | Repeatable | Low | None | Minutes (vFiler migration time) |
| 6 | Migrate volumes to a different aggregate (offline) | Repeatable | Med–high | Next planned downtime window | Minutes (volume switch-over time) |
| 7 | Prevent application data loss and stop the application, then migrate (offline) | Repeatable | Low–high | Coordinate with app owner | Minutes (migration time) |
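Table 8 can be read as a selection problem: given the lead time left before the aggregate runs full, which alternatives can still take effect? A minimal sketch in Python, assuming hypothetical hour estimates for each preparation time (the data structure and numbers are illustrative, not from the report):

```python
# Sketch: shortlist mitigation alternatives from Table 8 that can still
# take effect within the available lead time. The hour estimates are
# hypothetical simplifications of the table's preparation-time column.

MITIGATIONS = [
    # (no, activity, repeatable, prep_hours)
    (1, "add disks to aggregate", True, 72),          # HW procurement
    (2, "decrease aggregate Snapshot reserve", False, 0),
    (3, "shrink preallocated volumes", False, 0),
    (4, "dedupe and shrink volumes", True, 8),        # time to run dedupe
    (5, "migrate nomads online", True, 1),
    (6, "migrate volumes offline", True, 240),        # next downtime window
]

def options(lead_time_hours, already_used=()):
    """Alternatives usable now: repeatable ones, plus one-time ones not
    yet used, whose preparation fits into the remaining lead time."""
    return [name for no, name, repeatable, prep in MITIGATIONS
            if prep <= lead_time_hours
            and (repeatable or no not in already_used)]

print(options(4, already_used={2}))
# ['shrink preallocated volumes', 'migrate nomads online']
```

With only four hours of lead time and the Snapshot reserve already reduced, only the immediate alternatives remain, which is exactly the "running out of mitigation alternatives" situation the report warns about.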

MITIGATION ACTIVITIES FOR VOLUME TIGHTNESS

Mitigation activities for volume tightness are relevant for volumes that are at risk because their storage objects cannot grow to the committed size. When it is not possible to enable growth for storage objects contained in volumes, you need to perform an aggregate mitigation activity.

Note: Some of these mitigation alternatives depend on, and affect, used capacity in the aggregate.

Table 9) Mitigation activities for resource tightness within volumes.

| No. | Mitigation Activity | Repeatability | SLA Impact | Preparation Time | Time to Show Effect |
|-----|---------------------|---------------|------------|------------------|---------------------|
| 1 | Reduce the volume's Snapshot copy reserve (if configured and not used) | One time | Low | None | Immediate |
| 2 | Increase the volume if there is free space in the aggregate (see Table 8) | One time | Low | None | Immediate |
| 3 | Delete Snapshot copies not needed or those skipped by the autodelete function | Limited | Low | None | Immediate |
| 4 | Activate FAS deduplication for the volume (requires proper space guarantees) | One time | Low (possible performance impact) | Wait for schedule | Hours |
| 5 | If the volume contains more than a single LUN, migrate those objects to another volume or aggregate | Repeatable | High | Next planned downtime window | Minutes (volume migration time) |
| 6 | Stop application and migrate data | Repeatable | High | Coordinate with app owner | Minutes (migration time) |


5 REAL-LIFE SETTINGS

This section summarizes two different operational settings. The first does not make use of online data migration or the settled/nomad provisioning pattern; the second implements a settled/nomad provisioning pattern to maintain the flexibility for online data migrations.

The concrete threshold settings and approaches might be very customer and application specific. To exploit NetApp storage efficiency features in your own data center, NetApp recommends that you start conservatively. After you are familiar with the process, work toward the customer-specific optimum.

5.1 SAMPLE SETTING 1: REAL-LIFE SETTING

This section describes a real-life setting a customer started with. It makes use of a limited set of mitigation alternatives. This is especially beneficial when the installed storage capacity should remain constant over a long time frame or when physical systems are already fully equipped. A settled/nomad setting is not considered. Thus, the thresholds that signal a transition of the phases are set lower and more conservatively for this customer. Because online data migration and aggregate extension are not available as mitigation alternatives, sufficient available space is required to safely reach the next planned downtime window, as shown in Figure 21. In practice, refer to the aggregate days-to-full trend value to get an idea of the available days to full based on past data growth.

• All storage is provisioned using the zero fat option with growable FlexVol volumes. Only aggregate monitoring is used.
• Aggregate extension is not a mitigation alternative.
• Online migration is not a mitigation alternative.

Figure 21) Storage to enable organic data growth between planned downtime windows


Note: Several months might fall between planned downtime windows in which to perform major mitigation alternatives.

The primary concern is preventing the critical situation in which aggregates reach a utilization level that is too high to enable organic growth during the period between agreed planned downtime windows. To prevent this situation, sufficient space must be reserved to enable data growth. Second, the level of data consolidation is monitored to manage accumulated growth rates safely.

Provisioning new data is stopped when one or both thresholds on the first and second metrics are reached. The operational teams are notified using an alarm on the Operations Manager event aggregate nearly full threshold (configured to fire when the metric exceeds 50%) and the event aggregate nearly overcommitted threshold (configured to fire when the metric exceeds 110%). These alarms stop the responsible entities from provisioning new storage; the aggregate is left for organic growth.

An assessment of the storage situation might then be performed. Depending on experience and knowledge of the application growth rates seen in the past, the thresholds may be adapted. After the upper threshold of the operational sweet spot corridor is crossed, an alarm based on the aggregate full threshold (set initially to 65%) is sent to the storage administrators so that they can decide about migrating data in the next planned downtime window. In the meantime, organic growth can take place in the yellow-marked area shown in Figure 22. The metrics used are:

• First metric: aggregate capacity used
• Second metric: aggregate space committed

Because all storage is provisioned using the zero fat option, no artificially limited storage container exists. Thus, there is no need to consider a volume-based metric. Figure 22 shows the behavior depending on the metrics aggregate capacity used and aggregate committed space.
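For illustration, the two metrics and the resulting phase can be derived from an aggregate's raw numbers as follows (a sketch; the function and its inputs are hypothetical, while the thresholds are those of this sample setting):

```python
# Sketch: derive the two metrics of sample setting 1 and classify the
# aggregate state. Thresholds follow the text (50%/65% capacity used,
# 110%/120% space committed); field names are illustrative only.

def classify(used_gb, capacity_gb, committed_gb):
    used_pct = 100.0 * used_gb / capacity_gb            # first metric
    committed_pct = 100.0 * committed_gb / capacity_gb  # second metric
    if used_pct > 65 or committed_pct > 120:
        return "mitigate in next planned downtime window"
    if used_pct > 50 or committed_pct > 110:
        return "stop provisioning; assess capacity, adapt thresholds"
    return "provision new storage"

print(classify(400, 1000, 900))   # provision new storage
print(classify(550, 1000, 1150))  # stop provisioning; assess capacity...
print(classify(700, 1000, 1250))  # mitigate in next planned downtime window
```

Note how the committed metric can exceed 100%: with zero fat provisioning, the logical commitment is allowed to be larger than the physical capacity.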

Figure 22) Transition of changes depending on the metrics aggregate capacity used and aggregate committed space

(Figure 22 is a decision diagram: while aggregate capacity used is within 0–50% and aggregate space committed within 0–110%, new storage is provisioned; inside the operational sweet spot corridor, capacity is assessed and thresholds are adapted; beyond 65% capacity used or 120% space committed, mitigation starts.)


5.2 SAMPLE SETTING 2: SETTLED/NOMAD

This section describes a setting that takes the settled/nomad provisioning pattern into account and allows migrating nomad data flexibly and in a timely manner thanks to vFiler technology. This setting requires storage space at alternative locations to which nomads might be migrated. It is seen more often in larger environments with an emphasis on NFS-attached storage. It allows operating the NetApp storage infrastructure at very high use and in narrower operational sweet spot corridors. Figure 23 visualizes the effect of a mitigation alternative that can be performed online.

Figure 23) Narrower corridors due to the ability to perform mitigation alternatives in hours instead of months

(Diagram: settled and nomad data on an aggregate; detecting the need to act and seeing the effect of a mitigation, e.g., a migration, lie only hours apart.)

In this sample setting, as in sample setting 1, the critical situation to prevent is aggregates becoming too full. However, with the flexibility gained through online data migration, no further metric, such as storage overcommitment, needs to be taken into account.

• All storage is provisioned using the zero fat option with growable FlexVol volumes. Only aggregate monitoring is used.
• Storage is provisioned using the settled/nomad pattern, with the ability to perform online migration.
• Days-to-full aggregate trending was more than 200 days on average. Note that this value depends on the individual situation and is calculated against 100%.

The sole metric in this setting is aggregate capacity used. Table 10 contains the thresholds describing the transition of phases.

Table 10) Phase transitions with settled/nomad provisioning pattern and online migration as mitigation alternative.

| Detection Threshold | Notify | Mitigation |
|---------------------|--------|------------|
| >70% | Storage operations | Stop provisioning of storage |
| >85% | Storage operations | Stop extending provisioned storage |
| >90% | Storage operations | Relax resource situation and migrate a nomad |
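The transitions in Table 10 amount to a single-metric classifier, sketched here with the thresholds from the table (function name and return strings are illustrative):

```python
# Sketch: phase transitions of sample setting 2, driven solely by
# aggregate capacity used, with the thresholds from Table 10.

def action(used_pct):
    if used_pct > 90:
        return "relax resource situation and migrate a nomad"
    if used_pct > 85:
        return "stop extending provisioned storage"
    if used_pct > 70:
        return "stop provisioning of storage"
    return "normal operation: provision and extend"

print(action(60))  # normal operation: provision and extend
print(action(92))  # relax resource situation and migrate a nomad
```

Compared with sample setting 1, the classifier needs no overcommitment metric, because online migration keeps the reaction time short.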


Figure 24) Visualization of phase transitions depending on metric aggregate capacity used

(Figure 24 is a decision diagram driven by aggregate capacity used: at 0–70%, new storage is provisioned and already provisioned storage is extended; at 70–85%, only extensions of already provisioned storage are allowed; above 90%, utilization is relaxed by migrating a nomad with NetApp Data Motion.)

You can achieve very high data consolidation in this setting using NetApp storage controllers. The served amount of logical data can exceed the physically usable capacity by factors.


6 STORAGE EFFICIENCY COOKBOOK

To increase consolidation, we propose the following steps to exploit the advantages of NetApp storage efficiency technologies.

Figure 25) Typical picture of aggregate capacity metrics while turning to zero fat configurations and dedupe

(Chart: committed capacity and capacity used plotted over elapsed time, with the overall trend and the last 3-month trend indicated; markers 1, 2, and 3 correspond to the steps below.)

As a general rule, we don't introduce artificially limited container types. They increase monitoring effort and might prevent pooling of unused space. For an existing landscape, proceed as follows:

1. Install and configure Operations Manager, the earlier the better. From day one, Operations Manager collects data; the more information it collects, the better the predictions and trending. The diagrams provided by Operations Manager give a good idea of growth rates and their steadiness. Make sure all NetApp storage controllers are monitored. Wait for one month. Define which mitigation alternatives your operational team is comfortable with. Check the boxes accompanying the provided list and identify the time your team needs to perform the actions. If you can perform online migrations for nomads, define the time to negotiate and approve the migration. For all other data, define the time to the next planned downtime window.

2. Change all volumes to zero fat configuration with the autogrow feature set to on. Because there is no artificial space limitation for an autogrow volume, monitoring is restricted to aggregate monitoring. When using deduplication, set the volume to autogrow. During this period, the capacity used diminishes, as shown in Figure 25. Usually, each change in the volume configuration can be detected. So far, only metadata has changed, and unused space in the volumes is now available from a common shared pool. The aggregated free space is available for the same applications storing the data. We recommend monitoring for three months to understand the growth rate of your environment.

3. Derive the growth trend of the aggregates. Note that the overall trend might still be negative. Use Operations Manager to help determine the trend. Make sure that it excludes the time frame when the volume configuration was changed to zero fat and that it includes relevant operations of your applications, such as month- and year-end closing of business applications or regular software maintenance updates (for example, in virtualized environments).


4. Work backward to determine the thresholds of the phases:
a. Define the aggregate use at a level where your operational team is comfortable. At first, do not exceed 80%. Add an attention area (yellow) depending on the mitigation alternatives and their time to show effect.
b. Determine the maximum distance between the planned downtimes or the time to perform the intended mitigation alternatives.
c. Determine the growth rate. Operations Manager provides help in determining the trend of data growth.
d. Determine the minimum space required to comfortably allow organic growth in the period between agreed planned downtimes of the services provided. Operations Manager helps you to understand the growth rate of the past.
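Steps a through d can be combined into a back-of-the-envelope calculation (a sketch with example numbers; the 80% ceiling reflects step a's recommendation):

```python
# Sketch: work backward from growth rate and downtime interval to an
# aggregate-full threshold, following steps a-d. All numbers are
# example values, not recommendations from the report.

def full_threshold_pct(capacity_gb, growth_gb_per_day,
                       days_between_downtimes, ceiling_pct=80):
    """Highest 'aggregate full' threshold that still leaves room for
    organic growth until the next planned downtime window."""
    buffer_gb = growth_gb_per_day * days_between_downtimes  # step d
    threshold = 100.0 * (capacity_gb - buffer_gb) / capacity_gb
    return min(threshold, ceiling_pct)  # step a: do not exceed 80%

# 10 TB aggregate, 20 GB/day growth, downtime windows 90 days apart:
print(full_threshold_pct(10240, 20, 90))  # -> 80 (capped by the 80% ceiling)
```

If the growth rate doubles to 40 GB/day, the required buffer grows to 3.6 TB and the threshold drops below 65%, which shows why the thresholds must be re-derived whenever the observed growth rate changes.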

To provision storage, follow these steps:

1. Create big aggregates to enable shared storage in your data center. We recommend sizing them in such a way that the aggregate can be extended once for eventual aggregate mitigation. Very few situations exist where a silo-centric approach with dedicated aggregates for applications makes sense. Free space and performance in an aggregate can be shared. A few big aggregates reduce the monitoring effort. Also, build aggregates in a limited number of standardized configurations and sizes.

2. Create volumes in zero fat configuration with the autogrow feature set to on. Because there is no artificial space limitation for an autogrow volume, monitoring is restricted to aggregate monitoring. When using deduplication, set the volume to autogrow. Whenever possible, use Provisioning Manager for convenience and for repeating configurations.
a. Classify your data and provision for flexibility. Give NFS a preference and make use of vFiler entities.
b. Turn on deduplication. Even in situations where deduplication rates are expected to be low, there is sometimes a big surprise. If you prefer to try deduplication on the storage controller, then create a clone of the intended volume and deduplicate it to estimate the effect. Use Performance Advisor to identify a repeating time frame of low activity to schedule the deduplication job. Also consider deduplication scheduled by change rate. Mind the maximum sizes, which depend on the storage controller.
c. Initially size volumes to the expected size of the data you are going to store. Thus, the aggregate overcommitment metric in Operations Manager represents the data consolidation more precisely.
d. Trim existing volumes provisioned in full or low fat to zero fat configuration. Use the following commands on the console of the storage controller to configure zero fat without Snapshot autodelete for NAS environments:

vol options <volume> guarantee none
vol options <volume> try_first volume_grow
vol autosize <volume> -m <maximum size> -i <increment size> on
snap autodelete <volume> off

Use the following command sequence to configure zero fat with Snapshot autodelete for NAS environments:

vol options <volume> guarantee none
vol options <volume> try_first volume_grow
vol autosize <volume> -m <maximum size> -i <increment size> on
snap autodelete <volume> trigger volume
snap autodelete <volume> delete_order oldest_first
snap autodelete <volume> on


Use the following commands to configure zero fat without Snapshot autodelete for SAN environments:

vol options <volume> guarantee none
vol options <volume> try_first volume_grow
vol autosize <volume> -m <maximum size> -i <increment size> on
snap reserve -V <volume> 0
snap autodelete <volume> off
lun set reservation <lun> disable

Use the following command sequence to configure zero fat for SAN environments with autodelete set to on:

vol options <volume> guarantee none
vol options <volume> try_first volume_grow
vol autosize <volume> -m <maximum size> -i <increment size> on
snap reserve -V <volume> 0
snap autodelete <volume> trigger volume
snap autodelete <volume> delete_order oldest_first
snap autodelete <volume> on
lun set reservation <lun> disable

e. Identify storage holding inactive data. Such storage is most often perfectly suited to act as a nomad candidate that can be migrated.

f. Identify storage that is close to deprovisioning. Deprovisioning of storage relaxes use and can act as a mitigation alternative.

g. Turn already provisioned volumes into zero fat configuration.
3. Let Operations Manager monitor the landscape. Use the aggregate daily growth rates and days-to-full trending reported by Operations Manager to adapt the thresholds. Remember that days-to-full trending reports against 100% capacity used of the aggregate.
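For larger landscapes, command sequences such as those in step 2d can be generated per volume with a small script (a sketch; volume names and sizes are examples, and the generated commands should be reviewed before running them on a controller):

```python
# Sketch: emit the zero fat configuration commands of step 2d for a
# volume (NAS variant with Snapshot autodelete). Volume name and sizes
# below are examples only; review the output before applying it.

def zero_fat_nas_autodelete(volume, max_size, increment):
    return [
        f"vol options {volume} guarantee none",
        f"vol options {volume} try_first volume_grow",
        f"vol autosize {volume} -m {max_size} -i {increment} on",
        f"snap autodelete {volume} trigger volume",
        f"snap autodelete {volume} delete_order oldest_first",
        f"snap autodelete {volume} on",
    ]

for cmd in zero_fat_nas_autodelete("vol_app01", "500g", "50g"):
    print(cmd)
```

Generating the sequence per volume keeps the trim of many full or low fat volumes repeatable and avoids typos in the per-volume parameters.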


7 REFERENCES

• TR-3505: "NetApp Deduplication for FAS and V-Series Deployment and Implementation Guide," www.netapp.com/us/library/technical-reports/tr-3505.html
• TR-3563: "NetApp Thin Provisioning: Improving Storage Utilization and Reducing TCO," www.netapp.com/us/library/technical-reports/tr-3563.html
• TR-3710: "Operations Manager, Provisioning Manager, and Protection Manager Best Practices Guide," www.netapp.com/us/library/technical-reports/tr-3710.html
• TR-3786: "A Thorough Introduction to 64-Bit Aggregates," www.netapp.com/us/library/technical-reports/tr-3786.html
• TR-3814: "NetApp Data Motion," www.netapp.com/us/library/technical-reports/tr-3814.html
• TR-3827: "If You're Doing This, Then Your Storage Could Be Underutilized," www.netapp.com/us/library/technical-reports/tr-3827.html
• TR-3881: "DataMotion for Volumes for Enterprise Applications," www.netapp.com/us/library/technical-reports/tr-3881.html
• NetApp Operations Manager Efficiency Dashboard Installation and User Guide, now.netapp.com/NOW/download/tools/omsed_plugin/InstallUserGuide.pdf


8 ACKNOWLEDGMENTS

This report was developed in concert with the Field Centers for Innovation and covers field best practices and product group expertise. It would not have been possible without the input of many experts. Significant contributions were made by Matthew Agoni, Carlos Alvarez, Jeff Berks, Manfred Buchmann, Hans Deuerlein, Erik Dybwad, Niels Reker, Oliver Dziuba, Larry Freeman, Gary Garcia, Pretoom Goswami, Naveen Harsani, George John, Nigel Maddock, Andreas Martinovsky, Holger Niermann, Cesar Orosco, Christian Ott, Shiva Raja, Michael Reusch, Maurice Skubski, John Tyrrell, Oliver Walsdorf, and Allen Wang.

NetApp provides no representations or warranties regarding the accuracy, reliability, or serviceability of any information or recommendations provided in this publication, or with respect to any results that may be obtained by the use of the information or observance of any recommendations provided herein. The information in this document is distributed AS IS, and the use of this information or the implementation of any recommendations or techniques herein is a customer's responsibility and depends on the customer's ability to evaluate and integrate them into the customer's operational environment. This document and the information contained herein may be used solely in connection with the NetApp products discussed in this document.

© Copyright 2010 NetApp, Inc. All rights reserved. No portions of this document may be reproduced without prior written consent of NetApp, Inc. Specifications are subject to change without notice. NetApp, the NetApp logo, Go further, faster, Data ONTAP, FlexClone, FlexVol, MultiStore, RAID-DP, SnapDrive, SnapMirror, Snapshot, SyncMirror, and vFiler are trademarks or registered trademarks of NetApp, Inc. in the United States and/or other countries. Windows is a registered trademark of Microsoft Corporation. Oracle is a registered trademark of Oracle Corporation. VMware is a registered trademark and VMotion is a trademark of VMware, Inc. All other brands or products are trademarks or registered trademarks of their respective holders and should be treated as such. RA-0007-1010

Page 29: Lun Provision


this consumption must be taken into account to avoid further intensifying the situation. Refer to TR-3881 for a quantitative evaluation of DataMotion.

NetApp recommends that the use of storage controllers in a high-availability configuration be adjusted in such a way that the remaining controller can master the load in the case of a failover. Doing so should leave enough resources to perform migrations.

Migrating vFiler entities relies mainly on SnapMirror® and MultiStore technology; thus, it requires these licenses on all participating storage controllers. TR-3814, "NetApp Data Motion," provides a thorough presentation of migration using the NetApp Data Motion solution. It focuses on implementing and triggering the migration of vFiler entities using NetApp Provisioning Manager. Furthermore, vFiler units can be managed manually to allow for handy offline migration with a very short interruption of storage accessibility.

ENABLE SETTLED/NOMAD FOR ALREADY-PROVISIONED STORAGE

While NetApp recommends that you consider the settled/nomad setting initially, taking sizing and lifetime of storage into account, it is possible to implement it in a planned downtime window. If NFS-attached storage is to be migrated, existing volumes can be adopted by a vFiler entity. Because the vFiler entity has its own IP address, the clients attaching the storage need to remount it.

SETTLED/NOMAD-LIKE SETTING WITH SHORT/LONG-TERM DATA PAIRING

In the previous section, the settled/nomad pattern was described as a way to mitigate organic data growth. The same effect can be achieved when storage is identified as belonging to instances that are going to be deprovisioned at their end of life. Taking the expected lifetime of provisioned storage into account allows you to plan deprovisioning situations in advance. This relaxes the dependence on aggregates outside planned downtime windows, without the technical requirements of a settled/nomad setting.

ONLINE MIGRATION IN VIRTUALIZED ENVIRONMENTS

Online migration features in a virtualization hypervisor provide a further alternative for implementing a responsive scheme to react to data growth scenarios. For example, VMware® Storage VMotion™ is capable of transferring a virtual machine, including its storage, when it is attached using a datastore. Storage of virtual machines served by a NetApp datastore can be migrated to another NetApp-served datastore by migrating each virtual machine. In such cases, a nomad can simply be implemented by a NAS- or SAN-attached datastore.

In contrast to a data transfer based on SnapMirror directly between NetApp storage controllers, the migration traffic flows through the hypervisor. This might have consequences for the execution of the virtual machines. Also, the NetApp storage efficiency savings cannot be exploited during the transfer. Deduplication savings are regained by executing the deduplication process on the destination storage controller.


4 OPERATION

This section focuses on the operation and management of overcommitted storage. The goal of management is to fulfill the SLAs of the stored data while achieving a high level of efficiency. It addresses questions of how to detect situations that need manual assistance, how to raise the awareness of the operational staff, and how to resolve situations that arise.

We first consider situations that put the SLA fulfillment at risk. Then we focus on actions that can be taken to avoid further aggravation by making the situation evident and presenting mitigation alternatives.

Consider the operational process as a loop that monitors and evaluates the current situation and triggers the transition of a storage resource among phases:

• Provision storage.
• Leave room for organic growth. It might be desirable to still allow for extending storage of previously provisioned applications.
• Reduce storage use with mitigation alternatives such as deletion, data motion, and so on.

These transitions must occur within a specified time frame to optimize operational flexibility and to prevent endangering the SLAs. The point is to detect situations that would violate the SLAs in the future.

SITUATIONS PUTTING SLA FULFILLMENT AT RISK

Over time, more and more data is stored and processed by the provided applications, and NetApp storage efficiency technologies compensate for this growth. To prevent running out of physical resources, usage must be managed within safe boundaries. This makes sure the operations team has enough time to react with the appropriate mitigation strategy.

The following list summarizes situations that are critical for service delivery:

• Running out of time. Some mitigation alternatives must be triggered in advance, and a passage of time might be needed for their effect to become evident. This time determines the number of mitigation alternatives that can be considered at a certain point.

• Running out of mitigation alternatives. Several mitigation alternatives exist to control the usage. However, some alternatives are one-time activities, and some must be performed within a certain time frame. Depending on the situation, not all alternatives might be available for use.

• Running too tight on storage. Over time, applications use more and more of the blocks from storage that were committed to them. This forces Data ONTAP to allocate from a pool of free blocks. Assuming data growth, the size of the free block pool directly translates into available time to react.

• Running out of storage completely. This must be prevented because it has a high impact on the availability of the service. Furthermore, data integrity can be at risk. Consider the following scenarios:
− The application wants to write to committed storage but fails (NAS/SAN). For applications, this looks like a storage failure and implies service disruption. Data integrity can be at risk.
− The application wants to allocate new storage but fails (NAS). The application is confronted with a "No space left on device" exception. Verify the application behavior on this exception; most applications can deal with this situation, and data integrity is not at risk.

Two cases need to be differentiated when mitigation is necessary to resolve a situation of tight storage. Storage for an object such as a LUN or a share can be tight because of:
− insufficient space within the volume in which the storage object is contained
− insufficient free space within the aggregate in which the storage object and its volume are contained


The following sections focus on how to detect that a change is necessary and that a storage resource should be transitioned to another phase. We describe the different phases, how monitoring can support their detection, and how this information can be made known to operational groups.

4.1 PHASES AND TRANSITIONS

This section outlines the phases of a storage resource. Starting with an empty aggregate, storage is provisioned up to certain thresholds. After that, storage is left for organic growth. After certain thresholds are exceeded, further inspection or activities must be performed to mitigate storage tightness.

• Provisioning storage. When certain thresholds are within a defined range, storage is provisioned to the aggregates. Monitoring should support making a decision to transition to the next phase.

• Leave storage for organic growth. When certain thresholds are exceeded, provisioned storage is left for organic growth. Depending on the environment, storage of existing applications might still be extended, and a second threshold might signal that extensions are no longer possible. Monitoring should support making a decision to transition to the next or prior phase.

• Mitigate storage use. When certain thresholds are exceeded, this phase must make sure that committed storage can be delivered to store application data. The effect of a mitigation activity should be to put the storage resource back into the preferred operational corridor. Monitoring should support making a decision to transition back to the organic growth phase.

4.2 MONITORING

NetApp Operations Manager delivers comprehensive monitoring and management for NetApp shared storage. It provides alerts, reports, performance monitoring, and configuration tools to keep the NetApp storage infrastructure in line with business requirements for maximum availability and efficiency. NetApp Operations Manager provides a single human interface and an application programming interface (API) for integration with third-party management and orchestration software vendors.

Operations Manager monitors the NetApp shared storage infrastructure and is able to raise awareness of certain situations. Events can be set to trigger an action when operational parameters are within a certain range and indicate a relevant situation. When the event triggers, an alarm notification can be sent by e-mail, pager, Simple Network Management Protocol (SNMP), or customized scripts. To raise awareness of a certain situation, the event must be characterized using the metrics provided by Operations Manager. To communicate the event, an alarm must be set.
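Among the notification channels above, customized scripts offer the most flexibility. A minimal notification script might look like the following sketch; note that the way Operations Manager passes event details to a script is an assumption here (command-line arguments), so adapt it to the interface documented for your version:

```python
# Sketch: a customized alarm script that forwards an event by e-mail.
# How Operations Manager hands over the event name and severity is an
# assumption (command-line arguments); the addresses and the local mail
# relay are examples. Adapt to your environment.
import smtplib
import sys
from email.message import EmailMessage

def build_message(event, severity, recipient="storage-ops@example.com"):
    msg = EmailMessage()
    msg["Subject"] = f"[{severity}] Operations Manager event: {event}"
    msg["From"] = "opsmgr@example.com"
    msg["To"] = recipient
    msg.set_content(f"Event '{event}' with severity '{severity}' fired. "
                    "Check the aggregate before provisioning new storage.")
    return msg

if __name__ == "__main__" and len(sys.argv) >= 3:
    event, severity = sys.argv[1], sys.argv[2]
    with smtplib.SMTP("localhost") as s:  # assumes a local mail relay
        s.send_message(build_message(event, severity))
```

Such a script could also open a ticket or page the on-call engineer; the point is that the alarm carries enough context (event name and severity) to route the notification.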

THRESHOLDS

Operations Manager monitors relevant parameters that indicate the presence of specified situations. Thresholds can be set to trigger actions, for example, to notify the operational team that an alarm situation exists. The thresholds can be set to notify in advance. Operations Manager also performs trending on operational parameters to express the urgency of a certain situation. This supports the decision making on how to react to it.

Within your Operations Manager instance, the thresholds can be verified and set by navigating to the Default Thresholds page, following Setup → Options → Default Thresholds, or using the link http://opsmgrserver:port/dfm/edit/options. Figure 17 shows a sample configuration page.


Figure 17) Operations Manager screen to configure thresholds on operational metrics

For aggregates, Operations Manager provides a set of thresholds, described in the following list. They represent absolute limits. Operations Manager alarms can be used to notify operational staff and raise awareness of a specific situation.

Monitoring the aggregates is very important. They are the physical containers of preallocated and growable storage objects that host application data. If an aggregate of a storage controller runs at uncontrolled usage, there could be direct consequences for the applications for which it provides data.

The concrete settings for these thresholds depend on the time necessary to relax aggregate block usage. When no mitigation alternatives should or can be taken over the lifetime of the data, mitigation actions must be performed in scheduled downtime windows. Thus, threshold settings and actions tend to be more conservative to avoid SLA-endangering situations.

• Aggregate full threshold. This threshold on the metric aggregate block use allows triggering an alarm that notifies a person in charge.

• Aggregate nearly full threshold. This threshold is the counterpart of the aggregate full threshold but provides an earlier notification.

• Aggregate overcommitted threshold. This threshold on the metric committed storage allows triggering an alarm that notifies a person in charge. This metric refers to the amount of storage that is assigned to applications. It represents the level of consolidation and also the width and increase of the block use corridor.

• Aggregate nearly overcommitted threshold. This threshold is the counterpart of the aggregate overcommitted threshold but provides an earlier notification.
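The interplay of the four aggregate thresholds can be sketched as a small classification function. This is an illustrative sketch only; the threshold defaults below are assumptions for the example, not Operations Manager defaults, and the event names are simplified labels rather than exact dfm event identifiers.

```python
def classify_aggregate(used_pct, committed_pct,
                       nearly_full=80.0, full=90.0,
                       nearly_overcommitted=95.0, overcommitted=100.0):
    """Map the two aggregate metrics (block use and committed storage,
    both in percent) onto the threshold events described above.
    Threshold defaults are illustrative assumptions."""
    events = []
    # Block-use thresholds: the "nearly" event gives earlier notice.
    if used_pct >= full:
        events.append("aggregate-full")
    elif used_pct >= nearly_full:
        events.append("aggregate-almost-full")
    # Commitment thresholds track the level of consolidation.
    if committed_pct >= overcommitted:
        events.append("aggregate-overcommitted")
    elif committed_pct >= nearly_overcommitted:
        events.append("aggregate-almost-overcommitted")
    return events
```

For example, an aggregate at 85% block use and 105% commitment would raise the two "nearly"/overcommitted events and give the operational team time to react before the absolute limits are hit.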

Operations Manager also provides thresholds that can be used to alert operational staff when volumes are in a certain state:

• Volume full threshold. This event notifies a person in charge that the preset threshold on the volume use metric has been reached.

• Volume almost full threshold. This event is the counterpart of the volume full threshold but provides an earlier notification.

• Volume autosized. This event notifies a person in charge that a volume was extended using the autogrow functionality.


TRENDING

Operations Manager 4.0 supports a variety of trending features for certain storage objects. This is an important feature for all storage objects with a fixed size. It allows you to estimate the time frame within which a certain situation needs to be mitigated. The trend is calculated as a linear regression over up to 90 days in the past. For aggregates, Operations Manager calculates a trend on the daily growth rate. In your Operations Manager instance, use the link http://opsmgrserver:port/dfm/report/view/aggregates-growth-rates for trending of aggregate growth rates and the estimated remaining time until the storage object is full. Each aggregate can be drilled down, and you can select trending based on an interval of one day, one week, one month, three months, or one year. To see the effect of recent data activities, set the interval of a trend calculation to enclose this activity. Investigate whether growth rates calculated over different intervals deviate significantly.

Figure 18) Trending of data growth and days-to-full prediction in Operations Manager

Note: The calculation basis of time to full is the usable aggregate capacity. This value is not calculated based on the aggregate full threshold setting.
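The days-to-full estimate described above can be reproduced with a few lines of code. This is a minimal sketch of the approach, assuming daily usage samples for one aggregate; it mirrors the documented behavior (linear regression of past growth, extrapolated to 100% of usable capacity, not to the aggregate full threshold), but it is not the Operations Manager implementation.

```python
def days_to_full(daily_used_gb, capacity_gb):
    """Estimate days until an aggregate is full: fit a linear trend
    (least squares) over daily usage samples and extrapolate to the
    usable capacity. Returns None when there is no growth."""
    n = len(daily_used_gb)
    xs = range(n)
    mean_x = sum(xs) / n
    mean_y = sum(daily_used_gb) / n
    # Least-squares slope = daily growth rate in GB/day.
    slope = (sum((x - mean_x) * (y - mean_y)
                 for x, y in zip(xs, daily_used_gb))
             / sum((x - mean_x) ** 2 for x in xs))
    if slope <= 0:
        return None  # flat or shrinking: never full on current trend
    return (capacity_gb - daily_used_gb[-1]) / slope
```

Calculating the estimate over different sample windows (one week, one month, three months) and comparing the results corresponds to the deviation check recommended above.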

Trending on the volume level is analogous to trending on the aggregate level. In your Operations Manager instance, access the link http://opsmgrserver:port/dfm/report/view/volumes-growth-rates for trending of volume growth rates. NetApp recommends ordering the view by growth rate (descending) or time to full (increasing) in order to focus on the relevant candidates.

On the volume level, you can set an alarm to fire when the volume growth is outside the usual boundary:

• Abnormal volume growth. This event notifies when the growth rate of a volume exceeds a preset limit. It is helpful for signaling unusual behavior concerning storage consumption and pointing the operational staff to the right storage object.


INDIVIDUAL THRESHOLDS PER AGGREGATE OR VOLUME

For each aggregate or volume, the general default settings can be overridden and made more specific. To do so, select the aggregate or volume of choice, for example, by using the links already provided in this technical report. When a concrete aggregate is selected, it can be configured using the Edit Settings link and dialog. When a concrete volume is selected, its configuration can be adapted using the Edit Quota Settings link and dialog.

MONITORING STORAGE EFFICIENCY RETURNS

NetApp Operations Manager provides a dashboard to visualize storage efficiency returns in the NetApp shared storage infrastructure. This report lists important parameters, drilled down by utilization capacity, unused reserve capacity, storage efficiency, and efficiency return breakdown. It allows you to judge the effectiveness of the NetApp storage efficiency technologies. Figure 19 provides a sample screenshot of the storage efficiency dashboard in NetApp Operations Manager. Consult the NetApp Operations Manager Efficiency Dashboard Installation and User Guide for further information on the data provided by this dashboard.

Figure 19) Storage efficiency dashboard in Operations Manager


4.3 NOTIFICATION

Operational staff must be notified when situations occur that require a transition of phases, especially situations with negative consequences for SLA fulfillment. Operations Manager provides alarms for notification. Alarms are bound to the metrics and thresholds explained in section 4.2 and notify operational staff, storage administrators, or storage capacity planners. Alarms are the instrument used to keep the management effort of the NetApp storage infrastructure low.

After being notified, the responsible person can evaluate the situation and decide which actions to take. Furthermore, the trends on operational parameters provided by Operations Manager simplify the decision-making process.

Depending on the organizational structure, the responsibilities to operate, plan, and administer the storage infrastructure can be separated into different groups, persons, or roles. Thus, we characterize the mitigation activities by required skill set and time to act. This allows easy alignment to a given organizational structure.

Operations Manager supports different methods of sending a notification. The notification methods can be used in combination; for example, a notification can be sent by both e-mail and SNMP.

NOTIFY BY E-MAIL

An alarm can be sent to multiple destinations by e-mail. Repeated notifications can be sent while the situation is not resolved. To set an alarm, access the alarm configuration page by following Setup → Alarms from the default Operations Manager dashboard. Clicking Advanced Version accesses an advanced version of this page; the direct link is http://opsmgrserver:port/dfm/edit/alarms-advanced. Figure 20 shows how to configure an alarm. Adjust the threshold as described in section 4.2.

NetApp recommends using distribution lists or aliases with meaningful names rather than the addresses of individual persons. If you follow this recommendation, changing responsibilities and roles does not require corresponding changes in Operations Manager.

NOTIFY BY SNMP

Operations Manager supports the notification of alarms using SNMP, a widely used standard that is supported by most orchestration frameworks and ticketing systems. Using SNMP, Operations Manager can be integrated into existing ticketing systems. Figure 20 shows how to set up an alarm that fires based on the aggregate almost full threshold. The SNMP trap host is configured using a hostname or IP address and the port on which the SNMP agent is listening. The alarm can be saved and tested.


Figure 20) Configuring an alarm based on the threshold aggregate almost full

Note: The SNMP event must be routed to the responsible groups or persons in the ticketing system. Thus, the mapping between the detected situation and the responsible operational group must be implemented there.

NOTIFY BY SCRIPT

Operations Manager supports notifications in highly customized integration scenarios. A user-defined adapter can be executed that delivers the information to the infrastructure or system of choice. A script can be used to implement such an adapter and act as the glue between Operations Manager and the customer infrastructure. To set an alarm on the event aggregate almost full that starts a script, instrument Operations Manager on the command line:

dfm alarm create -s script_to_execute -h aggregate-almost-full
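A minimal adapter script might look like the sketch below. How Operations Manager hands the event details to the script (command-line arguments vs. environment variables) is an assumption here, as is the spool-file path; adapt both to what your dfm version actually passes and to your environment.

```python
#!/usr/bin/env python
"""Sketch of a user-defined alarm adapter for Operations Manager."""
import sys
import time

def format_alarm(argv):
    """Prefix the raw event details with a timestamp so a downstream
    system (ticketing, chat, pager gateway) can pick them up."""
    return "%s %s" % (time.strftime("%Y-%m-%dT%H:%M:%S"), " ".join(argv))

if __name__ == "__main__" and len(sys.argv) > 1:
    # Append to a spool file watched by the ticketing integration.
    # The path is an assumption for this sketch.
    with open("/var/log/dfm-alarms.log", "a") as spool:
        spool.write(format_alarm(sys.argv[1:]) + "\n")
```

The adapter deliberately does nothing but record and forward; the mapping from event to responsible group belongs in the downstream system, as noted for SNMP above.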


4.4 MITIGATE STORAGE USE

Uncontrolled storage use can limit operational flexibility and might put SLA fulfillment at risk. This section focuses on mitigation activities that preserve flexibility by controlling use within its defined corridor. The effect of a mitigation activity should be to return usage to its defined corridor.

Storage tightness might occur in aggregates or volumes, depending on their configuration. When all volumes in an aggregate are thin provisioned with the zero fat configuration, they use the shared pool of free blocks of the aggregate to deal with data growth. To resolve tightness in this case, a mitigation activity on the aggregate level is necessary.

When storage objects in a fixed-size volume cannot grow to the committed space, a mitigation activity on the volume level is necessary to resolve upcoming volume tightness.

MITIGATION ACTIVITIES FOR AGGREGATES

Aggregates are the coarsest storage objects within a NetApp storage controller. Finer-grained storage objects such as FlexVol volumes and their content are usually thin provisioned using the zero fat configuration. They can grow on demand; however, because they live within an aggregate of physically limited size, the growth of the storage object itself is also limited. As described in the following list, providing usable space in the aggregate automatically allows contained storage objects to grow.

1. Increase the aggregate. You can add drives to aggregates during operation, and you can repeat this mitigation activity. The maximum aggregate size depends on the Data ONTAP version, the type of aggregate, and the type of storage controller. 64-bit aggregates, supported with Data ONTAP 8, have very high limits. Additional drives can be used immediately; however, their procurement needs to be taken into account. Rebalancing data between existing and new drives results in uniformly distributed use of the drives.

2. Decrease the aggregate Snapshot copy reserve. This reserve is needed in MetroCluster and SyncMirror® configurations. In other configurations, you can decrease this reserve or set it to zero.

3. Shrink preallocated volumes. Volumes with preallocated space reserve available aggregate free space. When possible, these volumes can be shrunk, returning the freed space to the aggregate to allow other storage objects to make use of it.

4. Enable deduplication and shrink the volume.

5. If available, migrate a nomad online to a different storage controller. Doing this on the NetApp storage controller level requires storage provisioning based on vFiler entities and a MultiStore and SnapMirror license. Adequate free space on the aggregates of the target storage controller is required. This mitigation activity is not limited in its applicability.

6. Migrate a volume from one aggregate to another within the same or another storage controller. SnapMirror replicates the data while it is still being served. To switch over to the replicated data, the client needs to detach from the source and reattach to the replica. After completion, the replica is considered the new source. This operation has an impact on client downtime. Typically, inter-data center bandwidth allows you to synchronize the source and the replica within a few minutes.

7. If none of the listed activities can be used, the application must be stopped to achieve a consistent state before the data is migrated.

The mitigation activities for aggregate tightness are summarized in Table 8. Note that Provisioning Manager can perform mitigation alternatives 3 to 6 online for secondary storage.


Table 8) Mitigation alternatives to control use within aggregates.

No. | Mitigation Activity | Repeatability | SLA Impact | Preparation Time | Time to Show Effect
1 | Increase aggregate capacity by adding disks | Limited (low limits with Data ONTAP 7.x; high limits with Data ONTAP 8) | None | HW procurement | Immediate (plus rebalancing)
2 | Decrease the aggregate's Snapshot copy reserve, if possible | One time | None | None | Immediate
3 | Shrink other volumes in the aggregate, if they have enough free space | One time | Low | None | Immediate
4 | Run deduplication and shrink volumes | Repeatable | Low | Time to execute dedupe | Immediate
5 | Migrate nomads (online) | Repeatable | Low | None | Minutes (vFiler migration time)
6 | Migrate volumes to a different aggregate (offline) | Repeatable | Med-high | Next planned downtime window | Minutes (volume switch-over time)
7 | Prevent application data loss and stop the application, then migrate (offline) | Repeatable | Low-high | Coordinate with app owner | Minutes (migration time)
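Table 8 lends itself to a simple, table-driven selection helper. The sketch below encodes the rows as data and filters by what is feasible right now; the "cheapest first" ordering and the selection policy are illustrative assumptions, not a NetApp-prescribed procedure.

```python
# Table 8 as data: (number, activity, usable online, SLA impact).
MITIGATIONS = [
    (2, "decrease aggregate Snapshot copy reserve", True,  "none"),
    (3, "shrink preallocated volumes",              True,  "low"),
    (4, "run deduplication and shrink volumes",     True,  "low"),
    (1, "add disks to the aggregate",               True,  "none"),
    (5, "migrate nomads online",                    True,  "low"),
    (6, "migrate volumes offline",                  False, "med-high"),
    (7, "stop application, then migrate",           False, "low-high"),
]

def pick_mitigations(online_only, max_impact=("none", "low")):
    """Return the numbers of mitigation activities that are usable:
    online-capable ones when no downtime window is near, and only
    those within the tolerated SLA impact. The ordering (cheapest
    and fastest first) is an assumption of this sketch."""
    return [n for n, _, online, impact in MITIGATIONS
            if (online or not online_only) and impact in max_impact]
```

Between downtime windows, only the online, low-impact activities remain available, which is exactly why the thresholds in such settings must be conservative.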

MITIGATION ACTIVITIES FOR VOLUME TIGHTNESS

Mitigation activities for volume tightness are relevant for volumes that are at risk because their storage objects cannot grow to the committed size. When it is not possible to enable growth for the storage objects contained in volumes, you need to perform an aggregate-level mitigation activity.

Note: Some of these mitigation alternatives depend on, and affect, used capacity in the aggregate.

Table 9) Mitigation activities for resource tightness within volumes.

No. | Mitigation Activity | Repeatability | SLA Impact | Preparation Time | Time to Show Effect
1 | Reduce the volume's Snapshot copy reserve (if configured and not used) | One time | Low | None | Immediate
2 | Increase the volume if there is free space in the aggregate (see Table 8) | One time | Low | None | Immediate
3 | Delete Snapshot copies that are not needed or were skipped by the autodelete function | Limited | Low | None | Immediate
4 | Activate FAS deduplication for the volume (requires proper space guarantees) | One time | Low; possible performance impact | Wait for schedule | Hours
5 | If the volume contains more than a single LUN, migrate those objects to another volume or aggregate | Repeatable | High | Next planned downtime window | Minutes (volume migration time)
6 | Stop the application and migrate data | Repeatable | High | Coordinate with app owner | Minutes (migration time)


5 REAL-LIFE SETTINGS

This section summarizes two different operational settings. The first does not make use of online data migration or the settled/nomad provisioning pattern; the second implements a settled/nomad provisioning pattern to maintain the flexibility for online data migrations.

The concrete threshold settings and approaches might be very customer and application specific. To exploit NetApp storage efficiency features in your own data center, NetApp recommends starting conservatively. After you are familiar with the process, work toward the customer-specific optimum.

5.1 SAMPLE SETTING 1: REAL-LIFE SETTING

This section describes a real-life setting that a customer started with. It makes use of a limited set of mitigation alternatives. This is especially beneficial when the installed storage capacity should remain constant over a long time frame or when the physical systems are already fully equipped. A settled/nomad setting is not considered. Thus, the thresholds that signal a transition of the phases are set lower and more conservatively for this customer. Because online data migration and aggregate extension are not available as mitigation alternatives, sufficient available space is required to safely reach the next planned downtime window, as shown in Figure 21. In practice, refer to the aggregate days-to-full trend value to get an idea of the available days to full based on past data growth.

• All storage is provisioned using the zero fat option with growable FlexVol volumes. Only aggregate monitoring is used.

• Aggregate extension is not a mitigation alternative.
• Online migration is not a mitigation alternative.

Figure 21) Storage to enable organic data growth between planned downtime windows.

[Figure: data growth over months between two planned downtime windows; reserved space must absorb organic growth until the next window.]

Note: Several months might fall between the planned downtime windows in which major mitigation alternatives can be performed.

The primary concern is preventing the critical situation in which aggregates reach a utilization level that is too high to enable organic growth during the period between agreed planned downtime windows. To prevent this situation, sufficient space must be reserved to enable data growth. Second, the level of data consolidation is monitored to manage accumulated growth rates safely.

Provisioning new data is stopped when one or both thresholds on the first and second metrics are reached. The operational teams are notified using an alarm on the Operations Manager event aggregate nearly full threshold (configured to fire when the metric exceeds 50%) and the event aggregate nearly overcommitted threshold (configured to fire when the metric exceeds 110%). These alarms stop the responsible entities from provisioning new storage; the aggregate is left for organic growth.

An assessment of the storage situation might then be performed. Depending on experience and knowledge of the application growth rates seen in the past, the thresholds may be adapted. After the upper threshold of the operational sweet spot corridor is crossed, an alarm based on the aggregate full threshold (set initially to 65%) is sent to the storage administrators, who decide whether to migrate data in the next planned downtime window. In the meantime, organic growth can take place in the yellow-marked area shown in Figure 22. The metrics used are:

• First metric: aggregate capacity used
• Second metric: aggregate space committed

Because all storage is provisioned using the zero fat option, no artificially limited storage container exists. Thus, there is no need to consider a volume-based metric. Figure 22 shows the behavior depending on the metrics aggregate capacity used and aggregate committed space.

Figure 22) Transition of phases depending on the metrics aggregate capacity used and aggregate committed space.

[Figure: decision matrix. While aggregate capacity used is at 0-50% and aggregate space committed is at 0-110%, new storage is provisioned. Beyond these values, capacity is assessed and thresholds are adapted. When aggregate capacity used exceeds 65% or aggregate space committed exceeds 120%, mitigation takes place.]
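The decision logic of this sample setting can be condensed into a few lines. The sketch below uses the threshold values stated in the text (50%, 110%, 65%); the exact wording of the returned actions is an illustrative assumption.

```python
def setting1_action(used_pct, committed_pct):
    """Phase decision of sample setting 1: both aggregate metrics are
    checked against the corridor boundaries given in the text."""
    if used_pct > 65:
        # Aggregate full threshold crossed: plan a migration for the
        # next planned downtime window; organic growth continues.
        return "mitigate in next planned downtime window"
    if used_pct > 50 or committed_pct > 110:
        # Sweet spot corridor left: stop new provisioning, assess.
        return "stop provisioning; assess capacity and adapt thresholds"
    return "provision new storage"
```

Note that the committed-space metric matters only for stopping new provisioning; the migration decision is driven by capacity used alone, since migration is the only remedy once the aggregate itself fills up.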


5.2 SAMPLE SETTING 2: SETTLED/NOMAD

This section describes a setting that takes the settled/nomad provisioning pattern into account and allows migrating nomad data flexibly and in a timely manner, thanks to vFiler technology. This setting requires storage space at alternative locations to which nomads might be migrated. It is seen more often in larger environments with an emphasis on NFS-attached storage. It allows operating the NetApp storage infrastructure at very high use and in narrower operational sweet spot corridors. Figure 23 visualizes the effect of a mitigation alternative that can be performed online.

Figure 23) Narrower corridors due to the ability to perform mitigation alternatives in hours instead of months.

[Figure: settled and nomad data in an aggregate over time; detecting the need to act and the effect of a mitigation (e.g., migration of a nomad) lie only hours apart.]

In this sample setting, as in sample setting 1, the critical situation to prevent is aggregates becoming too full. However, with the flexibility gained through online data migration, there is no need to take a further metric into account, such as storage overcommitment.

• All storage is provisioned using the zero fat option with growable FlexVol volumes. Only aggregate monitoring is used.

• Storage is provisioned using the settled/nomad pattern, with the ability to perform online migration.
• The days-to-full aggregate trend was more than 200 days on average. Note that this value depends on the individual situation and is calculated against 100% capacity.

The sole metric in this setting is aggregate capacity used. Table 10 contains the thresholds describing the transition of phases.

Table 10) Phase transitions with the settled/nomad provisioning pattern and the online migration mitigation alternative.

Detection Threshold | Notify | Mitigation
> 70% | Storage operations | Stop provisioning of new storage
> 85% | Storage operations | Stop extending provisioned storage
> 90% | Storage operations | Relax the resource situation and migrate a nomad
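When the 90% threshold fires, the operational team has to decide which nomads to migrate to bring the aggregate back into its corridor. The sketch below is an illustrative planning helper, assuming the nomad sizes are known; the largest-first policy and the 85% relaxation target (the "stop extending" threshold of Table 10) are assumptions of this example, not a prescribed procedure.

```python
def plan_nomad_migrations(used_gb, capacity_gb, nomads_gb, target_pct=85.0):
    """Pick nomads (largest first) to migrate away until aggregate
    use falls back below target_pct. Returns the sizes to migrate."""
    plan, used = [], used_gb
    for size in sorted(nomads_gb, reverse=True):
        if used / capacity_gb * 100 < target_pct:
            break  # aggregate is back inside the corridor
        plan.append(size)
        used -= size  # freed by migrating this nomad elsewhere
    return plan
```

A real plan would also check free space on the target controller's aggregates, as required for the online migration alternative in Table 8.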


Figure 24) Visualization of phase transitions depending on the metric aggregate capacity used.

[Figure: while aggregate capacity used is at 0-70%, new storage is provisioned; between 70% and 85%, already provisioned storage may still be extended; beyond 90%, utilization is relaxed by migrating a nomad with NetApp Data Motion.]

You can achieve very high data consolidation in this setting using NetApp storage controllers. The amount of logical data served can exceed the physically usable capacity by factors.


6 STORAGE EFFICIENCY COOKBOOK

To increase consolidation, we propose the following steps to exploit the advantages of NetApp storage efficiency technologies.

Figure 25) Typical picture of aggregate capacity metrics while turning to zero fat configurations and dedupe.

[Figure: committed capacity and capacity used over elapsed time, with the overall trend and the last 3-month trend; markers 1-3 correspond roughly to the steps below, about one and three months apart.]

As a general rule, we don't introduce artificially limited container types; they increase the monitoring effort and might prevent pooling unused space. For an existing landscape, proceed as follows:

1. Install and configure Operations Manager; the earlier, the better. From day one, Operations Manager collects data, and the more information it collects, the better the predictions and trending become. The diagrams provided by Operations Manager give a good idea of growth rates and their steadiness. Make sure all NetApp storage controllers are monitored. Wait for one month. Define which mitigation alternatives your operational team is comfortable with. Check the boxes accompanying the provided list and identify the time your team needs to perform the actions. If you can perform online migrations for nomads, define the time needed to negotiate and approve a migration. For all other data, define the time to the next planned downtime window.

2. Change all volumes to the zero fat configuration with the autogrow feature set to on. Because there is no artificial space limitation for an autogrow volume, monitoring is restricted to aggregate monitoring. When using deduplication, set the volume to autogrow. During this period, the capacity used diminishes, as shown in Figure 25. Usually each change in the volume configuration can be detected. So far only metadata has changed, and unused space in the volumes is now available in a common shared pool. The aggregated free space is available to the same applications storing the data. We recommend monitoring for three months to understand the growth rate of your environment.

3. Derive the growth trend of the aggregates. Note that the overall trend might still be negative. Use Operations Manager to help determine the trend. Make sure that it excludes the time frame of changing the volume configuration to zero fat and that it includes relevant operations of your applications, such as month-end and year-end closing of business applications or regular software maintenance updates (for example, in virtualized environments).


Work backward to determine the thresholds of the phases:
a. Define the aggregate use at a level where your operational team is comfortable. At first, do not exceed 80%. Add an attention area (yellow) depending on the mitigation alternatives and their time to show effect.
b. Determine the maximum distance between the planned downtimes or the time needed to perform the intended mitigation alternatives.
c. Determine the growth rate. Operations Manager provides help in determining the trend of data growth.
d. Determine the minimum space required to comfortably allow organic growth in the period between agreed planned downtimes of the services provided. Operations Manager helps you understand the growth rate of the past.
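The "work backward" calculation of steps a-d can be sketched as a simple formula: the yellow threshold must leave enough headroom below the comfort limit to absorb organic growth until the next downtime window. The function below is an illustrative sketch under that assumption; the 80% comfort limit comes from step a.

```python
def derive_thresholds(capacity_gb, daily_growth_gb,
                      days_between_downtimes, red_pct=80.0):
    """Work backward from the growth trend (step c) and the downtime
    interval (step b): return (yellow_pct, red_pct), where yellow
    marks the start of the attention area and red is the comfort
    limit from step a."""
    # Growth expected until the next window, as a share of capacity.
    growth_pct = daily_growth_gb * days_between_downtimes / capacity_gb * 100
    yellow_pct = red_pct - growth_pct
    return max(yellow_pct, 0.0), red_pct
```

For a 10 TB aggregate growing 10 GB/day with downtime windows 90 days apart, expected growth is 9% of capacity, so the attention area starts at 71%. A yellow threshold of 0 signals that the aggregate cannot safely bridge the interval at all.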

To provision storage, follow these steps:

1. Create big aggregates to enable shared storage in your data center. We recommend sizing them in such a way that the aggregate can be extended once for eventual aggregate mitigation. Very few situations exist in which a silo-centric approach with dedicated aggregates for applications makes sense. Free space and performance in an aggregate can be shared. A few big aggregates reduce the monitoring effort. Also, build aggregates in a limited number of standardized configurations and sizes.

2. Create volumes in the zero fat configuration with the autogrow feature set to on. Because there is no artificial space limitation for an autogrow volume, monitoring is restricted to aggregate monitoring. When using deduplication, set the volume to autogrow. Whenever possible, use Provisioning Manager for convenience and for repeating configurations.
a. Classify your data and provision for flexibility. Give NFS a preference and make use of vFiler entities.
b. Turn on deduplication. Even in situations where deduplication rates are expected to be low, there is sometimes a big surprise. If you prefer to try deduplication first, create a clone of the intended volume on the storage controller and deduplicate it to estimate the effect. Use Performance Advisor to identify a repeating time frame of low activity to schedule the deduplication job, or use deduplication scheduled by change rate. Mind the maximum volume sizes, which depend on the storage controller.

c. Initially size volumes to the expected size of the data you are going to store. Thus the aggregate overcommitment metric in Operations Manager represents the data consolidation more precisely.
d. Trim existing volumes provisioned in full or low fat to the zero fat configuration. Use the following commands on the console of the storage controller to configure zero fat without Snapshot autodelete for NAS environments:

vol options <volume> guarantee none
vol options <volume> try_first volume_grow
vol autosize <volume> -m <maximum size> -i <increment size> on
snap autodelete <volume> off

Use the following command sequence to configure zero fat with Snapshot autodelete for NAS environments:

vol options <volume> guarantee none
vol options <volume> try_first volume_grow
vol autosize <volume> -m <maximum size> -i <increment size> on
snap autodelete <volume> trigger volume
snap autodelete <volume> delete_order oldest_first
snap autodelete <volume> on


Use the following commands to configure zero fat without Snapshot autodelete for SAN environments:

vol options <volume> guarantee none
vol options <volume> try_first volume_grow
vol autosize <volume> -m <maximum size> -i <increment size> on
snap reserve -V <volume> 0
snap autodelete <volume> off
lun set reservation <lun> disable

Use the following command sequence to configure zero fat for SAN environments with autodelete set to on:

vol options <volume> guarantee none
vol options <volume> try_first volume_grow
vol autosize <volume> -m <maximum size> -i <increment size> on
snap reserve -V <volume> 0
snap autodelete <volume> trigger volume
snap autodelete <volume> delete_order oldest_first
snap autodelete <volume> on
lun set reservation <lun> disable

e. Identify storage holding inactive data. Storage keeping inactive data is most often perfectly suited to act as a nomad candidate that could be migrated.
f. Identify storage that is close to deprovisioning. Deprovisioning of storage relaxes use and can act as a mitigation alternative.
g. Turn already provisioned volumes into the zero fat configuration.
3. Let Operations Manager monitor the landscape. Use the aggregate daily growth rates and days-to-full trending reported by Operations Manager to adapt the thresholds. Remember that days-to-full trending reports against 100% capacity used of the aggregate.
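When many volumes must be trimmed, the four command sequences in step 2d can be generated programmatically and fed to the controller console or an automation tool. The generator below is a convenience sketch; it only emits the command strings shown above and does not talk to a controller.

```python
def zero_fat_commands(volume, max_size, increment, san=False,
                      lun=None, autodelete=False):
    """Emit the Data ONTAP console commands from step 2d for one
    volume: zero fat, optional SAN extras, optional autodelete."""
    cmds = [
        "vol options %s guarantee none" % volume,
        "vol options %s try_first volume_grow" % volume,
        "vol autosize %s -m %s -i %s on" % (volume, max_size, increment),
    ]
    if san:
        cmds.append("snap reserve -V %s 0" % volume)
    if autodelete:
        cmds += ["snap autodelete %s trigger volume" % volume,
                 "snap autodelete %s delete_order oldest_first" % volume,
                 "snap autodelete %s on" % volume]
    else:
        cmds.append("snap autodelete %s off" % volume)
    if san and lun:
        cmds.append("lun set reservation %s disable" % lun)
    return cmds
```

For example, `zero_fat_commands("vol1", "100g", "1g")` reproduces the NAS sequence without autodelete; passing `san=True` and a LUN path adds the Snapshot reserve and LUN reservation lines.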


7 REFERENCES

• TR-3505, "NetApp Deduplication for FAS and V-Series Deployment and Implementation Guide": www.netapp.com/us/library/technical-reports/tr-3505.html
• TR-3563, "NetApp Thin Provisioning: Improving Storage Utilization and Reducing TCO": www.netapp.com/us/library/technical-reports/tr-3563.html
• TR-3710, "Operations Manager, Provisioning Manager, and Protection Manager Best Practices Guide": www.netapp.com/us/library/technical-reports/tr-3710.html
• TR-3786, "A Thorough Introduction to 64-Bit Aggregates": www.netapp.com/us/library/technical-reports/tr-3786.html
• TR-3814, "NetApp Data Motion": www.netapp.com/us/library/technical-reports/tr-3814.html
• TR-3827, "If You're Doing This, Then Your Storage Could Be Underutilized": www.netapp.com/us/library/technical-reports/tr-3827.html
• TR-3881, "DataMotion for Volumes for Enterprise Applications": www.netapp.com/us/library/technical-reports/tr-3881.html
• NetApp Operations Manager Efficiency Dashboard Installation and User Guide: now.netapp.com/NOW/download/tools/omsed_plugin/InstallUserGuide.pdf


8 ACKNOWLEDGMENTS

This report was developed in concert with the Field Centers for Innovation and covers field best practices and product group expertise. It would not have been possible without the input of many experts. Significant contributions were made by Matthew Agoni, Carlos Alvarez, Jeff Berks, Manfred Buchmann, Hans Deuerlein, Erik Dybwad, Niels Reker, Oliver Dziuba, Larry Freeman, Gary Garcia, Pretoom Goswami, Naveen Harsani, George John, Nigel Maddock, Andreas Martinovsky, Holger Niermann, Cesar Orosco, Christian Ott, Shiva Raja, Michael Reusch, Maurice Skubski, John Tyrrell, Oliver Walsdorf, and Allen Wang.

NetApp provides no representations or warranties regarding the accuracy, reliability, or serviceability of any information or recommendations provided in this publication, or with respect to any results that may be obtained by the use of the information or observance of any recommendations provided herein. The information in this document is distributed AS IS, and the use of this information or the implementation of any recommendations or techniques herein is a customer's responsibility and depends on the customer's ability to evaluate and integrate them into the customer's operational environment. This document and the information contained herein may be used solely in connection with the NetApp products discussed in this document.

© Copyright 2010 NetApp, Inc. All rights reserved. No portions of this document may be reproduced without prior written consent of NetApp, Inc. Specifications are subject to change without notice. NetApp, the NetApp logo, Go further, faster, Data ONTAP, FlexClone, FlexVol, MultiStore, RAID-DP, SnapDrive, SnapMirror, Snapshot, SyncMirror, and vFiler are trademarks or registered trademarks of NetApp, Inc. in the United States and/or other countries. Windows is a registered trademark of Microsoft Corporation. Oracle is a registered trademark of Oracle Corporation. VMware is a registered trademark and VMotion is a trademark of VMware, Inc. All other brands or products are trademarks or registered trademarks of their respective holders and should be treated as such. RA-0007-1010

1 EXECUTIVE SUMMARY
2 INTRODUCTION
  2.1 TERMINOLOGY
  2.2 GOAL OF THIS DOCUMENT
  2.3 AUDIENCE
  2.4 SCENARIO
  2.5 NETAPP TECHNOLOGIES FOR STORAGE EFFICIENCY AND FLEXIBILITY
3 PROVISIONING
  3.1 PROVISIONING FROM SCRATCH: FULL FAT TO ZERO FAT PROVISIONING
  3.2 PROVISIONING FROM TEMPLATES: VOLUME AND DEDUPE-CENTRIC LAYOUTS
  3.3 SETTLED/NOMAD PROVISIONING FOR NETAPP DATA MOTION
4 OPERATION
  4.1 PHASES AND TRANSITIONS
  4.2 MONITORING
  4.3 NOTIFICATION
  4.4 MITIGATE STORAGE USE
5 REAL-LIFE SETTINGS
  5.1 SAMPLE SETTING 1: REAL-LIFE SETTING
  5.2 SAMPLE SETTING 2: SETTLED/NOMAD
6 STORAGE EFFICIENCY COOKBOOK
7 REFERENCES
8 ACKNOWLEDGMENTS

4 OPERATION

This section focuses on the operation and management of overcommitted storage. The goal of management is to fulfill the SLAs of the stored data while achieving a high level of efficiency. It addresses questions of how to detect situations that need manual assistance, how to raise the awareness of the operational staff, and how to resolve situations that arise.

We first consider situations that put SLA fulfillment at risk. Then we focus on actions that can be taken to avoid further aggravation by making the situation evident and presenting mitigation alternatives.

Consider the operational process as a loop that monitors and evaluates the current situation and triggers the transition of a storage resource among phases:

• Provision storage.
• Leave room for organic growth. It might be desirable to still allow for extending storage of previously provisioned applications.
• Reduce storage use with mitigation alternatives such as deletion, data motion, and so on.

These transitions must occur within a specified time frame to optimize operational flexibility and to prevent endangering the SLAs. The point is to detect situations that will violate the SLAs in the future.
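The operational loop described above can be sketched in a few lines of Python. The threshold values, the metric callback, and the phase names are illustrative assumptions, not an Operations Manager API; the real metrics and events come from monitoring, as described in section 4.2.

```python
import time

# Illustrative thresholds (percent of aggregate capacity used); the real
# values are customer specific, as the sample settings in section 5 show.
PROVISION_STOP = 50.0
MITIGATE = 65.0

def evaluate(used_pct: float) -> str:
    """Map the current metric to the phase the aggregate should be in."""
    if used_pct > MITIGATE:
        return "mitigate"          # reduce storage use (deletion, data motion, ...)
    if used_pct > PROVISION_STOP:
        return "organic-growth"    # stop provisioning; existing apps may still grow
    return "provision"             # safe to provision new storage

def operations_loop(read_used_pct, act, interval_s=3600):
    """Monitor, evaluate, and trigger phase transitions.

    read_used_pct: callable returning the current metric (e.g. from monitoring).
    act: callback that performs or schedules the transition.
    """
    while True:
        act(evaluate(read_used_pct()))
        time.sleep(interval_s)

print(evaluate(40.0))   # prints: provision
```

In practice the loop body is driven by monitoring events and alarms rather than polling, but the decision structure is the same.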

SITUATIONS PUTTING SLA FULFILLMENT AT RISK

Over time, more and more data is stored and processed by the provided applications. NetApp storage efficiency technologies compensate for this growth. To prevent running out of physical resources, usage must be managed within safe boundaries. This makes sure the operations team has enough time to react with the appropriate mitigation strategy.

The following list summarizes situations that are critical for service delivery:

• Running out of time. Some mitigation alternatives must be triggered in advance, and a passage of time might be needed for their effect to become evident. This time determines the number of mitigation alternatives that can be considered at a certain point.

• Running out of mitigation alternatives. Several mitigation alternatives exist to control the usage. However, some alternatives are one-time activities, and some must be performed within a certain time frame. Depending on the situation, not all alternatives might be available for use.

• Running too tight on storage. Over time, applications use more and more of the blocks from storage that were committed to them. This forces Data ONTAP to allocate from a pool of free blocks. Assuming data growth, the size of the free block pool directly translates into available time to react.

• Running out of storage completely. This must be prevented because it has a high impact on the availability of the service. Furthermore, data integrity can be at risk. Consider the following scenarios:
  - An application wants to write to committed storage but fails (NAS/SAN). For applications, this looks like a storage failure and implies service disruption. Data integrity can be at risk.
  - An application wants to allocate new storage but fails (NAS). The application is confronted with a "No space left on device" exception. Verify the application behavior on this exception. Most applications can deal with this situation, and data integrity is not at risk.

Two cases need to be differentiated when mitigation is necessary to solve a situation of tight storage. Storage for an object such as a LUN or a share can be tight because of:
  - Insufficient space within the volume in which the storage object is contained
  - Insufficient free space within the aggregate in which the storage object and its volume are contained


The following sections focus on how to detect that a change is necessary and that a storage resource should be transitioned to another phase. We describe the different phases, how monitoring can support their detection, and how this information can be made known to operational groups.

4.1 PHASES AND TRANSITIONS

This section outlines the phases of a storage resource. Starting with an empty aggregate, storage is provisioned to certain thresholds. After that, storage is left for organic growth. After certain thresholds are exceeded, further inspection or activities must be performed to mitigate storage tightness.

• Provisioning storage. When certain thresholds are within a defined range, storage is provisioned to the aggregates. Monitoring should support making a decision to transition to the next phase.

• Leave storage for organic growth. When certain thresholds are exceeded, provisioned storage is left for organic growth. Depending on the environment, storage of existing applications might still be extended, and a second threshold might signal that extensions are not possible anymore. Monitoring should support making a decision to transition to the next or prior phase.

• Mitigate storage use. When certain thresholds are exceeded, this phase must make sure that committed storage can be delivered to store the applications' data. The effect of a mitigation activity should be to put the storage resource back into the preferred operational corridor. Monitoring should support making a decision to transition back to the organic growth phase.

4.2 MONITORING

NetApp Operations Manager delivers comprehensive monitoring and management for NetApp shared storage. It provides alerts, reports, performance monitoring, and configuration tools to keep the NetApp storage infrastructure in line with business requirements for maximum availability and efficiency. NetApp Operations Manager provides a single human interface and an application programming interface (API) for integration with third-party management and orchestration software.

Operations Manager monitors the NetApp shared storage infrastructure and is able to raise awareness of certain situations. Events can be set to trigger an action when operational parameters are within a certain range and indicate a relevant situation. When the event triggers, an alarm notification can be sent by e-mail, pager, Simple Network Management Protocol (SNMP), or customized scripts. To raise awareness about a certain situation, the event must be characterized using the metrics provided by Operations Manager. To communicate the event, an alarm must be set.

THRESHOLDS

Operations Manager monitors relevant parameters that indicate the presence of specified situations. Thresholds can be set to trigger actions, for example, to notify the operational team that an alarm situation exists. The thresholds can be set to notify in advance. Operations Manager also performs trending on operational parameters to express the urgency of a certain situation. This supports the decision making on how to react to a certain situation.

Within your Operations Manager instance, the thresholds can be verified and set by navigating to the Default Thresholds page, following Setup → Options → Default Thresholds, or via the link http://opsmgrserver:port/dfm/edit/options. Figure 17 shows a sample configuration page.


Figure 17) Operations Manager screen to configure thresholds on operational metrics

For aggregates, Operations Manager provides a set of thresholds, described in the following list. They represent absolute limits. Operations Manager alarms can be used to notify operational staff and raise awareness of a specific situation.

Monitoring the aggregates is very important. They are the physical containers of preallocated and growable storage objects that host application data. If an aggregate of a storage controller runs at uncontrolled usage, it could have direct consequences for the applications for which it is providing data.

The concrete settings for these thresholds depend on the time necessary to relax aggregate block usage. When no mitigation alternatives should or can be taken over the lifetime of the data, mitigation actions must be performed in scheduled downtime windows. Thus, threshold settings and actions tend to be more conservative to avoid SLA-endangering situations.

• Aggregate full threshold. This threshold on the metric aggregate block use allows triggering an alarm that notifies a person in charge.

• Aggregate nearly full threshold. This threshold is the counterpart of the aggregate full threshold but provides an earlier notification.

• Aggregate overcommitted threshold. This threshold on the metric of committed storage allows triggering an alarm that notifies a person in charge. This metric refers to the amount of storage that is assigned to applications. It represents the level of consolidation and also the width and increase of the block use corridor.

• Aggregate nearly overcommitted threshold. This threshold is the counterpart of the aggregate overcommitted threshold but provides an earlier notification.

Operations Manager provides thresholds that can be used to alert operational staff when volumes are in a certain state:

• Volume full threshold. This event notifies a person in charge that the preset threshold on the volume use metric has been reached.

• Volume almost full threshold. This event is the counterpart of the volume full threshold but provides an earlier notification.

• Volume autosized. This event notifies a person in charge that a volume was extended using the autogrow functionality.


TRENDING

Operations Manager 4.0 supports a variety of trending features for certain storage objects. This is an important feature for all storage objects with a fixed size. It allows you to estimate the time frame within which a certain situation needs to be mitigated. The trend is calculated as a linear regression over up to 90 days in the past. For aggregates, Operations Manager calculates a trend on the daily growth rate. In your Operations Manager instance, use the link http://opsmgrserver:port/dfm/report/view/aggregates-growth-rates for trending of aggregate growth rates and the estimated remaining time until the storage object is full. Each aggregate can be drilled down, and you can select trending based on an interval of one day, one week, one month, three months, or one year. To see the effect of recent data activities, set the interval of a trend calculation to enclose this activity. Investigate whether growth rates calculated over different intervals deviate significantly.

Figure 18) Trending of data growth and days-to-full prediction in Operations Manager

Note: The calculation basis of time to full is the usable aggregate capacity. This value is not calculated based on the aggregate full threshold setting.
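The trending idea can be illustrated with a small calculation, assuming one capacity sample per day: fit a linear regression to the samples and divide the remaining usable capacity by the fitted daily growth rate. This is only a sketch of the concept; Operations Manager performs the actual trending internally.

```python
import numpy as np

def days_to_full(used_gb, usable_gb):
    """Estimate days until full from daily samples (oldest first, up to 90 days).

    Consistent with the note above, the estimate is based on usable capacity,
    not on the aggregate full threshold.
    """
    days = np.arange(len(used_gb))
    growth_per_day, intercept = np.polyfit(days, used_gb, 1)  # linear regression
    if growth_per_day <= 0:
        return None  # flat or shrinking: no meaningful prediction
    current = intercept + growth_per_day * days[-1]
    return (usable_gb - current) / growth_per_day

# A 1,000 GB aggregate growing by about 10 GB/day from 500 GB used:
samples = [500 + 10 * d for d in range(30)]
print(round(days_to_full(samples, 1000)))   # prints: 21
```

Comparing the result of such a fit over different sample windows is one way to see whether recent growth deviates from the long-term trend, as the text above recommends.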

The trending on the volume level is analogous to the trending on the aggregate level. In your Operations Manager instance, access the link http://opsmgrserver:port/dfm/report/view/volumes-growth-rates for trending of volume growth rates. NetApp recommends that you order the view by growth rate (descending) or time to full (ascending) in order to focus on the relevant candidates.

On the volume level, you can set an alarm to fire when the volume growth is outside the usual boundary:

• Abnormal volume growth. This event notifies when the growth rate of a volume exceeds a preset limit. It is helpful for signaling unusual behavior concerning storage consumption and pointing the operational staff to the right storage object.


INDIVIDUAL THRESHOLDS PER AGGREGATE OR VOLUME

For each aggregate or volume, the general default settings can be overridden and made more specific. To do so, select the aggregate or volume of choice, for example by using the links already provided in this technical report. When selecting a concrete aggregate, it can be configured using the Edit Settings link and dialog. When selecting a concrete volume, its configuration can be adapted using the Edit Quota Settings link and dialog.

MONITORING STORAGE EFFICIENCY RETURNS

NetApp Operations Manager provides a dashboard to visualize storage efficiency returns in the NetApp shared storage infrastructure. This report lists important parameters, drilled down by utilization, capacity, unused reserve capacity, storage efficiency, and efficiency return breakdown. It allows you to judge the effectiveness of the NetApp storage efficiency technologies. Figure 19 provides a sample screenshot of the storage efficiency dashboard in NetApp Operations Manager. Consult the NetApp Operations Manager Efficiency Dashboard Installation and User Guide for further information on the data provided by this dashboard.

Figure 19) Storage efficiency dashboard in Operations Manager


4.3 NOTIFICATION

Operational staff must be notified when situations occur that require a transition of phases, especially situations with negative consequences for SLA fulfillment. Operations Manager provides alarms for notification. Alarms are bound to the metrics and thresholds explained in section 4.2 and notify operational staff, storage administrators, or storage capacity planners. Alarms are the instrument used to keep the management effort of the NetApp storage infrastructure low.

After being notified, the responsible person can evaluate the situation and decide which actions to take. Furthermore, the trends on operational parameters provided by Operations Manager simplify the decision-making process.

Depending on the organizational structure, the responsibilities to operate, plan, and administer the storage infrastructure can be separated into different groups, persons, or roles. Thus, we characterize the mitigation activities by required skill set and time to act. This allows an easy alignment to a given organizational structure.

Operations Manager supports different methods to send a notification. The notification methods can be used in combination; for example, a notification can be sent by both e-mail and SNMP.

NOTIFY BY E-MAIL

An alarm can be sent to multiple destinations by e-mail. Repeated notifications can be sent while the situation is not resolved. To set an alarm, access the alarm configuration page by following Setup → Alarms from the default Operations Manager dashboard. Clicking Advanced Version accesses an advanced version of this page. The direct link for the advanced version is http://opsmgrserver:port/dfm/edit/alarms-advanced. Figure 20 shows how to configure an alarm. Adjust the threshold as described in section 4.2.

NetApp recommends using distribution lists or aliases with meaningful names rather than addresses of individual persons. If you follow this recommendation, changing responsibilities and roles does not require you to make corresponding changes to Operations Manager.

NOTIFY BY SNMP

Operations Manager supports the notification of alarms using SNMP, a widely used standard that is supported by most orchestration frameworks and ticketing systems. Using SNMP, Operations Manager can be integrated into existing ticketing systems. Figure 20 shows how to set up an alarm firing based on the aggregate almost full threshold. The SNMP trap host is configured using a hostname or IP address and the port on which the SNMP agent is listening. The alarm can be saved and tested.


Figure 20) Configuring an alarm based on the threshold aggregate almost full

Note: The SNMP event must be routed to the responsible groups or persons in the ticketing system. Thus, the mapping between the detected situation and the responsible operational group must be implemented there.

NOTIFY BY SCRIPT

Operations Manager supports notifications in highly customized integration scenarios. A user-defined adapter can be executed that delivers the information to the infrastructure or system of choice. A script can be used to implement such an adapter and act as the glue between Operations Manager and the customer infrastructure. To set an alarm on the event aggregate almost full that starts a script, instrument Operations Manager on the command line:

dfm alarm create -s script_to_execute -h aggregate-almost-full
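As a sketch, such an adapter script might translate the event into a payload for a ticketing system. How dfm invokes the script and passes event details varies by installation, so the function arguments and the ticket fields below are illustrative assumptions, not a documented interface.

```python
import json

def build_ticket(event_name: str, source: str) -> str:
    """Translate an Operations Manager event into a hypothetical ticket payload."""
    # Assumption for illustration: events containing "full" are more urgent.
    severity = "high" if "full" in event_name else "info"
    return json.dumps({
        "summary": f"{event_name} on {source}",
        "severity": severity,
        "queue": "storage-operations",   # route to the responsible group
    })

# A real adapter would read the event details passed by dfm and deliver the
# payload to the target system (for example, via an HTTP POST) instead of
# printing it.
print(build_ticket("aggregate-almost-full", "aggr01"))
```

The routing logic (which queue, which severity) is exactly the mapping that, per the SNMP note above, must live on the receiving side when traps are used instead of scripts.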


4.4 MITIGATE STORAGE USE

Uncontrolled storage use can limit operational flexibility and might put SLA fulfillment at risk. This section focuses on mitigation activities that preserve flexibility by controlling use within its defined corridor. The effect of a mitigation activity should be to return usage to that corridor.

Storage tightness might occur in aggregates or volumes, depending on their configuration. When all volumes in an aggregate are thin provisioned with the zero fat configuration, they use the shared pool of free blocks of the aggregate to deal with data growth. To solve this situation, a mitigation activity on the aggregate level is necessary.

When storage objects in a fixed-size volume cannot grow to the committed space, a mitigation activity on the volume level is necessary to solve upcoming volume tightness.

MITIGATION ACTIVITIES FOR AGGREGATES

Aggregates are the coarsest storage objects within a NetApp storage controller. Finer-grained storage objects such as FlexVol volumes and their content are usually thin provisioned using the zero fat configuration. They might grow on demand; however, because they live within an aggregate of physically limited size, the growth of the storage object itself is also limited. As described in the following list, providing usable space in the aggregate automatically allows contained storage objects to grow.

1. Increase the aggregate. You can add drives to aggregates during operation, and you can repeat this mitigation activity. The maximum aggregate size depends on the Data ONTAP version, the type of aggregate, and the type of storage controller. 64-bit aggregates, supported with Data ONTAP 8, have very high limits. Additional drives can be used immediately; however, their procurement needs to be taken into account. Rebalancing data between existing and new drives results in uniformly distributed use of the drives.

2. Decrease the aggregate Snapshot copy reserve. This reserve is needed in MetroCluster and SyncMirror® configurations. In other configurations, you can decrease this reserve or set it to zero.

3. Shrink preallocated volumes. Volumes with preallocated space reserve available aggregate free space. When possible, these volumes can be shrunk, returning the freed space to the aggregate so that others can make use of the previously preallocated space.

4. Enable deduplication and shrink the volume.

5. If available, migrate a nomad online to a different storage controller. Doing this on the NetApp storage controller level requires storage provisioning based on vFiler units and a MultiStore and SnapMirror license. Adequate free space on the aggregates of the target storage controller is required. This mitigation activity is not limited in its applicability.

6. Migrate a volume from one aggregate to another aggregate within the same or another storage controller. SnapMirror replicates the data while it is still served. To switch over to the replicated data, the client needs to detach from the source and reattach to the replica. After completion, the replica is considered the new source. This operation has an impact on client downtime. Typically, inter-data center bandwidth allows you to synchronize the source and the replica within the range of a few minutes.

7. If none of the listed activities can be used, the application must be stopped to achieve a consistent state.

The mitigation activities for aggregate tightness are summarized in Table 8. Note that Provisioning Manager performs mitigation alternatives 3 to 6 for secondary storage online.


Table 8) Mitigation alternatives to control use within aggregates.

No. | Mitigation Activity | Repeatability | SLA Impact | Preparation Time | Time to Show Effect
1 | Increase aggregate capacity by adding disks | Data ONTAP 7.x: low limits; Data ONTAP 8: high limits | None | HW procurement | Immediate (plus rebalancing)
2 | Decrease the aggregate's Snapshot copy reserve area, if possible | One time | None | None | Immediate
3 | Shrink other volumes in the aggregate, if they have enough free space | One time | Low | None | Immediate
4 | Run deduplication and shrink volumes | Repeatable | Low | Time to execute dedupe | Immediate
5 | Migrate nomads (online) | Repeatable | Low | None | Minutes (vFiler migration time)
6 | Migrate volumes to a different aggregate (offline) | Repeatable | Med-high | Next planned downtime window | Minutes (volume switch-over time)
7 | Prevent application data loss and stop the application, then migrate (offline) | Repeatable | Low-high | Coordinate with app owner | Minutes (migration time)

MITIGATION ACTIVITIES FOR VOLUME TIGHTNESS

Mitigation activities for volume tightness are relevant for volumes that are at risk because their storage objects cannot grow to the committed size. When it is not possible to enable growth for storage objects contained in volumes, you need to perform an aggregate-level mitigation activity.

Note: Some of these mitigation alternatives depend on, and affect, used capacity in the aggregate.

Table 9) Mitigation activities for resource tightness within volumes.

No. | Mitigation Activity | Repeatability | SLA Impact | Preparation Time | Time to Show Effect
1 | Reduce the volume's Snapshot copy reserve (if configured and not used) | One time | Low | None | Immediate
2 | Increase the volume if there is free space in the aggregate (see Table 8) | One time | Low | None | Immediate
3 | Delete Snapshot copies that are not needed or were skipped by the autodelete function | Limited | Low | None | Immediate
4 | Activate FAS deduplication for the volume (requires proper space guarantees) | One time | Low; possible performance impact | Wait for schedule | Hours
5 | If the volume contains more than a single LUN, migrate those objects to another volume or aggregate | Repeatable | High | Next planned downtime window | Minutes (volume migration time)
6 | Stop the application and migrate data | Repeatable | High | Coordinate with app owner | Minutes (migration time)


5 REAL-LIFE SETTINGS

This section summarizes two different operational settings. The first does not make use of online data migration and the settled/nomad provisioning pattern; the second implements the settled/nomad provisioning pattern to maintain the flexibility for online data migrations.

The concrete threshold settings and approaches might be very customer and application specific. To exploit NetApp storage efficiency features in your own data center, NetApp recommends that you start conservatively. After you are familiar with the process, work toward the customer-specific optimum.

5.1 SAMPLE SETTING 1: REAL-LIFE SETTING

This section describes a real-life setting that a customer started with. It makes use of a limited set of mitigation alternatives. This is especially beneficial when the installed storage capacity should remain constant over a long time frame or physical systems are already fully equipped. A settled/nomad setting is not considered. Thus, the thresholds that signal a transition of the phases are set lower and more conservatively for this customer. Because online data migration and aggregate extension are not available as mitigation alternatives, sufficient available space is required to safely reach the next planned downtime window, as shown in Figure 21. In practice, refer to the aggregate days-to-full trend value to get an idea of the available days to full based on past data growth.

• All storage is provisioned using the zero fat option with growable FlexVol volumes. Only aggregate monitoring is used.
• Aggregate extension is not a mitigation alternative.
• Online migration is not a mitigation alternative.

Figure 21) Storage to enable organic data growth between planned downtime windows


Note: Several months might pass between the planned downtime windows in which major mitigation alternatives can be performed.

The primary concern is preventing the critical situation in which aggregates reach a utilization level that is too high to enable organic growth during the period between agreed planned downtime windows. To prevent this situation, sufficient space must be reserved to enable data growth. Second, the level of data consolidation is monitored to manage accumulated growth rates safely.

Provisioning new data is stopped when one or both thresholds on the first and second metrics are reached. The operational teams are notified using alarms on the Operations Manager events aggregate nearly full threshold (configured to trigger when the metric exceeds 50%) and aggregate nearly overcommitted threshold (configured to trigger when the metric exceeds 110%). These alarms stop the responsible entities from provisioning new storage; the aggregate is left for organic growth.

An assessment of the storage situation might then be performed. Depending on experience and knowledge of the application growth rates seen in the past, the thresholds may be adapted. After the upper boundary of the operational sweet spot corridor is exceeded, an alarm based on the aggregate full threshold (set initially to 65%) is sent to the storage administrators so they can decide about migrating data in the next planned downtime window. In the meantime, organic growth can take place in the yellow-marked area shown in Figure 22. The metrics used are:

• First metric: aggregate capacity used
• Second metric: aggregate space committed

Because all storage is provisioned using the zero fat option, no artificially limited storage container exists. Thus, there is no need to consider a volume-based metric. Figure 22 shows the behavior depending on the metrics aggregate capacity used and aggregate committed space.
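The decision rules of this setting can be written down compactly. The band boundaries are the ones quoted above (50% and 65% capacity used, 110% and 120% space committed); the function itself is an illustration of the logic, not actual customer tooling.

```python
def decide(used_pct: float, committed_pct: float) -> str:
    """Phase decision for sample setting 1, based on the two aggregate metrics."""
    if used_pct > 65 or committed_pct > 120:
        return "mitigate"    # migrate data in the next planned downtime window
    if used_pct > 50 or committed_pct > 110:
        return "assess"      # stop provisioning; assess capacity, adapt thresholds
    return "provision"       # provisioning new storage is allowed

print(decide(40, 90))    # prints: provision
```

Either metric alone can force the transition: a lightly used but heavily overcommitted aggregate stops provisioning just as a heavily used one does.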

Figure 22) Transition of changes depending on the metrics aggregate capacity used and aggregate committed space

(Figure 22 depicts three ranges for the two metrics: up to 50% capacity used and up to 110% space committed, new storage is provisioned; between 50% and 65% used or 110% and 120% committed, capacity is assessed and thresholds are adapted; above 65% used or 120% committed, mitigation is triggered.)


5.2 SAMPLE SETTING 2: SETTLED/NOMAD

This section describes a setting that takes the settled/nomad provisioning pattern into account and allows migrating nomad data flexibly and in a timely manner thanks to vFiler technology. This setting requires storage space at alternative locations to which nomads might be migrated. It is seen more often in larger environments with an emphasis on NFS-attached storage. It allows operating the NetApp storage infrastructure at very high use and in narrower operational sweet spot corridors. Figure 23 visualizes the effect of a mitigation alternative that can be performed online.

Figure 23) Narrower corridors due to the ability to perform mitigation alternatives in hours instead of months


In this sample setting, as in sample setting 1, the critical situation to prevent is aggregates becoming too full. However, the flexibility gained with online data migration makes it unnecessary to take a further metric into account, such as storage overcommitment.

• All storage is provisioned using the zero fat option with growable FlexVol volumes. Only aggregate monitoring is used.
• Storage is provisioned using the settled/nomad pattern, with the ability to perform online migrations.
• The days-to-full aggregate trend was more than 200 days on average. Note that this value depends on the individual situation and is calculated against 100% of usable capacity.

The sole metric in this setting is aggregate capacity used. Table 10 contains the thresholds describing the transition of phases.

Table 10) Phase transitions with the settled/nomad provisioning pattern and the online migration mitigation alternative.

Detection Threshold | Notify | Mitigation
> 70% | Storage operations | Stop provisioning of storage
> 85% | Storage operations | Stop extending provisioned storage
> 90% | Storage operations | Relax the resource situation and migrate a nomad


Figure 24) Visualization of phase transitions depending on metric aggregate capacity used

(Figure 24 depicts the phases over the metric aggregate capacity used: up to 70%, new storage is provisioned; between 70% and 85%, only already provisioned storage is extended; above 90%, utilization is relaxed by migrating a nomad with NetApp Data Motion.)

You can achieve very high data consolidation in this setting by using NetApp storage controllers. The amount of logical data served exceeds the physically usable capacity by a multiple.


6 STORAGE EFFICIENCY COOKBOOK

To increase consolidation, we propose the following steps to exploit the advantages of NetApp storage efficiency technologies.

Figure 25) Typical picture of aggregate capacity metrics while turning to zero fat configurations and dedupe

(Figure 25 plots committed capacity and capacity used over elapsed time, together with the overall trend and the last-3-month trend; the numbered marks correspond to the steps below.)

As a general rule, we don't introduce artificially limited container types. They increase monitoring effort and might prevent pooling unused space. For an existing landscape, proceed as follows:

1. Install and configure Operations Manager; the earlier, the better. From day one, Operations Manager collects data. The more information it collects, the better the predictions and trending. The diagrams provided by Operations Manager give a good idea of growth rates and their steadiness. Make sure all NetApp storage controllers are monitored. Wait for one month. Define which mitigation alternatives your operational team is comfortable with. Check the boxes accompanying the provided list and identify the time your team needs to perform the actions. If you can perform online migrations for nomads, define the time to negotiate and approve a migration. For all other data, define the time to the next planned downtime window.

2. Change all volumes to the zero fat configuration with the autogrow feature set to on. Because there is no artificial space limitation for an autogrow volume, monitoring is restricted to aggregate monitoring. When using deduplication, set the volume to autogrow. During this period, the capacity used diminishes, as shown in Figure 25. Usually, each change in the volume configuration can be detected. So far, only metadata has changed, and unused space in the volumes is now available in a common shared pool. The aggregated free space is available to the same applications storing the data. We recommend monitoring for three months to understand the growth rate of your environment.

3. Derive the growth trend of the aggregates. Note that the overall trend might still be negative. Use Operations Manager to help determine the trend. Make sure that it excludes the time frame when the volume configuration was changed to zero fat and that it includes relevant operations of your applications, such as month-end and year-end closing of business applications or regular software maintenance updates (for example, in virtualized environments).


Work backward to determine the thresholds of the phases:

a. Define the aggregate use at a level where your operational team is comfortable. At first, do not exceed 80%. Add an attention area (yellow) depending on the mitigation alternatives and their time to show effect.

b. Determine the maximum distance between the planned downtimes, or the time to perform the intended mitigation alternatives.

c. Determine the growth rate. Operations Manager provides help in determining the trend of data growth.

d. Determine the minimum space required to comfortably allow organic growth in the period between agreed planned downtimes of the services provided. Operations Manager helps you to understand the growth rate of the past.
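Steps a through d amount to a back-of-the-envelope calculation. The following sketch (illustrative Python, not a NetApp tool; the function name, parameters, and sample numbers are assumptions for the example) works backward from the observed growth rate and the downtime interval to a provisioning-stop threshold:

```python
def full_threshold_pct(aggregate_size_tb, daily_growth_tb, days_between_downtimes,
                       ceiling_pct=80.0):
    """Work backward from the growth rate to a provisioning-stop threshold.

    The space that must stay free is the organic growth expected between two
    planned downtime windows; the threshold is the comfort ceiling minus that
    reserve, expressed as a percentage of the aggregate size.
    """
    reserve_pct = 100.0 * daily_growth_tb * days_between_downtimes / aggregate_size_tb
    return max(0.0, ceiling_pct - reserve_pct)

# Example: a 100 TB aggregate growing 0.05 TB/day, downtimes every 90 days:
# 4.5% must stay free below the 80% ceiling, so stop provisioning near 75.5%.
print(round(full_threshold_pct(100, 0.05, 90), 1))  # → 75.5
```

The same calculation, run with the team's own comfort ceiling and the trend values reported by Operations Manager, yields the concrete threshold to configure.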

To provision storage, follow these steps:

1. Create big aggregates to enable shared storage in your data center. We recommend sizing in such a way that the aggregate can be extended once for eventual aggregate mitigation. Very few situations exist where a silo-centric approach with dedicated aggregates for applications makes sense. Free space and performance in an aggregate can be shared. A few big aggregates reduce the monitoring effort. Also, build aggregates in a limited number of standardized configurations and sizes.

2. Create volumes in zero fat configuration with the autogrow feature set to on. Because there is no artificial space limitation for the autogrow volume, monitoring is restricted to aggregate monitoring. When using deduplication, set the volume to autogrow. Whenever possible, use Provisioning Manager for convenience and for repeating configurations.

a. Classify your data and provision for flexibility. Give NFS a preference and make use of vFiler entities.

b. Turn on deduplication. Even in situations where deduplication rates are expected to be low, there is sometimes a big surprise. If you prefer to try deduplication on the storage controller first, create a clone of the intended volume and deduplicate it to estimate the effect. Use Performance Advisor to identify a repeating time frame of low activity in which to schedule the deduplication job. Also, use deduplication scheduled by change rate. Mind the maximum volume sizes, which depend on the storage controller.

c. Initially, size volumes to the expected size of the data you are going to store. Thus, the aggregate overcommitment metric in Operations Manager represents the data consolidation more precisely.

d. Trim existing volumes provisioned in full or low fat to the zero fat configuration. Use the following commands on the console of the storage controller to configure zero fat without Snapshot autodelete for NAS environments:

vol options <volume> guarantee none
vol options <volume> try_first volume_grow
vol autosize <volume> -m <maximum size> -i <increment size> on
snap autodelete <volume> off

Use the following command sequence to configure zero fat with Snapshot autodelete for NAS environments:

vol options <volume> guarantee none
vol options <volume> try_first volume_grow
vol autosize <volume> -m <maximum size> -i <increment size> on
snap autodelete <volume> trigger volume
snap autodelete <volume> delete_order oldest_first
snap autodelete <volume> on

45 Storage Efficiency Every Day How to Achieve and Manage Best-in-Class Storage Use

Use the following commands to configure zero fat without Snapshot autodelete for SAN environments:

vol options <volume> guarantee none
vol options <volume> try_first volume_grow
vol autosize <volume> -m <maximum size> -i <increment size> on
snap reserve -V <volume> 0
snap autodelete <volume> off
lun set reservation <lun> disable

Use the following command sequence to configure zero fat for SAN environments with autodelete set to on:

vol options <volume> guarantee none
vol options <volume> try_first volume_grow
vol autosize <volume> -m <maximum size> -i <increment size> on
snap reserve -V <volume> 0
snap autodelete <volume> trigger volume
snap autodelete <volume> delete_order oldest_first
snap autodelete <volume> on
lun set reservation <lun> disable

e. Identify storage of inactive data. Storage keeping inactive data is most often perfectly suited to act as a nomad candidate that can be migrated.

f. Identify storage that is close to deprovisioning. Deprovisioning storage relaxes use and can act as a mitigation alternative.

g. Turn already provisioned volumes into the zero fat configuration.

3. Let Operations Manager monitor the landscape. Use the aggregate daily growth rates and the days-to-full trending reported by Operations Manager to adapt the thresholds. Remember that days-to-full trending reports against 100% capacity used of the aggregate.
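The four command sequences in step 2d differ only in their SAN-specific and autodelete-specific lines, so they can be composed programmatically. The following Python sketch (a hypothetical helper, not a NetApp tool; the volume and LUN names are placeholders) mirrors the sequences shown above:

```python
def zero_fat_commands(volume, max_size, increment, san=False, autodelete=False, lun=None):
    """Compose the Data ONTAP console commands for the four zero fat variants."""
    cmds = [
        f"vol options {volume} guarantee none",
        f"vol options {volume} try_first volume_grow",
        f"vol autosize {volume} -m {max_size} -i {increment} on",
    ]
    if san:
        # SAN environments additionally zero the Snapshot reserve.
        cmds.append(f"snap reserve -V {volume} 0")
    if autodelete:
        cmds += [
            f"snap autodelete {volume} trigger volume",
            f"snap autodelete {volume} delete_order oldest_first",
            f"snap autodelete {volume} on",
        ]
    else:
        cmds.append(f"snap autodelete {volume} off")
    if san:
        # SAN environments also disable the LUN space reservation.
        cmds.append(f"lun set reservation {lun} disable")
    return cmds

# Print the SAN variant without autodelete for a placeholder volume and LUN:
for line in zero_fat_commands("vol1", "500g", "50g", san=True, lun="/vol/vol1/lun0"):
    print(line)
```

Generating the sequences this way keeps the four variants consistent when rolling the configuration out over many volumes.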


7 REFERENCES

• TR-3505, "NetApp Deduplication for FAS and V-Series Deployment and Implementation Guide": www.netapp.com/us/library/technical-reports/tr-3505.html

• TR-3563, "NetApp Thin Provisioning: Improving Storage Utilization and Reducing TCO": www.netapp.com/us/library/technical-reports/tr-3563.html

• TR-3710, "Operations Manager, Provisioning Manager, and Protection Manager Best Practices Guide": www.netapp.com/us/library/technical-reports/tr-3710.html

• TR-3786, "A Thorough Introduction to 64-Bit Aggregates": www.netapp.com/us/library/technical-reports/tr-3786.html

• TR-3814, "NetApp Data Motion": www.netapp.com/us/library/technical-reports/tr-3814.html

• TR-3827, "If You're Doing This, Then Your Storage Could Be Underutilized": www.netapp.com/us/library/technical-reports/tr-3827.html

• TR-3881, "DataMotion for Volumes for Enterprise Applications": www.netapp.com/us/library/technical-reports/tr-3881.html

• NetApp Operations Manager Efficiency Dashboard Installation and User Guide: now.netapp.com/NOW/download/tools/omsed_plugin/InstallUserGuide.pdf


8 ACKNOWLEDGMENTS

This report was developed in concert with the Field Centers for Innovation and covers field best practices and product group expertise. It would not have been possible without the input of many experts. Significant contributions were made by Matthew Agoni, Carlos Alvarez, Jeff Berks, Manfred Buchmann, Hans Deuerlein, Erik Dybwad, Niels Reker, Oliver Dziuba, Larry Freeman, Gary Garcia, Pretoom Goswami, Naveen Harsani, George John, Nigel Maddock, Andreas Martinovsky, Holger Niermann, Cesar Orosco, Christian Ott, Shiva Raja, Michael Reusch, Maurice Skubski, John Tyrrell, Oliver Walsdorf, and Allen Wang.

NetApp provides no representations or warranties regarding the accuracy, reliability, or serviceability of any information or recommendations provided in this publication, or with respect to any results that may be obtained by the use of the information or observance of any recommendations provided herein. The information in this document is distributed AS IS, and the use of this information or the implementation of any recommendations or techniques herein is a customer's responsibility and depends on the customer's ability to evaluate and integrate them into the customer's operational environment. This document and the information contained herein may be used solely in connection with the NetApp products discussed in this document.

© Copyright 2010 NetApp, Inc. All rights reserved. No portions of this document may be reproduced without prior written consent of NetApp, Inc. Specifications are subject to change without notice. NetApp, the NetApp logo, Go further, faster, Data ONTAP, FlexClone, FlexVol, MultiStore, RAID-DP, SnapDrive, SnapMirror, Snapshot, SyncMirror, and vFiler are trademarks or registered trademarks of NetApp, Inc. in the United States and/or other countries. Windows is a registered trademark of Microsoft Corporation. Oracle is a registered trademark of Oracle Corporation. VMware is a registered trademark and VMotion is a trademark of VMware, Inc. All other brands or products are trademarks or registered trademarks of their respective holders and should be treated as such. RA-0007-1010


The following sections focus on how to detect that a change is necessary and that a storage resource should be transitioned to another phase. We describe the different phases, how monitoring can support their detection, and how this information can be made known to operational groups.

4.1 PHASES AND TRANSITIONS

This section outlines the phases of a storage resource. Starting with an empty aggregate, storage is provisioned to certain thresholds. After that, storage is left for organic growth. After certain thresholds are exceeded, further inspection or activities must be performed to mitigate storage tightness.

• Provisioning storage. When certain thresholds are within a defined range, storage is provisioned to the aggregates. Monitoring should support making a decision to transition to the next phase.

• Leave storage for organic growth. When certain thresholds are exceeded, provisioned storage is left for organic growth. Depending on the environment, storage of existing applications might still be extended, and a second threshold might signal that extensions are no longer possible. Monitoring should support making a decision to transition to the next or prior phase.

• Mitigate storage use. When certain thresholds are exceeded, this phase must make sure that committed storage can be delivered to store application data. The effect of a mitigation activity should be to put the storage resource back in the preferred operational corridor. Monitoring should support making a decision to transition back to the organic growth phase.

4.2 MONITORING

NetApp Operations Manager delivers comprehensive monitoring and management for NetApp shared storage. It provides alerts, reports, performance monitoring, and configuration tools to keep the NetApp storage infrastructure in line with business requirements for maximum availability and efficiency. NetApp Operations Manager provides a single human interface and an application programming interface (API) for integration with third-party management and orchestration software vendors.

Operations Manager monitors the NetApp shared storage infrastructure and is able to raise awareness of certain situations. Events can be set to trigger an action when operational parameters are within a certain range and indicate a relevant situation. When the event triggers, an alarm notification can be sent by e-mail, pager, Simple Network Management Protocol (SNMP), or customized scripts. To raise awareness about a certain situation, the event must be characterized using the metrics provided by Operations Manager. To communicate the event, an alarm must be set.

THRESHOLDS

Operations Manager monitors relevant parameters that indicate the presence of specified situations. Thresholds can be set to trigger actions, for example, to notify the operational team that an alarm situation exists. The thresholds can be set to notify in advance. Operations Manager also performs trending on operational parameters to express the urgency of a certain situation. This supports the decision making on how to react to it.

Within your Operations Manager instance, the thresholds can be verified and set by navigating to the Default Thresholds page by following Setup → Options → Default Thresholds, or via the link http://opsmgrserver:port/dfm/edit/options. Figure 17 shows a sample configuration page.


Figure 17) Operations Manager screen to configure thresholds on operational metrics

For aggregates, Operations Manager provides the set of thresholds described in the following list. They represent absolute limits. Operations Manager alarms can be used to notify operational staff and raise awareness of a specific situation.

Monitoring the aggregates is very important. They are the physical containers of the preallocated and growable storage objects that host application data. If an aggregate of a storage controller runs at uncontrolled usage, it could have direct consequences for the applications for which it is providing data.

The concrete settings for these thresholds depend on the time needed to relax aggregate block usage. When no mitigation alternatives can or should be taken over the lifetime of the data, mitigation actions must be performed in scheduled downtime windows. Thus, threshold settings and actions tend to be more conservative to avoid SLA-endangering situations.

• Aggregate full threshold. This threshold on the metric aggregate block use allows triggering an alarm that notifies a person in charge.

• Aggregate nearly full threshold. This threshold is the counterpart of the aggregate full threshold but provides an earlier notification.

• Aggregate overcommitted threshold. This threshold on the metric committed storage allows triggering an alarm that notifies a person in charge. This metric refers to the amount of storage that is assigned to applications. It represents the level of consolidation and also the width and increase of the block use corridor.

• Aggregate nearly overcommitted threshold. This threshold is the counterpart of the aggregate overcommitted threshold but provides an earlier notification.

Operations Manager provides thresholds that can be used to alert operational staff when volumes are in a certain state:

• Volume full threshold. This event notifies a person in charge that the preset threshold on the volume use metric has been reached.

• Volume almost full threshold. This event is the counterpart of the volume full threshold but provides an earlier notification.

• Volume autosized. This event notifies a person in charge when a volume was extended using the autogrow functionality.


TRENDING

Operations Manager 4.0 supports a variety of trending features for certain storage objects. This is an important feature for all storage objects with a fixed size. It allows you to estimate the time frame within which a certain situation needs to be mitigated. The trend is calculated as a linear regression over up to 90 days in the past. For aggregates, Operations Manager calculates a trend on the daily growth rate. In your Operations Manager instance, use the link http://opsmgrserver:port/dfm/report/view/aggregates-growth-rates for trending of aggregate growth rates and the estimated remaining time until the storage object is full. Each aggregate can be drilled down, and you can select trending based on an interval of one day, one week, one month, three months, or one year. To see the effect of recent data activities, set the interval of a trend calculation to enclose this activity. Investigate whether growth rates calculated over different intervals deviate significantly.

Figure 18) Trending of data growth and days-to-full prediction in Operations Manager

Note: The calculation basis of time to full is the usable aggregate capacity. This value is not calculated based on the aggregate full threshold setting.
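The trend extrapolation described above can be sketched in a few lines. This is illustrative only (Operations Manager's actual implementation is internal): a least-squares line is fitted through daily usage samples and extrapolated to 100% of the usable capacity, as the note points out.

```python
def days_to_full(daily_used_gb, capacity_gb):
    """Fit a least-squares line through (day, used) samples and extrapolate
    to 100% of usable capacity, in the spirit of the trending described here."""
    n = len(daily_used_gb)
    xs = range(n)
    mean_x = sum(xs) / n
    mean_y = sum(daily_used_gb) / n
    slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, daily_used_gb)) \
            / sum((x - mean_x) ** 2 for x in xs)
    intercept = mean_y - slope * mean_x
    if slope <= 0:
        return None  # flat or shrinking usage: no meaningful days-to-full
    current = intercept + slope * (n - 1)  # fitted usage on the latest day
    return (capacity_gb - current) / slope

# 10 days of linear growth at 2 GB/day on a 100 GB aggregate:
print(days_to_full([10 + 2 * d for d in range(10)], 100))  # → 36.0
```

Running the fit over different window lengths, as the text recommends, reveals whether recent growth deviates from the long-term rate.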

The trending on the volume level is analogous to the trending on the aggregate level. In your Operations Manager instance, access the link http://opsmgrserver:port/dfm/report/view/volumes-growth-rates for trending of volume growth rates. NetApp recommends that you order the view by growth rate descending, or by time to full increasing, in order to focus on the relevant candidates.

On the volume level, you can set an alarm to fire when the volume growth is outside the usual boundary:

• Abnormal volume growth. This event notifies you when the growth rate of a volume exceeds a preset limit. It is helpful to signal unusual behavior concerning storage consumption and to point the operational staff to the right storage object.


INDIVIDUAL THRESHOLDS PER AGGREGATE OR VOLUME

For each aggregate or volume, the general default settings can be overwritten and made more specific. To do so, select your aggregate or volume of choice, for example by using the links already provided in this technical report. When selecting a concrete aggregate, it can be configured using the Edit Settings link and dialog. When selecting a concrete volume, its configuration can be adapted using the Edit Quota Settings link and dialog.

MONITORING STORAGE EFFICIENCY RETURNS

NetApp Operations Manager provides a dashboard to visualize storage efficiency returns in the NetApp shared storage infrastructure. This report lists important parameters, drilled down by utilization capacity, unused reserve capacity, storage efficiency, and efficiency return breakdown. It allows you to judge the effectiveness of the NetApp storage efficiency technologies. Figure 19 provides a sample screenshot of the storage efficiency dashboard in NetApp Operations Manager. Consult the NetApp Operations Manager Efficiency Dashboard Installation and User Guide for further details on the information provided by this dashboard.

Figure 19) Storage efficiency dashboard in Operations Manager


4.3 NOTIFICATION

Operational staff must be notified when situations occur that require a transition of phases, especially situations with negative consequences for SLA fulfillment. Operations Manager provides alarms for notification. Alarms are bound to the metrics and thresholds explained in section 4.2 and notify operational staff, storage administrators, or storage capacity planners. Alarms are the instrument used to keep the management effort of the NetApp storage infrastructure low.

After being notified, the responsible person can evaluate the situation and decide which actions to take. Furthermore, the trends on operational parameters provided by Operations Manager simplify the decision-making process.

Depending on the organizational structure, the responsibilities to operate, plan, and administer the storage infrastructure can be separated into different groups, persons, or roles. Thus, we characterize the mitigation activities by required skill set and time to act. This allows easy alignment to a given organizational structure.

Operations Manager supports different methods to send a notification. The notification methods can be used in combination; for example, a notification can be sent by both e-mail and SNMP.

NOTIFY BY E-MAIL

An alarm can be sent to multiple destinations by e-mail. Repeated notifications can be sent while the situation is not resolved. To set an alarm, access the alarm configuration page by following Setup → Alarms from the default Operations Manager dashboard. Clicking Advanced Version accesses an advanced version of this page; the direct link for the advanced version is http://opsmgrserver:port/dfm/edit/alarms-advanced. Figure 20 shows how to configure an alarm. Adjust the threshold as described in section 4.2.

NetApp recommends using distribution lists or aliases with meaningful names rather than the addresses of individual persons. If you follow this recommendation, changing responsibilities and roles does not require corresponding changes in Operations Manager.

NOTIFY BY SNMP

Operations Manager supports the notification of alarms using SNMP, a widely used standard that is supported by most orchestration frameworks and ticketing systems. Using SNMP, Operations Manager can be integrated into existing ticketing systems. Figure 20 shows setting up an alarm that fires based on the aggregate almost full threshold. The SNMP trap host is configured using a hostname or IP address and the port on which the SNMP agent is listening. The alarm can be saved and tested.


Figure 20) Configuring an alarm based on the threshold aggregate almost full

Note: The SNMP event must be routed to the responsible groups or persons in the ticketing system. Thus, the mapping between the detected situation and the responsible operational group must be implemented there.

NOTIFY BY SCRIPT

Operations Manager supports notifications in highly customized integration scenarios. A user-defined adapter can be executed that delivers the information to the infrastructure or system of choice. A script can be used to implement such an adapter and act as the glue between Operations Manager and the customer infrastructure. To set an alarm on the event aggregate almost full that starts a script, instrument Operations Manager on the command line:

dfm alarm create -s script_to_execute -h aggregate-almost-full
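A minimal sketch of such an adapter script follows. Note the assumptions: how Operations Manager hands the event details to the script varies by version, so the positional arguments, the severity heuristic, and the log path below are all illustrative, not part of any NetApp interface. The script normalizes the event into one line that a ticketing system could pick up.

```python
#!/usr/bin/env python
"""Glue script between Operations Manager and a ticketing system (sketch).

Assumption: the alarm passes the event name and the source object as the
first two command-line arguments; adapt this to how your Operations Manager
version actually invokes alarm scripts.
"""
import sys
import time

def format_ticket(event_name, source, when=None):
    """Normalize an Operations Manager event into one ticket line."""
    when = when or time.strftime("%Y-%m-%d %H:%M:%S")
    # Heuristic: 'almost'/'nearly' events are early warnings, others critical.
    early = "almost" in event_name or "nearly" in event_name
    severity = "WARNING" if early else "CRITICAL"
    return f"{when} {severity} {event_name} on {source}"

if __name__ == "__main__" and len(sys.argv) >= 3:
    event, source = sys.argv[1], sys.argv[2]
    # Append to a log file watched by the ticketing system (path is an example).
    with open("/var/log/storage-alarms.log", "a") as log:
        log.write(format_ticket(event, source) + "\n")
```

The same skeleton can instead post to a REST endpoint or open a ticket directly; the important part is keeping the mapping from event to responsible group in one place.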


4.4 MITIGATE STORAGE USE

Uncontrolled storage use can limit operational flexibility and might put SLA fulfillment at risk. This section focuses on mitigation activities that preserve flexibility by controlling use. The effect of a mitigation activity should be to return the usage to its defined corridor.

Storage tightness might occur in aggregates or volumes, depending on their configuration. When all volumes in an aggregate are thin provisioned with the zero fat configuration, they use the shared pool of free blocks of the aggregate to deal with data growth. To solve this situation, a mitigation activity at the aggregate level is necessary.

When storage objects in a fixed-size volume cannot grow to the committed space, a mitigation activity at the volume level is necessary to solve upcoming volume tightness.

MITIGATION ACTIVITIES FOR AGGREGATES

Aggregates are the coarsest storage objects within a NetApp storage controller. Finer-grained storage objects such as FlexVol volumes and their content are usually thin provisioned using the zero fat configuration. They might grow on demand; however, because they live within an aggregate of physically limited size, the growth of the storage object itself is also limited. As described in the following list, providing usable space in the aggregate automatically allows contained storage objects to grow.

1. Increase the aggregate. You can add drives to aggregates during operation, and you can repeat this mitigation activity. The maximum aggregate size depends on the Data ONTAP version, the type of aggregate, and the type of storage controller. 64-bit aggregates, supported with Data ONTAP 8, have very high limits. Additional drives can be used immediately; however, their procurement time needs to be taken into account. Rebalancing data between the existing and new drives results in uniformly distributed use of the drives.

2. Decrease the aggregate Snapshot copy reserve. This reserve is needed in MetroCluster and SyncMirror® configurations. In other configurations, you can decrease this reserve or set it to zero.

3. Shrink preallocated volumes. Volumes with preallocated space reserve available aggregate free space. When possible, these volumes can be shrunk, returning the freed space to the aggregate so that other volumes can make use of it.

4. Enable deduplication and shrink the volume.

5. If available, migrate a nomad online to a different storage controller. Doing this at the NetApp storage controller level requires storage provisioning based on vFiler entities and a MultiStore and SnapMirror license. Adequate free space on the aggregates of the target storage controller is required. This mitigation activity is not limited in its applicability.

6. Migrate a volume from one aggregate to another, within the same or another storage controller. SnapMirror replicates the data while it is still being served. To switch over to the replicated data, the client needs to detach from the source and reattach to the replica. After completion, the replica is considered the new source. This operation has an impact on client downtime. Typically, inter-data center bandwidth allows you to synchronize the source and the replica within the range of a few minutes.

7. If none of the listed activities can be used, the application must be stopped to achieve a consistent state, and the data must then be migrated.

The mitigation activities for aggregate tightness are summarized in Table 8. Note that Provisioning Manager can perform mitigation alternatives 3 to 6 online for secondary storage.


Table 8) Mitigation alternatives to control use within aggregates.

| No. | Mitigation Activity | Repeatability | SLA Impact | Preparation Time | Time to Show Effect |
|-----|---------------------|---------------|------------|------------------|---------------------|
| 1 | Increase aggregate capacity by adding disks | Repeatable (Data ONTAP 7.x: low limits; Data ONTAP 8: high limits) | None | HW procurement | Immediate (plus rebalancing) |
| 2 | Decrease the aggregate's Snapshot copy reserve area, if possible | One time | None | None | Immediate |
| 3 | Shrink other volumes in the aggregate, if they have enough free space | One time | Low | None | Immediate |
| 4 | Run deduplication and shrink volumes | Repeatable | Low | Time to execute dedupe | Immediate |
| 5 | Migrate nomads (online) | Repeatable | Low | None | Minutes (vFiler migration time) |
| 6 | Migrate volumes to a different aggregate (offline) | Repeatable | Medium to high | Next planned downtime window | Minutes (volume switch-over time) |
| 7 | Prevent application data loss and stop the application, then migrate (offline) | Repeatable | Low to high | Coordinate with app owner | Minutes (migration time) |

MITIGATION ACTIVITIES FOR VOLUME TIGHTNESS

Mitigation activities for volume tightness are relevant for volumes that are at risk because their storage objects cannot grow to the committed size. When it is not possible to enable growth for the storage objects contained in volumes, you need to perform an aggregate mitigation activity.

Note: Some of these mitigation alternatives depend on, and affect, the used capacity in the aggregate.

Table 9) Mitigation activities for resource tightness within volumes.

| No. | Mitigation Activity | Repeatability | SLA Impact | Preparation Time | Time to Show Effect |
|-----|---------------------|---------------|------------|------------------|---------------------|
| 1 | Reduce the volume's Snapshot copy reserve (if configured and not used) | One time | Low | None | Immediate |
| 2 | Increase the volume if there is free space in the aggregate (see Table 8) | One time | Low | None | Immediate |
| 3 | Delete Snapshot copies that are not needed or were skipped by the autodelete function | Limited | Low | None | Immediate |
| 4 | Activate FAS deduplication for the volume (requires proper space guarantees) | One time | Low; possible performance impact | Wait for schedule | Hours |
| 5 | If the volume contains more than a single LUN, migrate those objects to another volume or aggregate | Repeatable | High | Next planned downtime window | Minutes (volume migration time) |
| 6 | Stop the application and migrate the data | Repeatable | High | Coordinate with app owner | Minutes (migration time) |


5 REAL-LIFE SETTINGS

This section summarizes two different operational settings. The first does not make use of online data migration or the settled/nomad provisioning pattern; the second implements a settled/nomad provisioning pattern to maintain the flexibility for online data migrations.

The concrete threshold settings and approaches might be very customer and application specific. To exploit NetApp storage efficiency features in your own data center, NetApp recommends that you start conservatively. After you are familiar with the process, work toward the customer-specific optimum.

5.1 SAMPLE SETTING 1: REAL-LIFE SETTING

This section describes a real-life setting that a customer started with. It makes use of a limited set of mitigation alternatives. This is especially beneficial when the installed storage capacity should stay constant over a long time frame or when the physical systems are already fully equipped. A settled/nomad setting is not considered. Thus, the thresholds that signal a transition of the phases are set lower and more conservatively for this customer. Because online data migration and aggregate extension are not available as mitigation alternatives, sufficient available space is required to safely reach the next planned downtime window, as shown in Figure 21. In practice, refer to the aggregate days-to-full trend value to get an idea of the available days to full based on past data growth.

• All storage is provisioned using the zero fat option with growable FlexVol volumes. Only aggregate monitoring is used.

• Aggregate extension is not a mitigation alternative.

• Online migration is not a mitigation alternative.

Figure 21) Storage to enable organic data growth between planned downtime windows. (The figure plots data growth over a time scale of months between two planned downtime windows.)

Note: Several months might fall between the planned downtime windows in which major mitigation alternatives can be performed.

The primary concern is preventing the critical situation in which aggregates reach a utilization level that is too high to enable organic growth during the period between agreed planned downtime windows. To prevent this situation, sufficient space must be reserved to enable data growth. Second, the level of data consolidation is monitored to manage accumulated growth rates safely.

Provisioning new data is stopped when one or both thresholds on the first and second metrics are reached. The operational teams are notified using an alarm on the Operations Manager event aggregate nearly full threshold (configured to fire when the metric exceeds 50%) and on the event aggregate nearly overcommitted threshold (configured to fire when the metric exceeds 110%). These alarms stop the responsible entities from provisioning new storage; the aggregate is left for organic growth.

An assessment of the storage situation might then be performed. Depending on experience and on the application growth rates seen in the past, the thresholds may be adapted. After the upper threshold of the operational sweet spot corridor is exceeded, an alarm based on the aggregate full threshold (set initially to 65%) is sent to the storage administrators, who decide on migrating data in the next planned downtime window. In the meantime, organic growth can take place in the yellow-marked area shown in Figure 22. The metrics used are:

• First metric: aggregate capacity used

• Second metric: aggregate space committed

Because all storage is provisioned using the zero fat option, no artificially limited storage containers exist. Thus, there is no need to consider a volume-based metric. Figure 22 shows the behavior depending on the metrics aggregate capacity used and aggregate committed space.
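The two-metric transition logic of this sample setting can be sketched as a small decision function. The band boundaries (50%/65% used, 110%/120% committed) are taken from the thresholds stated above and from Figure 22; the function itself is an illustrative sketch, not part of Operations Manager.

```python
def sample_setting_1_action(capacity_used_pct, space_committed_pct):
    """Map the two aggregate metrics of sample setting 1 to the permitted action.

    Below 50% used and 110% committed, new storage may be provisioned;
    above 65% used or 120% committed, mitigation is due in the next planned
    downtime window; in between, provisioning stops and capacity is assessed.
    """
    if capacity_used_pct > 65 or space_committed_pct > 120:
        return "mitigate in next planned downtime window"
    if capacity_used_pct > 50 or space_committed_pct > 110:
        return "stop provisioning; assess capacity and adapt thresholds"
    return "provision new storage"

print(sample_setting_1_action(40, 90))   # → provision new storage
print(sample_setting_1_action(55, 105))  # → stop provisioning; assess capacity and adapt thresholds
print(sample_setting_1_action(70, 115))  # → mitigate in next planned downtime window
```

Either metric alone can trigger the transition, which matches the text: provisioning stops when one or both thresholds are reached.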

Figure 22) Transition of phases depending on the metrics aggregate capacity used and aggregate committed space. (The figure shows the operational sweet spot corridor: while aggregate capacity used is between 0% and 50% and aggregate space committed is between 0% and 110%, new storage is provisioned; beyond these values, capacity is assessed and the thresholds are adapted; when capacity used exceeds 65% or space committed exceeds 120%, mitigation takes place.)


5.2 SAMPLE SETTING 2: SETTLED/NOMAD

This section describes a setting that takes the settled/nomad provisioning pattern into account and allows migrating nomad data flexibly and in a timely manner thanks to vFiler technology. This setting requires storage space at alternative locations to which nomads might be migrated. It is seen more often in larger environments with an emphasis on NFS-attached storage. It allows operating the NetApp storage infrastructure at very high use and within narrower operational sweet spot corridors. Figure 23 visualizes the effect of a mitigation alternative that can be performed online.

Figure 23) Narrower corridors due to the ability to perform mitigation alternatives in hours instead of months.

[Figure content: capacity use of a settled aggregate with several nomads over time; once the need to act is detected, the effect of an online mitigation (for example, migration) shows within hours, so use stays inside a narrow corridor.]

In this sample setting, as in sample setting 1, the critical situation to prevent is aggregates becoming too full. However, with the flexibility gained through online data migration, there is no need to take a further metric into account, for example, storage overcommitment.

• All storage is provisioned using the zero fat option with growable FlexVol volumes. Only aggregate monitoring is used.
• Storage is provisioned using the settled/nomad pattern, with the ability to perform online migration.
• Days-to-full aggregate trending was more than 200 days on average. Note that this value depends on the individual situation and is calculated against 100% usable capacity.

The sole metric in this setting is aggregate capacity used. Table 10 contains the thresholds describing the transitions between phases.

Table 10) Phase transitions with the settled/nomad provisioning pattern and the online migration mitigation alternative.

Detection Threshold | Notify | Mitigation
> 70% | Storage operations | Stop provisioning of storage
> 85% | Storage operations | Stop extending provisioned storage
> 90% | Storage operations | Relax the resource situation and migrate a nomad
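Table 10 is effectively an ordered threshold table, which suggests a direct table-driven implementation. The sketch below is illustrative; the names are ours, while the thresholds and actions are taken from Table 10.

```python
# Ordered (threshold_pct, mitigation) pairs from Table 10; every row
# notifies the storage operations team.
PHASE_TABLE = [
    (90, "Relax resource situation and migrate nomad"),
    (85, "Stop extending provisioned storage"),
    (70, "Stop provisioning of storage"),
]

def actions_for(capacity_used_pct):
    """Return all mitigations triggered at the given aggregate use."""
    return [action for threshold, action in PHASE_TABLE
            if capacity_used_pct > threshold]
```

Keeping the thresholds in data rather than code makes it easy to adapt them as the operational team gains experience.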


Figure 24) Visualization of phase transitions depending on the metric aggregate capacity used.

[Figure content: settled data growth plotted against aggregate capacity with an operational sweet spot corridor. At 0–70% aggregate capacity used, new storage may be provisioned; at 70–85%, provisioning stops but already provisioned storage may still be extended; above 90%, utilization is relaxed by migrating a nomad with NetApp Data Motion.]

You can achieve very high data consolidation in this setting by using NetApp storage controllers. The amount of logical data served exceeds the physically usable capacity by a significant factor.


6 STORAGE EFFICIENCY COOKBOOK

To increase consolidation, we propose the following steps to exploit the advantages of NetApp storage efficiency technologies.

Figure 25) Typical picture of aggregate capacity metrics while turning to zero fat configurations and dedupe.

[Figure content: committed capacity and capacity used plotted over elapsed time (with 1-month and 3-month marks), together with an overall trend line and a last-3-month trend line; the numbered points 1, 2, and 3 correspond to the steps below.]

As a general rule, we don't introduce artificially limited container types. They increase the monitoring effort and might prevent pooling of unused space. For an existing landscape, proceed as follows:

1. Install and configure Operations Manager, the earlier the better. From day one, Operations Manager collects data; the more information it collects, the better the predictions and trending. The diagrams provided by Operations Manager give a good idea of growth rates and their steadiness. Make sure all NetApp storage controllers are monitored. Wait for one month. Define which mitigation alternatives your operational team is comfortable with. Check the boxes accompanying the provided list and identify the time your team needs to perform the actions. If you can perform online migrations for nomads, define the time to negotiate and approve the migration. For all other data, define the time to the next planned downtime window.

2. Change all volumes to the zero fat configuration with the autogrow feature set to on. Because there is no artificial space limitation for the autogrow volume, monitoring is restricted to aggregate monitoring. When using deduplication, set the volume to autogrow. During this period, the capacity used diminishes, as shown in Figure 25. Usually each change in the volume configuration can be detected. So far only metadata has changed, and the unused space in the volumes is now available from a common shared pool. The aggregated free space is available to the same applications storing the data. We recommend monitoring for three months to understand the growth rate of your environment.

3. Derive the growth trend of the aggregates. Note that the overall trend might still be negative. Use Operations Manager to help determine the trend. Make sure that it excludes the time frame when the volume configuration was changed to zero fat and that it includes relevant operations of your applications, such as month-end and year-end closing of business applications or regular software maintenance updates (for example, in virtualized environments).


Work backward to determine the thresholds of the phases:

a. Define the aggregate use at a level where your operational team is comfortable; at first, do not exceed 80%. Add an attention area (yellow), depending on the mitigation alternatives and their time to show effect.

b. Determine the maximum distance between the planned downtimes, or the time to perform the intended mitigation alternatives.

c. Determine the growth rate. Operations Manager provides help in determining the trend of data growth.

d. Determine the minimum space required to comfortably allow organic growth in the period between the agreed planned downtimes of the services provided. Operations Manager helps you to understand the growth rate of the past.
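Steps a through d amount to a back-of-the-envelope calculation: reserve enough headroom for the growth expected between two planned downtime windows, and cap the result at the level the team is comfortable with. The helper below is a hypothetical sketch of that arithmetic, not an Operations Manager feature.

```python
def derive_full_threshold(aggregate_capacity_gb, daily_growth_gb,
                          days_between_downtimes, comfort_limit_pct=80):
    """Work backward from growth rate and downtime distance to a threshold.

    The space needed for organic growth between two planned downtime
    windows is reserved as headroom; the remaining share of the aggregate
    is the highest tolerable 'full' threshold, capped at the level the
    operational team is comfortable with (80% at first, per the text).
    """
    headroom_gb = daily_growth_gb * days_between_downtimes
    max_pct = 100.0 * (aggregate_capacity_gb - headroom_gb) / aggregate_capacity_gb
    return min(max_pct, comfort_limit_pct)
```

For example, a 10 TB aggregate growing 50 GB per day with 100 days between downtimes needs 5 TB of headroom, so the threshold drops to 50%.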

To provision storage, follow these steps:

1. Create big aggregates to enable shared storage in your data center. We recommend sizing them so that the aggregate can be extended once for eventual aggregate mitigation. Very few situations exist where a silo-centric approach with dedicated aggregates for applications makes sense. Free space and performance in an aggregate can be shared. A few big aggregates reduce the monitoring effort. Also, build aggregates in a limited number of standardized configurations and sizes.

2. Create volumes in the zero fat configuration with the autogrow feature set to on. Because there is no artificial space limitation for the autogrow volume, monitoring is restricted to aggregate monitoring. When using deduplication, set the volume to autogrow. Whenever possible, use Provisioning Manager for convenience and for repeating configurations.

a. Classify your data and provision for flexibility. Give NFS a preference and make use of vFiler entities.

b. Turn on deduplication. Even in situations where deduplication rates are expected to be low, there is sometimes a big surprise. If you prefer to try deduplication on the storage controller, then create a clone of the intended volume and deduplicate it to estimate the effect. Use Performance Advisor to identify a repeating time frame of low activity to schedule the deduplication job. Also consider deduplication scheduled by change rate. Mind the maximum volume sizes, which depend on the storage controller.

c. Initially size volumes to the expected size of the data you are going to store. Thus the aggregate overcommitment metric in Operations Manager represents the data consolidation more precisely.

d. Trim existing volumes provisioned in full or low fat to the zero fat configuration. Use the following commands on the console of the storage controller to configure zero fat without Snapshot autodelete for NAS environments:

vol options <volume> guarantee none
vol options <volume> try_first volume_grow
vol autosize <volume> -m <maximum size> -i <increment size> on
snap autodelete <volume> off

Use the following command sequence to configure zero fat with Snapshot autodelete for NAS environments:

vol options <volume> guarantee none
vol options <volume> try_first volume_grow
vol autosize <volume> -m <maximum size> -i <increment size> on
snap autodelete <volume> trigger volume
snap autodelete <volume> delete_order oldest_first
snap autodelete <volume> on


Use the following commands to configure zero fat without Snapshot autodelete for SAN environments:

vol options <volume> guarantee none
vol options <volume> try_first volume_grow
vol autosize <volume> -m <maximum size> -i <increment size> on
snap reserve -V <volume> 0
snap autodelete <volume> off
lun set reservation <lun> disable

Use the following command sequence to configure zero fat for SAN environments with autodelete set to on:

vol options <volume> guarantee none
vol options <volume> try_first volume_grow
vol autosize <volume> -m <maximum size> -i <increment size> on
snap reserve -V <volume> 0
snap autodelete <volume> trigger volume
snap autodelete <volume> delete_order oldest_first
snap autodelete <volume> on
lun set reservation <lun> disable
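When many volumes must be converted, the command sequences above can be generated from a small helper so that they can be reviewed before being pasted into the storage controller console. This generator is our own convenience sketch; the emitted commands are exactly the sequences listed above, parameterized by volume, autosize settings, and (for SAN) the LUN path.

```python
def zero_fat_commands(volume, max_size, increment,
                      san=False, autodelete=False, lun=None):
    """Return the zero fat command sequence for one volume as a list of strings."""
    cmds = [
        f"vol options {volume} guarantee none",
        f"vol options {volume} try_first volume_grow",
        f"vol autosize {volume} -m {max_size} -i {increment} on",
    ]
    if san:
        # SAN volumes additionally drop the Snapshot reserve.
        cmds.append(f"snap reserve -V {volume} 0")
    if autodelete:
        cmds += [
            f"snap autodelete {volume} trigger volume",
            f"snap autodelete {volume} delete_order oldest_first",
            f"snap autodelete {volume} on",
        ]
    else:
        cmds.append(f"snap autodelete {volume} off")
    if san and lun:
        cmds.append(f"lun set reservation {lun} disable")
    return cmds
```

Printing the returned list gives an operator the chance to review each sequence before applying it.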

e. Identify storage of inactive data. Storage keeping inactive data is often perfectly suited to act as a nomad candidate that can be migrated.

f. Identify storage that is close to deprovisioning. Deprovisioning of storage relaxes use and can act as a mitigation alternative.

g. Turn already provisioned volumes into the zero fat configuration.

3. Let Operations Manager monitor the landscape. Use the aggregate daily growth rates and the days-to-full trending reported by Operations Manager to adapt the thresholds. Remember that days-to-full trending reports against 100% capacity used of the aggregate.


7 REFERENCES

• TR-3505: "NetApp Deduplication for FAS and V-Series Deployment and Implementation Guide"
  www.netapp.com/us/library/technical-reports/tr-3505.html
• TR-3563: "NetApp Thin Provisioning: Improving Storage Utilization and Reducing TCO"
  www.netapp.com/us/library/technical-reports/tr-3563.html
• TR-3710: "Operations Manager, Provisioning Manager, and Protection Manager Best Practices Guide"
  www.netapp.com/us/library/technical-reports/tr-3710.html
• TR-3786: "A Thorough Introduction to 64-Bit Aggregates"
  www.netapp.com/us/library/technical-reports/tr-3786.html
• TR-3814: "NetApp Data Motion"
  www.netapp.com/us/library/technical-reports/tr-3814.html
• TR-3827: "If You're Doing This, Then Your Storage Could Be Underutilized"
  www.netapp.com/us/library/technical-reports/tr-3827.html
• TR-3881: "DataMotion for Volumes for Enterprise Applications"
  www.netapp.com/us/library/technical-reports/tr-3881.html
• NetApp Operations Manager Efficiency Dashboard Installation and User Guide
  now.netapp.com/NOW/download/tools/omsed_plugin/InstallUserGuide.pdf


8 ACKNOWLEDGMENTS

This report was developed in concert with the Field Centers for Innovation and covers field best practices and product group expertise. It would not have been possible without the input of many experts. Significant contributions were made by Matthew Agoni, Carlos Alvarez, Jeff Berks, Manfred Buchmann, Hans Deuerlein, Erik Dybwad, Niels Reker, Oliver Dziuba, Larry Freeman, Gary Garcia, Pretoom Goswami, Naveen Harsani, George John, Nigel Maddock, Andreas Martinovsky, Holger Niermann, Cesar Orosco, Christian Ott, Shiva Raja, Michael Reusch, Maurice Skubski, John Tyrrell, Oliver Walsdorf, and Allen Wang.

NetApp provides no representations or warranties regarding the accuracy, reliability, or serviceability of any information or recommendations provided in this publication, or with respect to any results that may be obtained by the use of the information or observance of any recommendations provided herein. The information in this document is distributed AS IS, and the use of this information or the implementation of any recommendations or techniques herein is a customer's responsibility and depends on the customer's ability to evaluate and integrate them into the customer's operational environment. This document and the information contained herein may be used solely in connection with the NetApp products discussed in this document.

© Copyright 2010 NetApp, Inc. All rights reserved. No portions of this document may be reproduced without prior written consent of NetApp, Inc. Specifications are subject to change without notice. NetApp, the NetApp logo, Go further, faster, Data ONTAP, FlexClone, FlexVol, MultiStore, RAID-DP, SnapDrive, SnapMirror, Snapshot, SyncMirror, and vFiler are trademarks or registered trademarks of NetApp, Inc. in the United States and/or other countries. Windows is a registered trademark of Microsoft Corporation. Oracle is a registered trademark of Oracle Corporation. VMware is a registered trademark and VMotion is a trademark of VMware, Inc. All other brands or products are trademarks or registered trademarks of their respective holders and should be treated as such. RA-0007-1010


Figure 17) Operations Manager screen to configure thresholds on operational metrics.

For aggregates, Operations Manager provides the set of thresholds described in the following list. They represent absolute limits. Operations Manager alarms can be used to notify operational staff and raise awareness of a specific situation.

Monitoring the aggregates is very important. They are the physical containers of the preallocated and growable storage objects that host application data. If an aggregate of a storage controller runs at uncontrolled usage, there can be direct consequences for the applications for which it provides data.

The concrete settings for these thresholds depend on the time necessary to relax aggregate block usage. When no mitigation alternatives should or can be taken over the lifetime of the data, mitigation actions must be performed in scheduled downtime windows. Thus, threshold settings and actions tend to be more conservative to avoid SLA-endangering situations.

• Aggregate full threshold. This threshold on the metric aggregate block use allows triggering an alarm that notifies a person in charge.
• Aggregate nearly full threshold. This threshold is the counterpart of the aggregate full threshold but provides an earlier notification.
• Aggregate overcommitted threshold. This threshold on the metric committed storage allows triggering an alarm that notifies a person in charge. The metric refers to the amount of storage that is assigned to applications; it represents the level of consolidation and also the width and increase of the block use corridor.
• Aggregate nearly overcommitted threshold. This threshold is the counterpart of the aggregate overcommitted threshold but provides an earlier notification.

Operations Manager also provides thresholds that can be used to alert operational staff when volumes reach a certain state:

• Volume full threshold. This event notifies a person in charge that the preset threshold on the metric volume use has been reached.
• Volume almost full threshold. This event is the counterpart of the volume full threshold but provides an earlier notification.
• Volume autosized. This event notifies a person in charge that a volume was extended using the autogrow functionality.


TRENDING

Operations Manager 4.0 supports a variety of trending features for certain storage objects. Trending is an important feature for all storage objects with a fixed size, because it allows you to estimate the time frame within which a certain situation needs to be mitigated. The trend is calculated as a linear regression over up to 90 days in the past. For aggregates, Operations Manager calculates a trend on the daily growth rate. In your Operations Manager instance, use the link http://opsmgrserver:port/dfm/report/view/aggregates-growth-rates for trending of aggregate growth rates and the estimated remaining time until the storage object is full. Each aggregate can be drilled down, and you can select trending based on an interval of one day, one week, one month, three months, or one year. To see the effect of recent data activities, set the interval of a trend calculation to enclose this activity. Investigate whether growth rates calculated over different intervals deviate significantly.
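The principle behind this trending can be illustrated in a few lines: fit a least-squares line through the daily capacity-used samples and divide the remaining usable capacity by the resulting growth rate. This is a sketch of the idea, not the exact Operations Manager algorithm; the estimate is computed against 100% of usable capacity.

```python
def days_to_full(daily_used_gb, usable_capacity_gb):
    """Estimate days until an aggregate is full via linear regression.

    daily_used_gb: one capacity-used sample per day, oldest first
                   (up to 90 days, mirroring the trending window).
    """
    n = len(daily_used_gb)
    xs = range(n)
    mean_x = sum(xs) / n
    mean_y = sum(daily_used_gb) / n
    # Least-squares slope = daily growth rate in GB/day.
    slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, daily_used_gb))
             / sum((x - mean_x) ** 2 for x in xs))
    if slope <= 0:
        return float("inf")  # shrinking or flat: never full
    return (usable_capacity_gb - daily_used_gb[-1]) / slope
```

Comparing the estimate over different sample windows reproduces the advice above: if short- and long-interval trends deviate strongly, recent activity dominates the picture.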

Figure 18) Trending of data growth and days-to-full prediction in Operations Manager.

Note: The calculation basis of time to full is the usable aggregate capacity. This value is not calculated based on the aggregate full threshold setting.

Trending on the volume level is analogous to trending on the aggregate level. In your Operations Manager instance, access the link http://opsmgrserver:port/dfm/report/view/volumes-growth-rates for trending of volume growth rates. NetApp recommends that you order the view by growth rate (descending) or time to full (increasing) in order to focus on the relevant candidates.

On the volume level, you can set an alarm to fire when the volume growth is outside the usual boundary:

• Abnormal volume growth. This event notifies you when the growth rate of a volume exceeds a preset limit. It is helpful for signaling unusual behavior concerning storage consumption and for pointing the operational staff to the right storage object.


INDIVIDUAL THRESHOLDS PER AGGREGATE OR VOLUME

For each aggregate or volume, the general default settings can be overwritten and made more specific. To do so, select your aggregate or volume of choice; for example, you can use the links already provided in this technical report. When a concrete aggregate is selected, it can be configured using the Edit Settings link and dialog. When a concrete volume is selected, its configuration can be adapted using the Edit Quota Settings link and dialog.

MONITORING STORAGE EFFICIENCY RETURNS

NetApp Operations Manager provides a dashboard to visualize storage efficiency returns in the NetApp shared storage infrastructure. The report lists important parameters, drilled down by utilization, capacity, unused reserve capacity, storage efficiency, and efficiency return breakdown, and allows you to judge the effectiveness of the NetApp storage efficiency technologies. Figure 19 provides a sample screenshot of the storage efficiency dashboard in NetApp Operations Manager. Consult the NetApp Operations Manager Efficiency Dashboard Installation and User Guide for further information on the data provided by this dashboard.

Figure 19) Storage efficiency dashboard in Operations Manager.


4.3 NOTIFICATION

Operational staff must be notified when situations occur that require a transition of phases, especially situations with negative consequences for SLA fulfillment. Operations Manager provides alarms for notification. Alarms are bound to the metrics and thresholds explained in section 4.2 and notify operational staff, storage administrators, or storage capacity planners. Alarms are the instrument used to keep the management effort of the NetApp storage infrastructure low.

After being notified, the responsible person can evaluate the situation and decide which actions to take. Further, the trends on operational parameters provided by Operations Manager simplify the decision-making process.

Depending on the organizational structure, the responsibilities to operate, plan, and administer the storage infrastructure can be separated into different groups, persons, or roles. Thus, we characterize the mitigation activities by required skill set and time to act. This allows easy alignment to a given organizational structure.

Operations Manager supports different methods of sending a notification. The notification methods can be used in combination; for example, a notification can be sent by both e-mail and SNMP.

NOTIFY BY E-MAIL

An alarm can be sent to multiple destinations by e-mail. Repeated notifications can be sent as long as the situation is not resolved. To set an alarm, access the alarm configuration page by following Setup → Alarms from the default Operations Manager dashboard. Clicking Advanced Version accesses an advanced version of this page; the direct link is http://opsmgrserver:port/dfm/edit/alarms-advanced. Figure 20 shows how to configure an alarm. Adjust the threshold as described in section 4.2.

NetApp recommends using distribution lists or aliases with meaningful names rather than the addresses of individual persons. If you follow this recommendation, changing responsibilities and roles does not require corresponding changes to Operations Manager.

NOTIFY BY SNMP

Operations Manager supports the notification of alarms using SNMP, a widely used standard supported by most orchestration frameworks and ticketing systems. Using SNMP, Operations Manager can be integrated into existing ticketing systems. Figure 20 shows the setup of an alarm that fires based on the aggregate almost full threshold. The SNMP trap host is configured using a hostname or IP address and the port on which the SNMP agent is listening. The alarm can be saved and tested.


Figure 20) Configuring an alarm based on the threshold aggregate almost full.

Note: The SNMP event must be routed to the responsible groups or persons in the ticketing system. Thus, the mapping between the detected situation and the responsible operational group must be implemented there.

NOTIFY BY SCRIPT

Operations Manager supports notifications in highly customized integration scenarios. A user-defined adapter can be executed, which delivers the information to the infrastructure or system of choice. A script can be used to implement such an adapter and act as the glue between Operations Manager and the customer infrastructure. To set an alarm on the event aggregate almost full that starts a script, instrument Operations Manager on the command line:

dfm alarm create -s script_to_execute -h aggregate-almost-full
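A minimal adapter could look like the following sketch. How Operations Manager passes event details to the script, and the exact field names, are not specified here, so the environment variable names (EVENT_NAME, SOURCE_NAME) and the log-file hand-off are assumptions for illustration only; a real adapter would call your ticketing system's API instead.

```python
#!/usr/bin/env python
"""Illustrative notification adapter started via 'dfm alarm create -s ...'.

EVENT_NAME and SOURCE_NAME are assumed variable names for illustration;
consult the Operations Manager documentation for the actual interface.
"""
import os
import sys
import tempfile
import time

def format_ticket(event, source):
    """Build a one-line ticket record from an event name and its source object."""
    stamp = time.strftime("%Y-%m-%d %H:%M:%S")
    return f"{stamp} severity=warning event={event} object={source}"

def main():
    # Fall back to command-line arguments if the variables are absent.
    event = os.environ.get("EVENT_NAME", sys.argv[1] if len(sys.argv) > 1 else "unknown")
    source = os.environ.get("SOURCE_NAME", sys.argv[2] if len(sys.argv) > 2 else "unknown")
    # Hand the record over; appending to a log file stands in for the
    # customer's ticketing system here.
    log_path = os.path.join(tempfile.gettempdir(), "storage-alarms.log")
    with open(log_path, "a") as log:
        log.write(format_ticket(event, source) + "\n")

if __name__ == "__main__":
    main()
```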


4.4 MITIGATE STORAGE USE

Uncontrolled storage use can limit operational flexibility and might put SLA fulfillment at risk. This section focuses on mitigation activities that preserve flexibility by controlling use within its defined corridor. The effect of a mitigation activity should return usage to its defined corridor.

Storage tightness might occur in aggregates or volumes, depending on their configuration. When all volumes in an aggregate are thin provisioned with the zero fat configuration, they use the shared pool of free blocks of the aggregate to deal with data growth. To solve this situation, a mitigation activity on the aggregate level is necessary.

When storage objects in a fixed-size volume cannot grow to the committed space, a mitigation activity on the volume level is necessary to solve upcoming volume tightness.

MITIGATION ACTIVITIES FOR AGGREGATES

Aggregates are the coarsest storage objects within a NetApp storage controller. Finer-grained storage objects such as FlexVol volumes and their content are usually thin provisioned using the zero fat configuration. They might grow on demand; however, because they live within an aggregate of physically limited size, the growth of the storage object itself is also limited. As described in the following list, providing usable space in the aggregate automatically allows the contained storage objects to grow.

1. Increase the aggregate. You can add drives to aggregates during operation, and you can repeat this mitigation activity. The maximum aggregate size depends on the Data ONTAP version, the type of aggregate, and the type of storage controller; 64-bit aggregates, supported with Data ONTAP 8, have very high limits. Additional drives can be used immediately; however, their procurement needs to be taken into account. Rebalancing data between existing and new drives results in a uniformly distributed use of the drives.

2. Decrease the aggregate Snapshot copy reserve. This reserve is needed in MetroCluster and SyncMirror® configurations. In other configurations, you can decrease this reserve or set it to zero.

3. Shrink preallocated volumes. Volumes with preallocated space reserve available aggregate free space. When possible, these volumes can be shrunk, returning the freed space to the aggregate so that others can make use of the preallocated space.

4. Enable deduplication and shrink the volume.

5. If available, migrate a nomad online to a different storage controller. Doing this on the NetApp storage controller level requires storage provisioning based on vFiler entities and a MultiStore and SnapMirror license. Adequate free space on the aggregates of the target storage controller is required. This mitigation activity is not limited in its applicability.

6. Migrate a volume from one aggregate to another aggregate within the same or another storage controller. SnapMirror replicates the data while it is still served. To switch over to the replicated data, the client needs to detach from the source and reattach to the replica. After completion, the replica is considered the new source. This operation has an impact on client downtime. Typically, inter-data-center bandwidth allows you to synchronize the source and the replica within the range of a few minutes.

7. If none of the listed activities can be used, the application must be stopped to achieve a consistent state.

The mitigation activities for aggregate tightness are summarized in Table 8. Note that Provisioning Manager performs mitigation alternatives 3 to 6 for secondary storage online.


Table 8) Mitigation alternatives to control use within aggregates.

No | Mitigation Activity | Repeatability | SLA Impact | Preparation Time | Time to Show Effect
1 | Increase aggregate capacity by adding disks | Repeatable (Data ONTAP 7.x: low limits; Data ONTAP 8: high limits) | None | HW procurement | Immediate (plus rebalancing)
2 | Decrease the aggregate's Snapshot copy reserve area, if possible | One time | None | None | Immediate
3 | Shrink other volumes in the aggregate, if they have enough free space | One time | Low | None | Immediate
4 | Run deduplication and shrink volumes | Repeatable | Low | Time to execute dedupe | Immediate
5 | Migrate nomads (online) | Repeatable | Low | None | Minutes (vFiler migration time)
6 | Migrate volumes to a different aggregate (offline) | Repeatable | Medium–high | Next planned downtime window | Minutes (volume switch-over time)
7 | Prevent application data loss and stop the application, then migrate (offline) | Repeatable | Low–high | Coordinate with app owner | Minutes (migration time)

MITIGATION ACTIVITIES FOR VOLUME TIGHTNESS

Mitigation activities for volume tightness are relevant for volumes that are at risk because their storage objects cannot grow to the committed size. When it is not possible to enable growth for the storage objects contained in volumes, you need to perform an aggregate mitigation activity.

Note: Some of these mitigation alternatives depend on, and affect, the used capacity in the aggregate.

Table 9) Mitigation activities for resource tightness within volumes.

No | Mitigation Activity | Repeatability | SLA Impact | Preparation Time | Time to Show Effect
1 | Reduce the volume's Snapshot copy reserve (if configured and not used) | One time | Low | None | Immediate
2 | Increase the volume, if there is free space in the aggregate (see Table 8) | One time | Low | None | Immediate
3 | Delete Snapshot copies that are not needed or were skipped by the autodelete function | Limited | Low | None | Immediate
4 | Activate FAS deduplication for the volume (requires proper space guarantees) | One time | Low; possible performance impact | Wait for schedule | Hours
5 | If the volume contains more than a single LUN, migrate those objects to another volume or aggregate | Repeatable | High | Next planned downtime window | Minutes (volume migration time)
6 | Stop the application and migrate the data | Repeatable | High | Coordinate with app owner | Minutes (migration time)


5 REAL-LIFE SETTINGS

This section summarizes two different operational settings. The first makes use of neither online data migration nor the settled/nomad provisioning pattern; the second implements the settled/nomad provisioning pattern to maintain the flexibility for online data migrations.

The concrete threshold settings and approaches might be very customer and application specific. To exploit NetApp storage efficiency features in your own data center, NetApp recommends that you start conservatively. After you are familiar with the process, work toward the customer-specific optimum.

5.1 SAMPLE SETTING 1: REAL-LIFE SETTING

This section describes a real-life setting that a customer started with. It makes use of a limited set of mitigation alternatives. This is especially beneficial when the installed storage capacity should remain constant over a long time frame or when the physical systems are already fully equipped. A settled/nomad setting is not considered. Thus, the thresholds that signal a transition of the phases are set lower and more conservatively for this customer. Because online data migration and aggregate extension are not available as mitigation alternatives, sufficient available space is required to safely reach the next planned downtime window, as shown in Figure 21. In practice, refer to the aggregate days-to-full trend value to get an idea of the available days to full based on past data growth.

• All storage is provisioned using the zero fat option with growable FlexVol volumes. Only aggregate monitoring is used.
• Aggregate extension is not a mitigation alternative.
• Online migration is not a mitigation alternative.

Figure 21) Storage to enable organic data growth between planned downtime windows.

[Figure content: data growth over a time axis measured in months, with free capacity reserved so that growth can continue from one planned downtime window to the next.]

Note: Several months might pass between the planned downtime windows in which major mitigation alternatives are performed.

The primary concern is preventing the critical situation in which aggregates reach a utilization level that is too high to enable organic growth during the period between agreed planned downtime windows. To prevent this situation, sufficient space must be reserved to enable data growth. Second, the level of data consolidation is monitored to manage accumulated growth rates safely.

Provisioning new data is stopped when one or both thresholds on the first and second metrics are reached. The operational teams are notified using alarms on the Operations Manager events aggregate nearly full threshold and aggregate nearly overcommitted threshold; these alarms stop the responsible entities from provisioning new storage, and the aggregate is left for organic growth.

the individual situation and is calculated against 100

The sole metric in this setting is aggregate capacity used Table 10 contains the thresholds describing the transition of phases

Table 10) Phase transitions with settlednomad provisioning pattern and on-line migration mitigation alternative

Detection Threshold Notify Mitigation

gt 70 Storage operations Stop provisioning of storage

gt 85 Storage operations Stop extending provisioned storage

gt 90 Storage operations Relax resource situation and migrate nomad

42 Storage Efficiency Every Day How to Achieve and Manage Best-in-Class Storage Use

Figure 24) Visualization of phase transitions depending on metric aggregate capacity used

Settled Data GrowthAggregate Capacity

Operational Sweet Spot Corridor

Aggregate Capacity Used 0ndash70 70ndash85 gt 90

Provisioning New Storage Y

Extending Already Provisioned Storage

Relax UtilizationmdashNetApp Data Motion a Nomad

Y Y

Y

N N N

You can achieve a very high data consolidation in this setting by using NetApp storage controllers The served amount of logical data exceeds the physical usable capacity by factors

43 Storage Efficiency Every Day How to Achieve and Manage Best-in-Class Storage Use

6 STORAGE EFFICIENCY COOKBOOK

To increase consolidation, we propose the following steps to exploit the advantages of NetApp storage efficiency technologies.

Figure 25) Typical picture of aggregate capacity metrics while turning to zero fat configurations and dedupe. [Figure: committed capacity and capacity used plotted over elapsed time, with the overall trend and the last 3-month trend; capacity used drops during the first month of reconfiguration, and growth resumes over the following three months; steps 1 to 3 below are annotated on the curve]

As a general rule, we don't introduce artificially limited container types. They increase the monitoring effort and might prevent pooling of unused space. For an existing landscape, proceed as follows:

1. Install and configure Operations Manager; the earlier, the better. From day one, Operations Manager collects data, and the more information it collects, the better the predictions and trending. The diagrams provided by Operations Manager give a good idea of growth rates and their steadiness. Make sure all NetApp storage controllers are monitored. Wait for one month. Define which mitigation alternatives your operational team is comfortable with. Check the boxes accompanying the provided list and identify the time your team needs to perform the actions. If you can perform online migrations for nomads, define the time to negotiate and approve the migration. For all other data, define the time to the next planned downtime window.

2. Change all volumes to the zero fat configuration with the autogrow feature set to on. Because there is no artificial space limitation for an autogrow volume, monitoring is restricted to aggregate monitoring. When using deduplication, set the volume to autogrow. During this period, the capacity used diminishes, as shown in Figure 25. Usually each change in the volume configuration can be detected. So far only metadata has changed, and unused space in the volumes is now available from a common shared pool. The aggregated free space is available to the same applications storing the data. We recommend monitoring for three months to understand the growth rate of your environment.

3. Derive the growth trend of the aggregates. Note that the overall trend might still be negative. Use Operations Manager to help determine the trend. Make sure that it excludes the time frame when changing the volume configuration to zero fat and that it includes relevant operations of your applications, such as month-end and year-end closing of business applications or regular software maintenance updates (for example, in virtualized environments).


4. Work backward to determine the thresholds of the phases:

a. Define the aggregate use at a level where your operational team is comfortable. At first, do not exceed 80%. Add an attention area (yellow), depending on the mitigation alternatives and their time to show effect.

b. Determine the maximum distance between the planned downtimes, or the time to perform the intended mitigation alternatives.

c. Determine the growth rate. Operations Manager provides help in determining the trend of data growth.

d. Determine the minimum space required to comfortably allow organic growth in the period between agreed planned downtimes of the services provided. Operations Manager helps you to understand the growth rate of the past.
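As a worked illustration of steps a through d, the free-space reserve and the resulting mitigation threshold can be computed as follows. This is a hedged sketch; the function name and every number are invented examples, not NetApp defaults:

```python
# Sketch: work backward from growth rate and downtime interval to an
# aggregate-use threshold. Illustrative arithmetic only; all numbers
# below are invented examples.

def mitigation_threshold_pct(capacity_gb, daily_growth_gb,
                             days_between_downtimes, comfort_limit_pct=80):
    """Highest 'start mitigation' threshold that still leaves room for
    organic growth until the next planned downtime window."""
    # Space that must stay free to absorb growth until the next window
    reserve_gb = daily_growth_gb * days_between_downtimes
    threshold = comfort_limit_pct - 100.0 * reserve_gb / capacity_gb
    return max(threshold, 0.0)

# A 10 TB aggregate growing 20 GB/day, downtime windows 90 days apart:
# 1800 GB must stay free, so mitigation should start at 80% - 18% = 62%
print(mitigation_threshold_pct(10000, 20, 90))  # prints: 62.0
```

If the result reaches zero, the aggregate cannot bridge the downtime interval at the observed growth rate, and a shorter interval or a different mitigation alternative is needed.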

To provision storage, follow these steps:

1. Create big aggregates to enable shared storage in your data center. We recommend sizing in such a way that the aggregate can be extended once for eventual aggregate mitigation. Very few situations exist where a silo-centric approach with dedicated aggregates for applications makes sense. Free space and performance in an aggregate can be shared, and few big aggregates reduce the monitoring effort. Also, build aggregates in a limited number of standardized configurations and sizes.

2. Create volumes in the zero fat configuration with the autogrow feature set to on. Because there is no artificial space limitation for an autogrow volume, monitoring is restricted to aggregate monitoring. When using deduplication, set the volume to autogrow. Whenever possible, use Provisioning Manager for convenience and for repeating configurations.

a. Classify your data and provision for flexibility. Give NFS a preference and make use of vFiler entities.

b. Turn on deduplication. Even in situations where deduplication rates are expected to be low, there is sometimes a big surprise. If you prefer to try out deduplication on the storage controller, create a clone of the intended volume and deduplicate it to estimate the effect. Use Performance Advisor to identify a repeating time frame of low activity in which to schedule the deduplication job. Also consider scheduling deduplication by change rate. Mind the maximum volume sizes, which depend on the storage controller.

c. Initially size volumes to the expected size of the data you are going to store. Thus the aggregate overcommitment metric in Operations Manager represents the data consolidation more precisely.

d. Trim existing volumes provisioned in full or low fat to the zero fat configuration. Use the following commands on the console of the storage controller to configure zero fat without Snapshot autodelete for NAS environments:

vol options <volume> guarantee none
vol options <volume> try_first volume_grow
vol autosize <volume> -m <maximum size> -i <increment size> on
snap autodelete <volume> off

Use the following command sequence to configure zero fat with Snapshot autodelete for NAS environments:

vol options <volume> guarantee none
vol options <volume> try_first volume_grow
vol autosize <volume> -m <maximum size> -i <increment size> on
snap autodelete <volume> trigger volume
snap autodelete <volume> delete_order oldest_first
snap autodelete <volume> on


Use the following commands to configure zero fat without Snapshot autodelete for SAN environments:

vol options <volume> guarantee none
vol options <volume> try_first volume_grow
vol autosize <volume> -m <maximum size> -i <increment size> on
snap reserve -V <volume> 0
snap autodelete <volume> off
lun set reservation <lun> disable

Use the following command sequence to configure zero fat for SAN environments with autodelete set to on:

vol options <volume> guarantee none
vol options <volume> try_first volume_grow
vol autosize <volume> -m <maximum size> -i <increment size> on
snap reserve -V <volume> 0
snap autodelete <volume> trigger volume
snap autodelete <volume> delete_order oldest_first
snap autodelete <volume> on
lun set reservation <lun> disable

e. Identify storage of inactive data. Storage keeping inactive data is most often perfectly suited to act as a nomad candidate that can be migrated.

f. Identify storage that is close to deprovisioning. Deprovisioning of storage relaxes use and can act as a mitigation alternative.

g. Turn already provisioned volumes into the zero fat configuration.

3. Let Operations Manager monitor the landscape. Use the aggregate daily growth rates and days-to-full trending reported by Operations Manager to adapt the thresholds. Remember that days-to-full trending reports against 100% capacity used of the aggregate.


7 REFERENCES

• TR-3505, "NetApp Deduplication for FAS and V-Series Deployment and Implementation Guide": www.netapp.com/us/library/technical-reports/tr-3505.html

• TR-3563, "NetApp Thin Provisioning: Improving Storage Utilization and Reducing TCO": www.netapp.com/us/library/technical-reports/tr-3563.html

• TR-3710, "Operations Manager, Provisioning Manager, and Protection Manager Best Practices Guide": www.netapp.com/us/library/technical-reports/tr-3710.html

• TR-3786, "A Thorough Introduction to 64-Bit Aggregates": www.netapp.com/us/library/technical-reports/tr-3786.html

• TR-3814, "NetApp Data Motion": www.netapp.com/us/library/technical-reports/tr-3814.html

• TR-3827, "If You're Doing This, Then Your Storage Could Be Underutilized": www.netapp.com/us/library/technical-reports/tr-3827.html

• TR-3881, "DataMotion for Volumes for Enterprise Applications": www.netapp.com/us/library/technical-reports/tr-3881.html

• NetApp Operations Manager Efficiency Dashboard Installation and User Guide: http://now.netapp.com/NOW/download/tools/omsed_plugin/InstallUserGuide.pdf


8 ACKNOWLEDGMENTS

This report was developed in concert with the Field Centers for Innovation and covers field best practices and product group expertise. It would not have been possible without the input of many experts. Significant contributions were made by Matthew Agoni, Carlos Alvarez, Jeff Berks, Manfred Buchmann, Hans Deuerlein, Erik Dybwad, Niels Reker, Oliver Dziuba, Larry Freeman, Gary Garcia, Pretoom Goswami, Naveen Harsani, George John, Nigel Maddock, Andreas Martinovsky, Holger Niermann, Cesar Orosco, Christian Ott, Shiva Raja, Michael Reusch, Maurice Skubski, John Tyrrell, Oliver Walsdorf, and Allen Wang.

NetApp provides no representations or warranties regarding the accuracy, reliability, or serviceability of any information or recommendations provided in this publication, or with respect to any results that may be obtained by the use of the information or observance of any recommendations provided herein. The information in this document is distributed AS IS, and the use of this information or the implementation of any recommendations or techniques herein is a customer's responsibility and depends on the customer's ability to evaluate and integrate them into the customer's operational environment. This document and the information contained herein may be used solely in connection with the NetApp products discussed in this document.

© Copyright 2010 NetApp, Inc. All rights reserved. No portions of this document may be reproduced without prior written consent of NetApp, Inc. Specifications are subject to change without notice. NetApp, the NetApp logo, Go further, faster, Data ONTAP, FlexClone, FlexVol, MultiStore, RAID-DP, SnapDrive, SnapMirror, Snapshot, SyncMirror, and vFiler are trademarks or registered trademarks of NetApp, Inc. in the United States and/or other countries. Windows is a registered trademark of Microsoft Corporation. Oracle is a registered trademark of Oracle Corporation. VMware is a registered trademark and VMotion is a trademark of VMware, Inc. All other brands or products are trademarks or registered trademarks of their respective holders and should be treated as such. RA-0007-1010


TRENDING

Operations Manager 4.0 supports a variety of trending features for certain storage objects. This is an important feature for all storage objects with a fixed size, because it allows you to estimate the time frame within which a certain situation needs to be mitigated. The trend is calculated as a linear regression over up to 90 days in the past. For aggregates, Operations Manager calculates a trend on the daily growth rate. In your Operations Manager instance, use the link http://opsmgrserver:port/dfm/report/view/aggregates-growth-rates for trending of aggregate growth rates and the estimated remaining time until the storage object is full. Each aggregate can be drilled down, and you can select trending based on an interval of one day, one week, one month, three months, or one year. To see the effect of recent data activities, set the interval of the trend calculation to enclose this activity. Investigate whether growth rates calculated over different intervals deviate significantly.
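The regression behind this trending can be sketched in a few lines. This is an illustrative model only, not Operations Manager code; the function name and the sample data are invented:

```python
# Sketch: estimate an aggregate's daily growth rate by linear regression
# over past usage samples and project the days until it is full.
# Illustrative only -- Operations Manager performs this internally.

def growth_trend(used_gb, capacity_gb):
    """Return (daily_growth_gb, days_to_full) from daily usage samples."""
    n = len(used_gb)
    xs = range(n)  # sample index = day number
    mean_x = sum(xs) / n
    mean_y = sum(used_gb) / n
    # Ordinary least-squares slope: GB of growth per day
    slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, used_gb))
             / sum((x - mean_x) ** 2 for x in xs))
    # Days to full is measured against 100% of usable capacity
    remaining_gb = capacity_gb - used_gb[-1]
    return slope, (remaining_gb / slope if slope > 0 else float("inf"))

# Ten daily samples of used capacity (GB) on a 1000 GB aggregate
slope, days_to_full = growth_trend(
    [500, 502, 504, 506, 508, 510, 512, 514, 516, 518], 1000)
print(slope, days_to_full)  # prints: 2.0 241.0
```

As in Operations Manager, the projection runs against the usable capacity, not against the aggregate full threshold setting.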

Figure 18) Trending of data growth and days-to-full prediction in Operations Manager

Note: The calculation basis of time to full is the usable aggregate capacity. This value is not calculated based on the aggregate full threshold setting.

The trending on the volume level is analogous to the trending on the aggregate level. In your Operations Manager instance, access the link http://opsmgrserver:port/dfm/report/view/volumes-growth-rates for trending of volume growth rates. NetApp recommends that you order the view by growth rate (descending) or time to full (ascending) in order to focus on the relevant candidates.

On the volume level, you can set an alarm to fire when the volume growth is outside the usual boundary:

• Abnormal volume growth. This event notifies you when the growth rate of a volume exceeds a preset limit. It is helpful to signal unusual behavior concerning storage consumption and to point the operational staff to the right storage object.


INDIVIDUAL THRESHOLDS PER AGGREGATE OR VOLUME

For each aggregate or volume, the general default settings can be overwritten and made more specific. To do so, select the aggregate or volume of choice; for example, you can use the links already provided in this technical report. When selecting a concrete aggregate, it can be configured using the Edit Settings link and dialog. When selecting a concrete volume, its configuration can be adapted using the Edit Quota Settings link and dialog.

MONITORING STORAGE EFFICIENCY RETURNS

NetApp Operations Manager provides a dashboard to visualize storage efficiency returns in the NetApp shared storage infrastructure. This report lists important parameters, drilled down by utilization capacity, unused reserve capacity, storage efficiency, and efficiency return breakdown. It allows you to judge the effectiveness of the NetApp storage efficiency technologies. Figure 19 provides a sample screenshot of the storage efficiency dashboard in NetApp Operations Manager. Consult the NetApp Operations Manager Efficiency Dashboard Installation and User Guide for further details on the information provided by this dashboard.

Figure 19) Storage efficiency dashboard in Operations Manager


4.3 NOTIFICATION

Operational staff must be notified when situations occur that require a transition of phases, especially situations with negative consequences for SLA fulfillment. Operations Manager provides alarms for notification. Alarms are bound to the metrics and thresholds explained in section 4.2 and notify operational staff, storage administrators, or storage capacity planners. Alarms are the instrument used to keep the management effort of the NetApp storage infrastructure low.

After being notified, the responsible person can evaluate the situation and decide which actions to take. Furthermore, the trends on operational parameters provided by Operations Manager simplify the decision-making process.

Depending on the organizational structure, the responsibilities to operate, plan, and administer the storage infrastructure can be separated into different groups, persons, or roles. Thus we characterize the mitigation activities by required skill set and time to act, which allows easy alignment with a given organizational structure.

Operations Manager supports different methods of sending a notification. The notification methods can be used in combination; for example, a notification can be sent by both e-mail and SNMP.

NOTIFY BY E-MAIL

An alarm can be sent to multiple destinations by e-mail, and repeated notifications can be sent while the situation is not resolved. To set an alarm, access the alarm configuration page by following Setup → Alarms from the default Operations Manager dashboard. Clicking Advanced Version opens an advanced version of this page; the direct link for the advanced version is http://opsmgrserver:port/dfm/edit/alarms-advanced. Figure 20 shows how to configure an alarm. Adjust the threshold as described in section 4.2.

NetApp recommends using distribution lists or aliases with meaningful names rather than the addresses of individual persons. If you follow this recommendation, changing responsibilities and roles does not require corresponding changes in Operations Manager.

NOTIFY BY SNMP

Operations Manager supports the notification of alarms using SNMP, a widely used standard that is supported by most orchestration frameworks and ticketing systems. Using SNMP, Operations Manager can be integrated into existing ticketing systems. Figure 20 shows the setup of an alarm that fires based on the aggregate almost full threshold. The SNMP trap host is configured using a hostname or IP address and the port on which the SNMP agent is listening. The alarm can then be saved and tested.


Figure 20) Configuring an alarm based on the threshold aggregate almost full

Note: The SNMP event must be routed to the responsible groups or persons in the ticketing system. Thus, the mapping between the detected situation and the responsible operational group must be implemented there.

NOTIFY BY SCRIPT

Operations Manager supports notifications in highly customized integration scenarios. A user-defined adapter can be executed, which delivers the information to the infrastructure or system of choice. A script can be used to implement such an adapter and act as the glue between Operations Manager and the customer infrastructure. To set an alarm on the event aggregate almost full that starts a script, instrument Operations Manager on the command line:

dfm alarm create -s script_to_execute -h aggregate-almost-full


4.4 MITIGATE STORAGE USE

Uncontrolled storage use can limit operational flexibility and might put SLA fulfillment at risk. This section focuses on mitigation activities that keep use within its defined corridor and so preserve flexibility. The effect of a mitigation activity should return the usage to its defined corridor.

Storage tightness might occur in aggregates or in volumes, depending on their configuration. When all volumes in an aggregate are thin provisioned with the zero fat configuration, they use the aggregate's shared pool of free blocks to deal with data growth. To resolve this situation, a mitigation activity on the aggregate level is necessary.

When storage objects in a fixed-size volume cannot grow to the committed space, a mitigation activity on the volume level is necessary to resolve the upcoming volume tightness.

MITIGATION ACTIVITIES FOR AGGREGATES

Aggregates are the coarsest storage objects within a NetApp storage controller. Finer-grained storage objects, such as FlexVol volumes and their content, are usually thin provisioned using the zero fat configuration. They might grow on demand; however, because they live within an aggregate of physically limited size, the growth of the storage object itself is also limited. As described in the following list, providing usable space in the aggregate automatically allows the contained storage objects to grow.

1. Increase the aggregate. You can add drives to aggregates during operation, and you can repeat this mitigation activity. The maximum aggregate size depends on the Data ONTAP version, the type of aggregate, and the type of storage controller; 64-bit aggregates, supported with Data ONTAP 8, have very high limits. Additional drives can be used immediately; however, their procurement needs to be taken into account. Rebalancing data between existing and new drives results in a uniformly distributed use of the drives.

2. Decrease the aggregate Snapshot copy reserve. This reserve is needed in MetroCluster and SyncMirror® configurations. In other configurations, you can decrease this reserve or set it to zero.

3. Shrink preallocated volumes. Volumes with preallocated space reserve available aggregate free space. When possible, these volumes can be shrunk, returning the freed space to the aggregate so that others can make use of the preallocated space.

4. Enable deduplication and shrink the volume.

5. If available, migrate a nomad online to a different storage controller. Doing this on the NetApp storage controller level requires storage provisioning based on vFiler entities and a MultiStore and SnapMirror license. Adequate free space on the aggregates of the target storage controller is required. This mitigation activity is not limited in its applicability.

6. Migrate a volume from one aggregate to another aggregate within the same or another storage controller. SnapMirror replicates the data while it is still served. To switch over to the replicated data, the client needs to detach from the source and reattach to the replica. After completion, the replica is considered the new source. This operation has an impact on client downtime. Typically, inter-data center bandwidth allows you to synchronize the source and the replica within the range of a few minutes.

7. If none of the listed activities can be used, the application must be stopped to achieve a consistent state.

The mitigation activities for aggregate tightness are summarized in Table 8. Note that Provisioning Manager performs mitigation alternatives 3 to 6 for secondary storage online.


Table 8) Mitigation alternatives to control use within aggregates.

No. | Mitigation Activity | Repeatability | SLA Impact | Preparation Time | Time to Show Effect
1 | Increase aggregate capacity by adding disks | Data ONTAP 7.x: low limits; Data ONTAP 8: high limits | None | HW procurement | Immediate (plus rebalancing)
2 | Decrease the aggregate's Snapshot copy reserve area, if possible | One time | None | None | Immediate
3 | Shrink other volumes in the aggregate, if they have enough free space | One time | Low | None | Immediate
4 | Run deduplication and shrink volumes | Repeatable | Low | Time to execute dedupe | Immediate
5 | Migrate nomads (online) | Repeatable | Low | None | Minutes (vFiler migration time)
6 | Migrate volumes to a different aggregate (offline) | Repeatable | Medium to high | Next planned downtime window | Minutes (volume switch-over time)
7 | Prevent application data loss and stop the application, then migrate (offline) | Repeatable | Low to high | Coordinate with app owner | Minutes (migration time)

MITIGATION ACTIVITIES FOR VOLUME TIGHTNESS

Mitigation activities for volume tightness are relevant for volumes that are at risk because their storage objects cannot grow to the committed size. When it is not possible to enable growth for storage objects contained in volumes, you need to perform an aggregate mitigation activity.

Note: Some of these mitigation alternatives depend on, and affect, the used capacity in the aggregate.

Table 9) Mitigation activities for resource tightness within volumes.

No. | Mitigation Activity | Repeatability | SLA Impact | Preparation Time | Time to Show Effect
1 | Reduce the volume's Snapshot copy reserve (if configured and not used) | One time | Low | None | Immediate
2 | Increase the volume, if there is free space in the aggregate (see Table 8) | One time | Low | None | Immediate
3 | Delete Snapshot copies that are not needed or were skipped by the autodelete function | Limited | Low | None | Immediate
4 | Activate FAS deduplication for the volume (requires proper space guarantees) | One time | Low; possible performance impact | Wait for schedule | Hours
5 | If the volume contains more than a single LUN, migrate those objects to another volume or aggregate | Repeatable | High | Next planned downtime window | Minutes (volume migration time)
6 | Stop the application and migrate the data | Repeatable | High | Coordinate with app owner | Minutes (migration time)


5 REAL-LIFE SETTINGS

This section summarizes two different operational settings. The first does not make use of online data migration or the settled/nomad provisioning pattern; the second implements a settled/nomad provisioning pattern to maintain the flexibility for online data migrations.

The concrete threshold settings and approaches might be very customer and application specific. To exploit NetApp storage efficiency features in your own data center, NetApp recommends that you start conservatively. After you are familiar with the process, work toward the customer-specific optimum.

5.1 SAMPLE SETTING 1: REAL-LIFE SETTING

This section describes a real-life setting that a customer started with. It makes use of a limited set of mitigation alternatives, which is especially beneficial when the installed storage capacity should stay constant over a long time frame or when physical systems are already fully equipped. A settled/nomad setting is not considered. Thus the thresholds that signal a transition of phases are set lower and more conservatively for this customer. Because online data migration and aggregate extension are not available as mitigation alternatives, sufficient available space is required to safely reach the next planned downtime window, as shown in Figure 21. In practice, refer to the aggregate days-to-full trend value to get an idea of the available days to full based on past data growth.

• All storage is provisioned using the zero fat option with growable FlexVol volumes. Only aggregate monitoring is used.

• Aggregate extension is not a mitigation alternative.

• Online migration is not a mitigation alternative.

Figure 21) Storage to enable organic data growth between planned downtime windows. [Figure: data growth over time (months), with space reserved to bridge the gap between consecutive planned downtime windows]

Note: Several months might fall between the planned downtime windows in which major mitigation alternatives can be performed.

The primary concern is preventing the critical situation in which aggregates reach a utilization level that is too high to enable organic growth during the period between agreed planned downtime windows. To prevent this situation, sufficient space must be reserved to enable data growth. Second, the level of data consolidation is monitored to manage accumulated growth rates safely.

Provisioning new data is stopped when one or both thresholds on the first and second metrics are reached. The operational teams are notified using an alarm on the Operations Manager event aggregate nearly full threshold (event configured when the metric exceeds 50%) and the event aggregate nearly overcommitted threshold (event configured when the metric exceeds 110%). These alarms stop the responsible entities from provisioning new storage; the aggregate is left for organic growth.

An assessment of the storage situation might then be performed. Depending on experience and on knowledge of the application growth rates seen in the past, the thresholds may be adapted. After the upper threshold of the operational sweet spot corridor is crossed, an alarm based on the aggregate full threshold (set initially to 65%) is sent to the storage administrators so that they can decide to migrate data in the next planned downtime window. In the meantime, organic growth can take place in the yellow-marked area shown in Figure 22. The metrics used are:

• First metric: aggregate capacity used

• Second metric: aggregate space committed

Because all storage is provisioned using the zero fat option, no artificially limited storage container exists. Thus there is no need to consider a volume-based metric. Figure 22 shows the behavior depending on the metrics aggregate capacity used and aggregate committed space.

Figure 22) Transition of phases depending on the metrics aggregate capacity used and aggregate committed space. [Figure: provisioning new storage is allowed within the operational sweet spot corridor (aggregate capacity used 0-50%, aggregate space committed 0-110%); beyond those values, capacity is assessed and thresholds are adapted; above 65% capacity used or 120% space committed, mitigation starts]
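The transitions shown in Figure 22 can be condensed into a small decision function. The sketch below is illustrative only (the function and its return labels are not part of any NetApp tool); the thresholds are the ones used in this sample setting:

```python
# Sketch of the sample setting 1 phase logic: two aggregate metrics
# decide whether to provision, assess capacity, or mitigate.
# Illustrative only; thresholds are this sample setting's values.

def phase(capacity_used_pct, space_committed_pct):
    if capacity_used_pct > 65 or space_committed_pct > 120:
        return "mitigate"            # migrate data in the next downtime window
    if capacity_used_pct > 50 or space_committed_pct > 110:
        return "assess capacity"     # stop provisioning; adapt thresholds
    return "provision new storage"   # inside the operational sweet spot corridor

print(phase(40, 90))   # prints: provision new storage
print(phase(55, 90))   # prints: assess capacity
print(phase(70, 125))  # prints: mitigate
```

Either metric alone can trigger the transition, which matches the rule that provisioning stops when one or both thresholds are reached.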


5.2 SAMPLE SETTING 2: SETTLED/NOMAD

This section describes a setting that takes the settled/nomad provisioning pattern into account and allows migrating nomad data flexibly and in a timely manner, thanks to vFiler technology. This setting requires storage space at alternative locations to which nomads might be migrated. It is seen more often in larger environments with an emphasis on NFS-attached storage. It allows operating the NetApp storage infrastructure at very high use and in narrower operational sweet spot corridors. Figure 23 visualizes the effect of a mitigation alternative that can be performed online.

Figure 23) Narrower corridors due to the ability to perform mitigation alternatives in hours instead of months. (Diagram: settled and nomad data in an aggregate over time; once the need to act is detected, the effect of a mitigation such as migration shows within hours.)

In this sample setting, as in sample setting 1, the critical situation to prevent is aggregates becoming too full. However, the flexibility gained with online data migration makes it unnecessary to take a further metric into account, for example storage overcommitment.

• All storage is provisioned using the zero fat option with growable FlexVol volumes. Only aggregate monitoring is used.
• Storage is provisioned using the settled/nomad pattern, with the ability to perform online migration.
• Days-to-full aggregate trending was more than 200 days on average. Note that this value depends on the individual situation and is calculated against 100%.

The sole metric in this setting is aggregate capacity used. Table 10 contains the thresholds describing the transition of phases.

Table 10) Phase transitions with the settled/nomad provisioning pattern and online migration as the mitigation alternative.

Detection Threshold   Notify               Mitigation
> 70%                 Storage operations   Stop provisioning of storage
> 85%                 Storage operations   Stop extending provisioned storage
> 90%                 Storage operations   Relax the resource situation and migrate a nomad


Figure 24) Visualization of phase transitions depending on the metric aggregate capacity used. (Diagram: at 0–70% aggregate capacity used, new storage may be provisioned and existing storage extended; at 70–85%, only already provisioned storage is extended; above 90%, utilization is relaxed by migrating a nomad with NetApp Data Motion.)

You can achieve very high data consolidation in this setting using NetApp storage controllers. The amount of logical data served exceeds the physically usable capacity by a multiple.


6 STORAGE EFFICIENCY COOKBOOK

To increase consolidation, we propose the following steps to exploit the advantages of NetApp storage efficiency technologies.

Figure 25) Typical picture of aggregate capacity metrics while turning to zero fat configurations and dedupe. (Chart: committed capacity and capacity used over elapsed time, with the overall trend and the last 3-month trend marked across three phases spanning roughly one and three months.)

As a general rule, we don't introduce artificially limited container types. They increase monitoring effort and might prevent pooling unused space. For an existing landscape, proceed as follows:

1. Install and configure Operations Manager, the earlier the better. From day one, Operations Manager collects data; the more information it collects, the better the predictions and trending become. The diagrams provided by Operations Manager give a good idea of growth rates and their steadiness. Make sure all NetApp storage controllers are monitored. Wait for one month. Define which mitigation alternatives your operational team is comfortable with: check the boxes accompanying the provided list and identify the time your team needs to perform the actions. If you can perform online migrations for nomads, define the time to negotiate and approve the migration. For all other data, define the time to the next planned downtime window.

2. Change all volumes to zero fat configuration with the autogrow feature set to on. Because there is no artificial space limitation for an autogrow volume, monitoring is restricted to aggregate monitoring. When using deduplication, set the volume to autogrow. During this period the capacity used diminishes, as shown in Figure 25. Usually each change in the volume configuration can be detected. So far only metadata has changed, and unused space in the volumes is now available from a common shared pool. The aggregated free space is available to the same applications storing the data. We recommend monitoring for three months to understand the growth rate of your environment.

3. Derive the growth trend of the aggregates. Note that the overall trend might still be negative. Use Operations Manager to help determine the trend. Make sure that it excludes the time frame when the volume configuration was changed to zero fat, and that it includes relevant operations of your applications, such as month- and year-end closing of business applications or regular software maintenance updates (for example, in virtualized environments).
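The derivation in step 3 amounts to fitting a trend line through daily capacity samples. The sketch below shows the idea with an ordinary least-squares fit and a days-to-full estimate calculated against 100% of aggregate capacity, as Operations Manager reports it; the sample data in the usage note is invented:

```python
def days_to_full(samples, capacity):
    """Estimate days until an aggregate reaches 100% of its capacity.

    samples:  used capacity per day (index = day number), e.g. in GB
    capacity: total aggregate capacity, same unit
    Returns None when the fitted trend is flat or negative (after the
    change to zero fat and dedupe, the overall trend may well be negative).
    """
    n = len(samples)
    xs = range(n)
    mean_x = sum(xs) / n
    mean_y = sum(samples) / n
    # Ordinary least-squares slope: average daily growth
    num = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, samples))
    den = sum((x - mean_x) ** 2 for x in xs)
    slope = num / den
    if slope <= 0:
        return None
    return (capacity - samples[-1]) / slope
```

With 30 days of samples growing by 10 GB per day and 500 GB of remaining headroom, the estimate is 50 days to full.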


Work backward to determine the thresholds of the phases:

a. Define the aggregate use at a level where your operational team is comfortable. At first, do not exceed 80%. Add an attention area (yellow) depending on the mitigation alternatives and their time to show effect.

b. Determine the maximum distance between the planned downtimes, or the time to perform the intended mitigation alternatives.

c. Determine the growth rate. Operations Manager provides help in determining the trend of data growth.

d. Determine the minimum space required to comfortably allow organic growth in the period between agreed planned downtimes of the services provided. Operations Manager helps you to understand the growth rate of the past.
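Steps a through d can be combined into a back-of-the-envelope calculation that works backward from the growth rate and the downtime interval to a red threshold. All numbers in the example are assumptions for illustration, not recommendations:

```python
def red_threshold_pct(capacity_gb, growth_gb_per_day, days_between_downtimes,
                      comfort_pct=80):
    """Work backward to the aggregate-full threshold for the red phase.

    Reserves enough free space to absorb organic growth until the next
    planned downtime window, capped at the level the operational team is
    comfortable with (at first, do not exceed 80%).
    """
    reserve_gb = growth_gb_per_day * days_between_downtimes
    threshold = 100.0 * (capacity_gb - reserve_gb) / capacity_gb
    return min(threshold, comfort_pct)
```

A 10 TB aggregate growing 20 GB per day with 90 days between downtime windows needs 1.8 TB of headroom; the computed 82% is then capped at the 80% comfort level.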

To provision storage, follow these steps:

1. Create big aggregates to enable shared storage in your data center. We recommend sizing them in such a way that the aggregate can be extended once for eventual aggregate mitigation. Very few situations exist where a silo-centric approach with dedicated aggregates for applications makes sense. Free space and performance in an aggregate can be shared, and a few big aggregates reduce the monitoring effort. Also, build aggregates in a limited number of standardized configurations and sizes.

2. Create volumes in zero fat configuration with the autogrow feature set to on. Because there is no artificial space limitation for an autogrow volume, monitoring is restricted to aggregate monitoring. When using deduplication, set the volume to autogrow. Whenever possible, use Provisioning Manager for convenience and for repeating configurations.

a. Classify your data and provision for flexibility. Give NFS a preference and make use of vFiler entities.

b. Turn on deduplication. Even in situations where deduplication rates are expected to be low, there is sometimes a big surprise. If you prefer to try out deduplication on the storage controller, create a clone of the intended volume and deduplicate it to estimate the effect. Use Performance Advisor to identify a repeating time frame of low activity to schedule the deduplication job. Also, use deduplication scheduled by change rate. Mind the maximum volume sizes, which depend on the storage controller.

c. Initially size volumes to the expected size of the data you are going to store. Thus the aggregate overcommitment metric in Operations Manager represents the data consolidation more precisely.

d. Trim existing volumes provisioned in full or low fat to zero fat configuration. Use the following commands on the console of the storage controller to configure zero fat without Snapshot autodelete for NAS environments:

vol options <volume> guarantee none
vol options <volume> try_first volume_grow
vol autosize <volume> -m <maximum size> -i <increment size> on
snap autodelete <volume> off

Use the following command sequence to configure zero fat with Snapshot autodelete for NAS environments:

vol options <volume> guarantee none
vol options <volume> try_first volume_grow
vol autosize <volume> -m <maximum size> -i <increment size> on
snap autodelete <volume> trigger volume
snap autodelete <volume> delete_order oldest_first
snap autodelete <volume> on


Use the following commands to configure zero fat without Snapshot autodelete for SAN environments:

vol options <volume> guarantee none
vol options <volume> try_first volume_grow
vol autosize <volume> -m <maximum size> -i <increment size> on
snap reserve -V <volume> 0
snap autodelete <volume> off
lun set reservation <lun> disable

Use the following command sequence to configure zero fat for SAN environments with autodelete set to on:

vol options <volume> guarantee none
vol options <volume> try_first volume_grow
vol autosize <volume> -m <maximum size> -i <increment size> on
snap reserve -V <volume> 0
snap autodelete <volume> trigger volume
snap autodelete <volume> delete_order oldest_first
snap autodelete <volume> on
lun set reservation <lun> disable

e. Identify storage holding inactive data. Such storage is most often perfectly suited to act as a nomad candidate that could be migrated.

f. Identify storage that is close to deprovisioning. Deprovisioning storage relaxes use and can act as a mitigation alternative.

g. Turn already provisioned volumes into zero fat configuration.

3. Let Operations Manager monitor the landscape. Use the aggregate daily growth rates and days-to-full trending reported by Operations Manager to adapt the thresholds. Remember that days-to-full trending reports against 100% capacity used of the aggregate.
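When many existing volumes must be trimmed (step 2d), the command sequences lend themselves to scripting. The sketch below only builds the command strings for the NAS case without Snapshot autodelete; the volume names and sizes are hypothetical, and in practice each line would be sent to the storage controller console, for example over ssh:

```python
def zero_fat_nas_commands(volume, max_size, increment):
    """Build the zero fat (NAS, no Snapshot autodelete) command sequence
    for one volume, ready to be sent to the storage controller console."""
    return [
        f"vol options {volume} guarantee none",
        f"vol options {volume} try_first volume_grow",
        f"vol autosize {volume} -m {max_size} -i {increment} on",
        f"snap autodelete {volume} off",
    ]

# Hypothetical volumes to trim; print the batch for review before execution
for vol, max_size, incr in [("vol_app1", "500g", "50g"), ("vol_app2", "1t", "100g")]:
    for cmd in zero_fat_nas_commands(vol, max_size, incr):
        print(cmd)
```

Reviewing the generated batch before execution keeps a typo in a volume name from silently reconfiguring the wrong volume.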


7 REFERENCES

• TR-3505: "NetApp Deduplication for FAS and V-Series Deployment and Implementation Guide," www.netapp.com/us/library/technical-reports/tr-3505.html
• TR-3563: "NetApp Thin Provisioning: Improving Storage Utilization and Reducing TCO," www.netapp.com/us/library/technical-reports/tr-3563.html
• TR-3710: "Operations Manager, Provisioning Manager, and Protection Manager Best Practices Guide," www.netapp.com/us/library/technical-reports/tr-3710.html
• TR-3786: "A Thorough Introduction to 64-Bit Aggregates," www.netapp.com/us/library/technical-reports/tr-3786.html
• TR-3814: "NetApp Data Motion," www.netapp.com/us/library/technical-reports/tr-3814.html
• TR-3827: "If You're Doing This, Then Your Storage Could Be Underutilized," www.netapp.com/us/library/technical-reports/tr-3827.html
• TR-3881: "DataMotion for Volumes for Enterprise Applications," http://www.netapp.com/us/library/technical-reports/tr-3881.html
• NetApp Operations Manager Efficiency Dashboard Installation and User Guide, http://now.netapp.com/NOW/download/tools/omsed_plugin/InstallUserGuide.pdf


8 ACKNOWLEDGMENTS

This report was developed in concert with the Field Centers for Innovation and covers field best practices and product group expertise. It would not have been possible without the input of many experts. Significant contributions were made by Matthew Agoni, Carlos Alvarez, Jeff Berks, Manfred Buchmann, Hans Deuerlein, Erik Dybwad, Niels Reker, Oliver Dziuba, Larry Freeman, Gary Garcia, Pretoom Goswami, Naveen Harsani, George John, Nigel Maddock, Andreas Martinovsky, Holger Niermann, Cesar Orosco, Christian Ott, Shiva Raja, Michael Reusch, Maurice Skubski, John Tyrrell, Oliver Walsdorf, and Allen Wang.

NetApp provides no representations or warranties regarding the accuracy, reliability, or serviceability of any information or recommendations provided in this publication, or with respect to any results that may be obtained by the use of the information or observance of any recommendations provided herein. The information in this document is distributed AS IS, and the use of this information or the implementation of any recommendations or techniques herein is a customer's responsibility and depends on the customer's ability to evaluate and integrate them into the customer's operational environment. This document and the information contained herein may be used solely in connection with the NetApp products discussed in this document.

© Copyright 2010 NetApp, Inc. All rights reserved. No portions of this document may be reproduced without prior written consent of NetApp, Inc. Specifications are subject to change without notice. NetApp, the NetApp logo, Go further, faster, Data ONTAP, FlexClone, FlexVol, MultiStore, RAID-DP, SnapDrive, SnapMirror, Snapshot, SyncMirror, and vFiler are trademarks or registered trademarks of NetApp, Inc. in the United States and/or other countries. Windows is a registered trademark of Microsoft Corporation. Oracle is a registered trademark of Oracle Corporation. VMware is a registered trademark and VMotion is a trademark of VMware, Inc. All other brands or products are trademarks or registered trademarks of their respective holders and should be treated as such. RA-0007-1010

5 REAL-LIFE SETTINGS This section summarizes two different operational settings The first one does not make use of online data migration and settlednomad provisioning pattern the second setting implements a settlednomad provisioning pattern to maintain the flexibility for online data migrations

The concrete threshold settings and approaches might be very customer and application specific To exploit NetApp storage efficiency features in your own data center NetApp recommends that you start conservatively After you are familiar with the process work toward the customer-specific optimum

51 SAMPLE SETTING 1 REAL-LIFE SETTING

This section describes a real-life setting a customer started with It makes use of a limited set of mitigation alternatives This is especially beneficial when the installed storage capacity should be constant over a long time frame or physical systems are already fully equipped A settlednomad setting is not considered Thus the threshold to signal a transition of the phases are set lower and more conservatively for this customer Because on-line data migration and aggregate extension are not available as a mitigation alternative sufficient available space is required to safely reach the next planned downtime window as shown in Figure 21 In practice refer to the aggregate days to full trend value to get an idea of available days to full based on past data growth

bull All storage is provisioned using the zero fat option with growable FlexVol volumes Only aggregate monitoring is used

bull Aggregate extension is not a mitigation alternative bull Online migration is not a mitigation alternative

Figure 21) Storage to enable organic data growth between planned downtime windows

Data Data Growth

Planned Downtime Window

Planned Downtime Window

Months Time

Note Several months might fall between planned downtime windows to perform major mitigation alternatives

The primary concern is preventing the critical situation where aggregates reach a utilization level that is too high to enable organic growth during the period of agreed planned downtime windows To prevent this situation sufficient space must be reserved to enable data growth Second the level of data consolidation is monitored to manage accumulated growth rates safely

Provisioning new data is stopped when one or both thresholds on the first and second metrics are reached The operational teams are notified using an alarm on the Operations Manager event aggregate

40 Storage Efficiency Every Day How to Achieve and Manage Best-in-Class Storage Use

nearly full threshold (event configured when metric exceeds 50) and the event aggregate nearly over committed threshold (event configured when metric exceeds 110) These alarms stop the responsible entities from provisioning new storage the aggregate is left for organic growth

An assessment of the storage situation might be performed Depending on experiences and knowledge of the application growth rates seen in the past the thresholds may be adapted After the upper threshold of the operational sweet spot corridor is left an alarm based on aggregate full threshold (set initially to 65) is sent to the storage administrators to make the decision for migrating data in the next planned downtime window In the meantime organic growth can take place in the yellow-marked area shown in Figure 22 The metrics used are

bull First metric Aggregate capacity used bull Second metric Aggregate space committed

Because all storage is provisioned using the zero fat option no artificial limited storage container exists Thus there is no need to consider a volume-based metric Figure 22 shows the behavior depending on metrics aggregate capacity used and aggregate committed space

Figure 22) Transition of changes depending on the metrics aggregate capacity used and aggregate committed space

Data Data GrowthAggregate Capacity

Operational Sweet Spot Corridor

Aggregate Capacity Used

Aggregate Space Committed

0ndash50 gt 65

0ndash110 gt 120

Provisioning New Storage Y

Capacity Assessment Adapt Thresholds

Mitigate

Y Y

Y

Provisioning New Storage Y

Assess Capacity Y

41 Storage Efficiency Every Day How to Achieve and Manage Best-in-Class Storage Use

52 SAMPLE SETTING 2 SETTLEDNOMAD

This section describes a setting that takes the settlednomad provisioning pattern into account and allows migrating nomad data flexibly and in a timely manner thanks to vFiler technology This setting requires storage space at alternative locations where nomads might be migrated It is seen more often in larger environments with an emphasis on NFS-attached storage It allows operating the NetApp storage infrastructure at very high use and in narrower operational sweet spot corridors Figure 23 visualizes the effect of a mitigation alternative that can be performed online

Figure 23) Narrower corridors due to the ability to perform mitigation alternatives in hours instead of months

Settled

Detecting the Need to Act

Effect of Mitigation (eg migration)

Hours Time

N NN N N

In this sample setting as well as in sample setting 1 the critical situation to prevent is where aggregates become too full However the flexibility gained with online data migration does not require taking a further metric into account for example storage overcommitment

bull All storage is provisioned using the zero fat option with growable FlexVol volumes Only aggregate monitoring is used

bull Storage is provisioned using the settlednomad pattern with ability to perform online migration bull Days to full aggregate trending was more than 200 days on average Note that this value depends on

the individual situation and is calculated against 100

The sole metric in this setting is aggregate capacity used Table 10 contains the thresholds describing the transition of phases

Table 10) Phase transitions with settlednomad provisioning pattern and on-line migration mitigation alternative

Detection Threshold Notify Mitigation

gt 70 Storage operations Stop provisioning of storage

gt 85 Storage operations Stop extending provisioned storage

gt 90 Storage operations Relax resource situation and migrate nomad

42 Storage Efficiency Every Day How to Achieve and Manage Best-in-Class Storage Use

Figure 24) Visualization of phase transitions depending on metric aggregate capacity used

Settled Data GrowthAggregate Capacity

Operational Sweet Spot Corridor

Aggregate Capacity Used 0ndash70 70ndash85 gt 90

Provisioning New Storage Y

Extending Already Provisioned Storage

Relax UtilizationmdashNetApp Data Motion a Nomad

Y Y

Y

N N N

You can achieve a very high data consolidation in this setting by using NetApp storage controllers The served amount of logical data exceeds the physical usable capacity by factors

43 Storage Efficiency Every Day How to Achieve and Manage Best-in-Class Storage Use

6 STORAGE EFFICIENCY COOKBOOK To increase consolidation we propose the following steps to exploit the advantages of NetApp storage efficiency technologies

Figure 25) Typical picture of aggregate capacity metrics while turning to zero fat configurations and dedupe

Elapsed Time

Capacity

1 Month 3 Months

Committed Capacity

Overall Trend

Last 3-Month Trend

Capacity Used

1 2 3

As a general rule we donrsquot introduce artificially limited container types They increase monitoring effort and might prevent pooling unused space For an existing landscape proceed as follows

1 Install and configure Operations Manager the earlier the better From day one Operations Manager collects data The more information it collects the better are the predictions and trending The diagrams provided by Operations Manager give a good idea of growths rates and their steadiness Make sure all NetApp storage controllers are monitored Wait for one month Define which mitigation alternatives your operational team is comfortable with Check the boxes accompanying the provided list and identify the time your team needs to perform the actions If you can perform online migrations for nomads define the time to negotiate and approve the migration For all other data define the time to the next planned downtime window

2. Change all volumes to the zero fat configuration with the autogrow feature set to on. Because there is no artificial space limitation for an autogrow volume, monitoring is restricted to aggregate monitoring. When using deduplication, set the volume to autogrow. During this period, the capacity used diminishes, as shown in Figure 25. Usually, each change in the volume configuration can be detected. So far, only metadata has changed, and unused space in the volumes is now available from a common shared pool. The aggregated free space is available to the same applications storing the data. We recommend monitoring for three months to understand the growth rate of your environment.

3. Derive the growth trend of the aggregates. Note that the overall trend might still be negative. Use Operations Manager to help determine the trend. Make sure that it excludes the time frame when the volume configuration was changed to zero fat, and that it includes relevant operations of your applications, such as month- and year-end closing of business applications or regular software maintenance updates (for example, in virtualized environments).
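The trend derivation in step 3 can be sketched as a least-squares fit over daily capacity samples that skips the reconfiguration window. This is a minimal illustration, not the method Operations Manager itself uses; all names and numbers are hypothetical:

```python
from datetime import date, timedelta

def growth_trend_gb_per_day(samples, exclude_from=None, exclude_to=None):
    """Least-squares slope of aggregate capacity-used samples, in GB/day.

    samples: list of (date, used_gb) tuples; points between exclude_from
    and exclude_to (e.g., the zero fat reconfiguration window) are skipped.
    """
    points = [(d, gb) for d, gb in samples
              if exclude_from is None or not (exclude_from <= d <= exclude_to)]
    xs = [(d - points[0][0]).days for d, _ in points]
    ys = [gb for _, gb in points]
    n = len(points)
    mx, my = sum(xs) / n, sum(ys) / n
    num = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    den = sum((x - mx) ** 2 for x in xs)
    return num / den  # can still be negative while dedupe savings dominate

# Example: 1 GB/day organic growth over 30 days
start = date(2010, 1, 1)
samples = [(start + timedelta(days=i), 500.0 + i) for i in range(30)]
trend = growth_trend_gb_per_day(samples)
```

Excluding the reconfiguration window matters because the dedupe-induced drop in capacity used would otherwise pull the fitted slope below the true organic growth rate.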


Work backward to determine the thresholds of the phases:

a. Define the aggregate use at a level where your operational team is comfortable. At first, do not exceed 80%. Add an attention area (yellow), depending on the mitigation alternatives and their time to show effect.

b. Determine the maximum distance between the planned downtimes, or the time to perform the intended mitigation alternatives.

c. Determine the growth rate. Operations Manager provides help in determining the trend of data growth.

d. Determine the minimum space required to comfortably allow organic growth in the period between the agreed planned downtimes of the services provided. Operations Manager helps you to understand the growth rate of the past.
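The working-backward calculation in steps a–d amounts to simple arithmetic: reserve head room for one downtime interval of growth and cap the result at the comfort level. A hypothetical sketch with made-up numbers:

```python
def provisioning_stop_threshold(capacity_gb, growth_gb_per_day,
                                days_between_downtimes, comfort_limit=0.80):
    """Back-calculate the 'stop provisioning' threshold as a fraction of
    aggregate capacity: keep head room for organic growth until the next
    planned downtime window (step d), capped at the comfort level (step a).
    """
    headroom = (growth_gb_per_day * days_between_downtimes) / capacity_gb
    return min(comfort_limit, 1.0 - headroom)

# 10 TB aggregate, 20 GB/day growth, planned downtimes every 90 days:
# head room of 1,800 GB is needed, so 82% would suffice, but the result
# is capped at the 80% comfort limit.
threshold = provisioning_stop_threshold(10_000, 20, 90)
```

With a faster-growing aggregate (say 40 GB/day), the same formula yields 64%, showing why the thresholds must be derived per environment rather than copied.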

To provision storage, follow these steps:

1. Create big aggregates to enable shared storage in your data center. We recommend sizing them in such a way that the aggregate can be extended once for eventual aggregate mitigation. Very few situations exist where a silo-centric approach with dedicated aggregates for applications makes sense. Free space and performance in an aggregate can be shared. A few big aggregates reduce the monitoring effort. Also, build aggregates in a limited number of standardized configurations and sizes.

2. Create volumes in the zero fat configuration with the autogrow feature set to on. Because there is no artificial space limitation for an autogrow volume, monitoring is restricted to aggregate monitoring. When using deduplication, set the volume to autogrow. Whenever possible, use Provisioning Manager for convenience and for repeating configurations.

a. Classify your data and provision for flexibility. Give NFS a preference and make use of vFiler entities.

b. Turn on deduplication. Even in situations where deduplication rates are expected to be low, there is sometimes a big surprise. If you prefer to try deduplication on the storage controller, then create a clone of the intended volume and deduplicate it to estimate the effect. Use Performance Advisor to identify a repeating time frame of low activity to schedule the deduplication job. Alternatively, use deduplication scheduled by change rate. Mind the maximum volume sizes, which depend on the storage controller.

c. Initially size volumes to the expected size of the data you are going to store. Thus, the aggregate overcommitment metric in Operations Manager represents the data consolidation more precisely.

d. Trim existing volumes provisioned in full or low fat to the zero fat configuration. Use the following commands on the console of the storage controller to configure zero fat without Snapshot autodelete for NAS environments:

vol options <volume> guarantee none
vol options <volume> try_first volume_grow
vol autosize <volume> -m <maximum size> -i <increment size> on
snap autodelete <volume> off

Use the following command sequence to configure zero fat with Snapshot autodelete for NAS environments:

vol options <volume> guarantee none
vol options <volume> try_first volume_grow
vol autosize <volume> -m <maximum size> -i <increment size> on
snap autodelete <volume> trigger volume
snap autodelete <volume> delete_order oldest_first
snap autodelete <volume> on


Use the following commands to configure zero fat without Snapshot autodelete for SAN environments:

vol options <volume> guarantee none
vol options <volume> try_first volume_grow
vol autosize <volume> -m <maximum size> -i <increment size> on
snap reserve -V <volume> 0
snap autodelete <volume> off
lun set reservation <lun> disable

Use the following command sequence to configure zero fat for SAN environments with autodelete set to on:

vol options <volume> guarantee none
vol options <volume> try_first volume_grow
vol autosize <volume> -m <maximum size> -i <increment size> on
snap reserve -V <volume> 0
snap autodelete <volume> trigger volume
snap autodelete <volume> delete_order oldest_first
snap autodelete <volume> on
lun set reservation <lun> disable

e. Identify storage holding inactive data. Such storage is most often perfectly suited to act as a nomad candidate that can be migrated.

f. Identify storage that is close to deprovisioning. Deprovisioning storage relaxes use and can act as a mitigation alternative.

g. Turn already provisioned volumes into the zero fat configuration.

3. Let Operations Manager monitor the landscape. Use the aggregate daily growth rates and the days-to-full trending reported by Operations Manager to adapt the thresholds. Remember that days-to-full trending reports against 100% capacity used of the aggregate.
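The four zero fat command sequences in step 2d differ only in two switches (NAS versus SAN, and Snapshot autodelete on or off), so they can be generated mechanically. A sketch, assuming the command strings exactly as listed above; the function and its parameters are hypothetical:

```python
def zero_fat_commands(volume, max_size, increment, san=False,
                      autodelete=False, lun=None):
    """Emit the Data ONTAP console commands for a zero fat volume
    configuration, mirroring the four sequences above."""
    cmds = [
        f"vol options {volume} guarantee none",
        f"vol options {volume} try_first volume_grow",
        f"vol autosize {volume} -m {max_size} -i {increment} on",
    ]
    if san:
        # SAN volumes additionally drop the Snapshot reserve
        cmds.append(f"snap reserve -V {volume} 0")
    if autodelete:
        cmds += [
            f"snap autodelete {volume} trigger volume",
            f"snap autodelete {volume} delete_order oldest_first",
            f"snap autodelete {volume} on",
        ]
    else:
        cmds.append(f"snap autodelete {volume} off")
    if san and lun:
        # LUN space reservation is disabled last
        cmds.append(f"lun set reservation {lun} disable")
    return cmds

nas = zero_fat_commands("vol1", "500g", "50g")
san = zero_fat_commands("vol1", "500g", "50g", san=True,
                        autodelete=True, lun="/vol/vol1/lun0")
```

Generating the sequences from one template keeps the NAS and SAN variants consistent when a standard (for example, the autosize increment) changes.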


7 REFERENCES

• TR-3505, "NetApp Deduplication for FAS and V-Series Deployment and Implementation Guide"
www.netapp.com/us/library/technical-reports/tr-3505.html

• TR-3563, "NetApp Thin Provisioning: Improving Storage Utilization and Reducing TCO"
www.netapp.com/us/library/technical-reports/tr-3563.html

• TR-3710, "Operations Manager, Provisioning Manager, and Protection Manager Best Practices Guide"
www.netapp.com/us/library/technical-reports/tr-3710.html

• TR-3786, "A Thorough Introduction to 64-Bit Aggregates"
www.netapp.com/us/library/technical-reports/tr-3786.html

• TR-3814, "NetApp Data Motion"
www.netapp.com/us/library/technical-reports/tr-3814.html

• TR-3827, "If You're Doing This, Then Your Storage Could Be Underutilized"
www.netapp.com/us/library/technical-reports/tr-3827.html

• TR-3881, "DataMotion for Volumes for Enterprise Applications"
www.netapp.com/us/library/technical-reports/tr-3881.html

• NetApp Operations Manager Efficiency Dashboard Installation and User Guide
now.netapp.com/NOW/download/tools/omsed_plugin/InstallUserGuide.pdf


8 ACKNOWLEDGMENTS

This report was developed in concert with the Field Centers for Innovation and covers field best practices and product group expertise. It would not have been possible without the input of many experts. Significant contributions were made by Matthew Agoni, Carlos Alvarez, Jeff Berks, Manfred Buchmann, Hans Deuerlein, Erik Dybwad, Niels Reker, Oliver Dziuba, Larry Freeman, Gary Garcia, Pretoom Goswami, Naveen Harsani, George John, Nigel Maddock, Andreas Martinovsky, Holger Niermann, Cesar Orosco, Christian Ott, Shiva Raja, Michael Reusch, Maurice Skubski, John Tyrrell, Oliver Walsdorf, and Allen Wang.

NetApp provides no representations or warranties regarding the accuracy, reliability, or serviceability of any information or recommendations provided in this publication, or with respect to any results that may be obtained by the use of the information or observance of any recommendations provided herein. The information in this document is distributed AS IS, and the use of this information or the implementation of any recommendations or techniques herein is a customer's responsibility and depends on the customer's ability to evaluate and integrate them into the customer's operational environment. This document and the information contained herein may be used solely in connection with the NetApp products discussed in this document.

© Copyright 2010 NetApp, Inc. All rights reserved. No portions of this document may be reproduced without prior written consent of NetApp, Inc. Specifications are subject to change without notice. NetApp, the NetApp logo, Go further, faster, Data ONTAP, FlexClone, FlexVol, MultiStore, RAID-DP, SnapDrive, SnapMirror, Snapshot, SyncMirror, and vFiler are trademarks or registered trademarks of NetApp, Inc. in the United States and/or other countries. Windows is a registered trademark of Microsoft Corporation. Oracle is a registered trademark of Oracle Corporation. VMware is a registered trademark and VMotion is a trademark of VMware, Inc. All other brands or products are trademarks or registered trademarks of their respective holders and should be treated as such. RA-0007-1010



4.3 NOTIFICATION

Operational staff must be notified when situations occur that require a transition of phases, especially situations with negative consequences for SLA fulfillment. Operations Manager provides alarms for notification. Alarms are bound to the metrics and thresholds explained in section 4.2 and notify operational staff, storage administrators, or storage capacity planners. Alarms are the instrument used to keep the management effort of the NetApp storage infrastructure low.

After being notified, the responsible person can evaluate the situation and decide which actions to take. Further, the trends on operational parameters provided by Operations Manager simplify the decision-making process.

Depending on the organizational structure, the responsibilities to operate, plan, and administer the storage infrastructure can be separated into different groups, persons, or roles. Thus, we characterize the mitigation activities by required skill set and time to act. This allows easy alignment to a given organizational structure.

Operations Manager supports different methods to send a notification. The notification methods can be used in combination; for example, a notification can be sent by both e-mail and SNMP.

NOTIFY BY E-MAIL

An alarm can be sent to multiple destinations by e-mail, and repeated notifications can be sent while the situation is not resolved. To set an alarm, access the alarm configuration page by following Setup → Alarms from the default Operations Manager dashboard. Clicking Advanced Version accesses an advanced version of this page; the direct link for the advanced version is http://<opsmgrserver>:<port>/dfm/edit/alarms-advanced. Figure 20 shows how to configure an alarm. Adjust the threshold as described in section 4.2.

NetApp recommends using distribution lists or aliases with meaningful names rather than the addresses of individual persons. If you follow this recommendation, changing responsibilities and roles does not require you to make corresponding changes in Operations Manager.

NOTIFY BY SNMP

Operations Manager supports the notification of alarms using SNMP, a widely used standard that is supported by most orchestration frameworks and ticketing systems. Using SNMP, Operations Manager can be integrated into existing ticketing systems. Figure 20 shows setting up an alarm firing based on the aggregate almost full threshold. The SNMP trap host is configured using a hostname or IP address and the port on which the SNMP agent is listening. The alarm can be saved and tested.


Figure 20) Configuring an alarm based on the threshold aggregate almost full

Note: The SNMP event must be routed to the responsible groups or persons in the ticketing system. Thus, the mapping of the detected situation to the responsible operational group must be implemented there.

NOTIFY BY SCRIPT

Operations Manager supports notifications in highly customized integration scenarios. A user-defined adapter can be executed, which delivers the information to the infrastructure or system of choice. A script can be used to implement such an adapter and act as the glue between Operations Manager and the customer infrastructure. To set an alarm on the event aggregate almost full that starts a script, instrument Operations Manager on the command line:

dfm alarm create -s script_to_execute -h aggregate-almost-full
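Such an adapter can be as small as a routing table from event names to the responsible queue in the ticketing system. A hypothetical Python sketch: the event names follow the Operations Manager naming seen above, but the payload format, queue names, and function are assumptions, and the exact script interface depends on the Operations Manager version:

```python
def build_ticket(event_name, source, severity="warning"):
    """Map an Operations Manager event to a ticket payload.

    The routing table encodes which operational group is responsible for
    which detected situation; this mapping must live in the adapter (or in
    the ticketing system), because the event itself does not carry it.
    """
    routing = {
        "aggregate-almost-full": "storage-operations",
        "aggregate-full": "storage-administrators",
        "aggregate-almost-overcommitted": "storage-capacity-planning",
    }
    return {
        # unknown events fall back to the general operations queue
        "queue": routing.get(event_name, "storage-operations"),
        "summary": f"{event_name} on {source}",
        "severity": severity,
    }

ticket = build_ticket("aggregate-almost-full", "aggr_sap_prod")
```

The real script would serialize this payload and hand it to the ticketing system's intake (HTTP endpoint, mail gateway, or CLI), which is site specific.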


4.4 MITIGATE STORAGE USE

Uncontrolled storage use can limit operational flexibility and might put SLA fulfillment at risk. This section focuses on mitigation activities that preserve flexibility by controlling use within its defined corridor. The effect of a mitigation activity should return usage to its defined corridor.

Storage tightness might occur in aggregates or volumes, depending on their configuration. When all volumes in an aggregate are thin provisioned with the zero fat configuration, they use the shared pool of free blocks of the aggregate to deal with data growth. To resolve this situation, a mitigation activity at the aggregate level is necessary.

When storage objects in a fixed-size volume cannot grow to the committed space, a mitigation activity at the volume level is necessary to resolve upcoming volume tightness.

MITIGATION ACTIVITIES FOR AGGREGATES

Aggregates are the coarsest storage objects within a NetApp storage controller. Finer-grained storage objects, such as FlexVol volumes and their content, are usually thin provisioned using the zero fat configuration. They might grow on demand; however, because they live within an aggregate of physically limited size, the growth of the storage object itself is also limited. As described in the following list, providing usable space in the aggregate automatically allows contained storage objects to grow.

1. Increase the aggregate. You can add drives to aggregates during operation, and you can repeat this mitigation activity. The maximum aggregate size depends on the Data ONTAP version, the type of aggregate, and the type of storage controller. 64-bit aggregates, supported with Data ONTAP 8, have very high limits. Additional drives can be used immediately; however, their procurement needs to be taken into account. Rebalancing data between existing and new drives results in uniformly distributed use of the drives.

2. Decrease the aggregate Snapshot copy reserve. This reserve is needed in MetroCluster and SyncMirror® configurations. In other configurations, you can decrease this reserve or set it to zero.

3. Shrink preallocated volumes. Volumes with preallocated space reserve available aggregate free space. When possible, these volumes can be shrunk, returning the freed space to the aggregate so that others can make use of the previously preallocated space.

4. Enable deduplication and shrink the volume.

5. If available, migrate a nomad online to a different storage controller. Doing this at the NetApp storage controller level requires storage provisioning based on vFiler entities and MultiStore and SnapMirror licenses. Adequate free space on the aggregates of the target storage controller is required. This mitigation activity is not limited in its applicability.

6. A volume can be migrated from one aggregate to another within the same or another storage controller. SnapMirror replicates the data while it is still served. To switch over to the replicated data, the client needs to detach from the source and reattach to the replica. After completion, the replica is considered the new source. This operation has an impact on client downtime. Typically, inter-data center bandwidth allows you to synchronize the source and the replica within a few minutes.

7. If none of the listed activities can be used, the application must be stopped to achieve a consistent state.

The mitigation activities for aggregate tightness are summarized in Table 8. Note that Provisioning Manager performs mitigation alternatives 3 to 6 for secondary storage online.


Table 8) Mitigation alternatives to control use within aggregates.

No. | Mitigation Activity | Repeatability | SLA Impact | Preparation Time | Time to Show Effect
1 | Increase aggregate capacity by adding disks | Data ONTAP 7.x: low limits; Data ONTAP 8: high limits | None | HW procurement | Immediate (+ rebalancing)
2 | Decrease the aggregate's Snapshot copy reserve area, if possible | One time | None | None | Immediate
3 | Shrink other volumes in the aggregate, if they have enough free space | One time | Low | None | Immediate
4 | Run deduplication and shrink volumes | Repeatable | Low | Time to execute dedupe | Immediate
5 | Migrate nomads (online) | Repeatable | Low | None | Minutes: vFiler migration time
6 | Migrate volumes to a different aggregate (offline) | Repeatable | Med–high | Next planned downtime window | Minutes: volume switch-over time
7 | Prevent application data loss and stop the application, then migrate (offline) | Repeatable | Low–high | Coordinate with app owner | Minutes: migration time

MITIGATION ACTIVITIES FOR VOLUME TIGHTNESS

Mitigation activities for volume tightness are relevant for volumes that are at risk because their storage objects cannot grow to the committed size. When it is not possible to enable growth for storage objects contained in volumes, you need to perform an aggregate mitigation activity.

Note: Some of these mitigation alternatives depend on, and affect, used capacity in the aggregate.

Table 9) Mitigation activities for resource tightness within volumes.

No. | Mitigation Activity | Repeatability | SLA Impact | Preparation Time | Time to Show Effect
1 | Reduce the volume's Snapshot copy reserve (if configured and not used) | One time | Low | None | Immediate
2 | Increase the volume if there is free space in the aggregate (see Table 8) | One time | Low | None | Immediate
3 | Delete Snapshot copies not needed, or those skipped by the autodelete function | Limited | Low | None | Immediate
4 | Activate FAS deduplication for the volume (requires proper space guarantees) | One time | Low; possible performance impact | Wait for schedule | Hours
5 | If the volume contains more than a single LUN, migrate those objects to another volume or aggregate | Repeatable | High | Next planned downtime window | Minutes: volume migration time
6 | Stop application and migrate data | Repeatable | High | Coordinate with app owner | Minutes: migration time


5 REAL-LIFE SETTINGS

This section summarizes two different operational settings. The first does not make use of online data migration and the settled/nomad provisioning pattern; the second implements the settled/nomad provisioning pattern to maintain the flexibility for online data migrations.

The concrete threshold settings and approaches might be very customer and application specific. To exploit NetApp storage efficiency features in your own data center, NetApp recommends that you start conservatively. After you are familiar with the process, work toward the customer-specific optimum.

5.1 SAMPLE SETTING 1: REAL-LIFE SETTING

This section describes a real-life setting that a customer started with. It makes use of a limited set of mitigation alternatives. This is especially beneficial when the installed storage capacity should be constant over a long time frame, or when physical systems are already fully equipped. A settled/nomad setting is not considered. Thus, the thresholds that signal a transition of the phases are set lower and more conservatively for this customer. Because online data migration and aggregate extension are not available as mitigation alternatives, sufficient available space is required to safely reach the next planned downtime window, as shown in Figure 21. In practice, refer to the aggregate days-to-full trend value to get an idea of the available days to full based on past data growth.

• All storage is provisioned using the zero fat option with growable FlexVol volumes. Only aggregate monitoring is used.
• Aggregate extension is not a mitigation alternative.
• Online migration is not a mitigation alternative.

Figure 21) Storage to enable organic data growth between planned downtime windows.

[Chart: data growth over time (months) between consecutive planned downtime windows.]

Note: Several months might fall between planned downtime windows to perform major mitigation alternatives.

The primary concern is preventing the critical situation where aggregates reach a utilization level that is too high to enable organic growth during the period between agreed planned downtime windows. To prevent this situation, sufficient space must be reserved to enable data growth. Second, the level of data consolidation is monitored to manage accumulated growth rates safely.

Provisioning new data is stopped when one or both thresholds on the first and second metrics are reached. The operational teams are notified using an alarm on the Operations Manager event aggregate nearly full (configured to fire when the metric exceeds 50%) and on the event aggregate nearly overcommitted (configured to fire when the metric exceeds 110%). These alarms stop the responsible entities from provisioning new storage; the aggregate is left for organic growth.

An assessment of the storage situation might then be performed. Depending on experience and knowledge of the application growth rates seen in the past, the thresholds may be adapted. After the upper threshold of the operational sweet spot corridor is crossed, an alarm based on the aggregate full threshold (set initially to 65%) is sent to the storage administrators to make the decision about migrating data in the next planned downtime window. In the meantime, organic growth can take place in the yellow-marked area shown in Figure 22. The metrics used are:

• First metric: aggregate capacity used
• Second metric: aggregate space committed

Because all storage is provisioned using the zero fat option, no artificially limited storage containers exist. Thus, there is no need to consider a volume-based metric. Figure 22 shows the behavior depending on the metrics aggregate capacity used and aggregate committed space.

Figure 22) Transition of phases depending on the metrics aggregate capacity used and aggregate space committed.

[Chart: operational sweet spot corridor over growing data. While aggregate capacity used is 0–50% and aggregate space committed is 0–110%, provisioning new storage is allowed. Beyond those thresholds, capacity is assessed and thresholds are adapted. Above 65% capacity used or 120% space committed, mitigation takes place.]
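The decision logic of this setting can be sketched as a function of the two metrics. A minimal illustration using the thresholds quoted above (50%, 110%, 65%); the function name and the returned strings are hypothetical:

```python
def sample_setting_1_action(capacity_used_pct, space_committed_pct):
    """Phase transitions of sample setting 1.

    Thresholds from the text: aggregate nearly full at 50% capacity used,
    aggregate nearly overcommitted at 110% space committed, and the
    aggregate full alarm at 65% capacity used.
    """
    if capacity_used_pct > 65:
        # upper edge of the sweet spot corridor: plan a migration
        return "mitigate in next planned downtime window"
    if capacity_used_pct > 50 or space_committed_pct > 110:
        # one threshold crossed: stop provisioning, leave room for growth
        return "stop provisioning; assess capacity and adapt thresholds"
    return "provision new storage"

action = sample_setting_1_action(55, 100)
```

Note that either metric alone can stop provisioning, while only capacity used (the physical constraint) triggers mitigation.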


5.2 SAMPLE SETTING 2: SETTLED/NOMAD

This section describes a setting that takes the settled/nomad provisioning pattern into account and allows migrating nomad data flexibly and in a timely manner, thanks to vFiler technology. This setting requires storage space at alternative locations to which nomads can be migrated. It is seen more often in larger environments with an emphasis on NFS-attached storage. It allows operating the NetApp storage infrastructure at very high use and in narrower operational sweet spot corridors. Figure 23 visualizes the effect of a mitigation alternative that can be performed online.

Figure 23) Narrower corridors due to the ability to perform mitigation alternatives in hours instead of months.

[Chart: settled data growth over time (hours), from detecting the need to act to the effect of the mitigation (e.g., migration).]

In this sample setting, as in sample setting 1, the critical situation to prevent is that aggregates become too full. However, the flexibility gained with online data migration makes it unnecessary to take a further metric, such as storage overcommitment, into account.

• All storage is provisioned using the zero fat option with growable FlexVol volumes. Only aggregate monitoring is used.
• Storage is provisioned using the settled/nomad pattern, with the ability to perform online migration.
• The days-to-full aggregate trend was more than 200 days on average. Note that this value depends on the individual situation and is calculated against 100% capacity used.

The sole metric in this setting is aggregate capacity used. Table 10 contains the thresholds describing the transition of phases.

Table 10) Phase transitions with the settled/nomad provisioning pattern and the online migration mitigation alternative.

Detection Threshold | Notify | Mitigation
> 70% | Storage operations | Stop provisioning new storage
> 85% | Storage operations | Stop extending provisioned storage
> 90% | Storage operations | Relax the resource situation and migrate a nomad
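The transitions in Table 10 can be sketched as a single function of the metric. A minimal illustration; the function and key names are hypothetical:

```python
def sample_setting_2_actions(capacity_used_pct):
    """Allowed activities per Table 10 (settled/nomad pattern with
    online migration): thresholds at 70%, 85%, and 90% capacity used."""
    return {
        "provision_new_storage": capacity_used_pct <= 70,
        "extend_provisioned_storage": capacity_used_pct <= 85,
        "migrate_nomad": capacity_used_pct > 90,
    }

state = sample_setting_2_actions(88)
```

Between 85% and 90%, all provisioning activity is stopped but no migration is triggered yet; that narrow band is what the fast online mitigation makes operationally safe.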


bull TR-3786 ldquoA Thorough Introduction to 64-Bit Aggregatesrdquo wwwnetappcomuslibrarytechnical-reportstr-3786html

bull TR-3814 ldquoNetApp Data Motionrdquo wwwnetappcomuslibrarytechnical-reportstr-3814html

bull TR-3827 ldquoIf Yoursquore Doing This Then Your Storage Could Be Underutilizedrdquo wwwnetappcomuslibrarytechnical-reportstr-3827html

bull TR-3881 rdquoDataMotion For Volumes For Enterprise Applicationsrdquo httpwwwnetappcomuslibrarytechnical-reportstr-3881html

bull NetApp Operations Manager Efficiency Dashboard Installation and User Guide httpnownetappcomNOWdownloadtoolsomsed_pluginInstallUserGuidepdf

47 Storage Efficiency Every Day How to Achieve and Manage Best-in-Class Storage Use

8 ACKNOWLEDGMENTS This report was developed in concert with the Field Centers for Innovation and covers field best practices and product group expertise It would not have been possible without the input of many experts Significant contributions were made by Matthew Agoni Carlos Alvarez Jeff Berks Manfred Buchmann Hans Deuerlein Erik Dybwad Niels Reker Oliver Dziuba Larry Freeman Gary Garcia Pretoom Goswami Naveen Harsani George John Nigel Maddock Andreas Martinovsky Holger Niermann Cesar Orosco Christian Ott Shiva Raja Michael Reusch Maurice Skubski John Tyrrell Oliver Walsdorf and Allen Wang

NetApp provides no representations or warranties regarding the accuracy reliability or serviceability of any information or recommendations provided in this publication or with respect to any results that may be obtained by the use of the information or observance of any recommendations provided herein The information in this document is distributed AS IS and the use of this information or the implementation of any recommendations or techniques herein is a customerrsquos responsibility and depends on the customerrsquos ability to evaluate and integrate them into the customerrsquos operational environment This document and the information contained herein may be used solely in connection with the NetApp products discussed in this document

copy Copyright 2010 NetApp Inc All rights reserved No portions of this document may be reproduced without prior written consent of NetApp Inc Specifications are subject to change without notice NetApp the NetApp logo Go further faster Data ONTAP FlexClone FlexVol MultiStore RAID-DP SnapDrive SnapMirror Snapshot SyncMirror and vFiler are trademarks or registered trademarks of NetApp Inc in the United States andor other countries Windows is a registered trademark of Microsoft Corporation Oracle is a registered trademark of Oracle Corporation VMware is a registered trademark and VMotion is a trademark of VMware Inc All other brands or products are trademarks or registered trademarks of their respective holders and should be treated as such RA-0007-1010

  • EXECUTIVE SUMMARY
  • INTRODUCTION
    • 21 TERMINOLOGY
    • 22 GOAL OF THIS DOCUMENT
    • 23 AUDIENCE
    • 24 SCENARIO
    • 25 NETAPP TECHNOLOGIES FOR STORAGE EFFICIENCY AND FLEXIBILITY
      • PROVISIONING
        • 31 PROVISIONING FROM SCRATCH FULL FAT TO ZERO FAT PROVISIONING
        • 32 PROVISIONING FROM TEMPLATES VOLUME AND DEDUPE-CENTRIC LAYOUTS
        • 33 SETTLEDNOMAD PROVISIONING FOR NETAPP DATA MOTION
          • OPERATION
            • 41 PHASES AND TRANSITIONS
            • 42 MONITORING
            • 43 NOTIFICATION
            • 44 MITIGATE STORAGE USE
              • REAL-LIFE SETTINGS
                • 51 SAMPLE SETTING 1 REAL-LIFE SETTING
                • 52 SAMPLE SETTING 2 SETTLEDNOMAD
                  • STORAGE EFFICIENCY COOKBOOK
                  • REFERENCES
                  • ACKNOWLEDGMENTS
Page 36: Lun Provision

36 Storage Efficiency Every Day How to Achieve and Manage Best-in-Class Storage Use

Figure 20) Configuring an alarm based on the threshold "aggregate almost full".

Note: The SNMP event must be routed to the responsible groups or persons in the ticketing system. Thus, the mapping between the detected situation and the responsible operational group must be implemented there.

NOTIFY BY SCRIPT

Operations Manager supports notifications in highly customized integration scenarios. A user-defined adapter can be executed that delivers the information to the infrastructure or system of choice. A script can be used to implement such an adapter and act as the glue between Operations Manager and the customer infrastructure. To set an alarm on the event "aggregate almost full" that starts a script, instrument Operations Manager on the command line:

dfm alarm create -s script_to_execute -h aggregate-almost-full
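As an illustration, such an adapter could be implemented as the following Python sketch. The invocation interface is an assumption: we presume here that Operations Manager passes the event name and the affected object as command-line arguments, and the queue names and routing rules are hypothetical; check the interface of your Operations Manager version before adapting this.

```python
#!/usr/bin/env python
# Sketch of a user-defined notification adapter for 'dfm alarm create -s ...'.
# Assumption (hypothetical): the script receives the event name and the
# affected object as command-line arguments; the real interface depends on
# the Operations Manager version in use.
import json
import sys

def format_ticket(event_name, source_object):
    """Map an Operations Manager event to a ticket for the ticketing system."""
    # Route aggregate events to the storage operations group; everything
    # else goes to a default queue (the mapping is site specific).
    group = "storage-operations" if event_name.startswith("aggregate") else "default-queue"
    return {
        "queue": group,
        "summary": "%s on %s" % (event_name, source_object),
        "priority": "high" if "full" in event_name else "normal",
    }

if __name__ == "__main__" and len(sys.argv) >= 3:
    # Hand the ticket to the customer infrastructure; here we simply print it.
    print(json.dumps(format_ticket(sys.argv[1], sys.argv[2])))
```

The script itself is only glue; the actual delivery (ticket system API, e-mail, pager) is whatever your infrastructure provides.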


4.4 MITIGATE STORAGE USE

Uncontrolled storage use can limit operational flexibility and might put SLA fulfillment at risk. This section focuses on mitigation activities that preserve flexibility by controlling use within its defined corridor. The effect of a mitigation activity should return usage to that corridor.

Storage tightness might occur in aggregates or volumes, depending on their configuration. When all volumes in an aggregate are thin provisioned with the zero fat configuration, they use the aggregate's shared pool of free blocks to deal with data growth. To resolve this situation, a mitigation activity at the aggregate level is necessary.

When storage objects in a fixed-size volume cannot grow to the committed space, a mitigation activity at the volume level is necessary to resolve upcoming volume tightness.

MITIGATION ACTIVITIES FOR AGGREGATES

Aggregates are the coarsest storage objects within a NetApp storage controller. Finer-grained storage objects such as FlexVol volumes and their content are usually thin provisioned using the zero fat configuration. They can grow on demand; however, because they live within an aggregate of physically limited size, the growth of the storage objects themselves is also limited. As described in the following list, providing usable space in the aggregate automatically allows the contained storage objects to grow.

1. Increase the aggregate. You can add drives to aggregates during operation, and you can repeat this mitigation activity. The maximum aggregate size depends on the Data ONTAP version, the type of aggregate, and the type of storage controller. 64-bit aggregates, supported with Data ONTAP 8, have very high limits. Additional drives can be used immediately; however, their procurement needs to be taken into account. Rebalancing data between existing and new drives results in uniformly distributed use of the drives.

2. Decrease the aggregate Snapshot copy reserve. This reserve is needed in MetroCluster and SyncMirror® configurations. In other configurations, you can decrease this reserve or set it to zero.

3. Shrink preallocated volumes. Volumes with preallocated space reserve available aggregate free space. When possible, these volumes can be shrunk, returning the freed space to the aggregate so that others can make use of it.

4. Enable deduplication and shrink the volume.

5. If available, migrate a nomad online to a different storage controller. Doing this at the NetApp storage controller level requires storage provisioning based on vFiler entities and a MultiStore and SnapMirror license. Adequate free space on the aggregates of the target storage controller is required. This mitigation activity is not limited in its applicability.

6. Migrate a volume from one aggregate to another within the same or another storage controller. SnapMirror replicates the data while it is still served. To switch over to the replicated data, the client needs to detach from the source and reattach to the replica. After completion, the replica is considered the new source. This operation has an impact on client downtime. Typically, inter-data center bandwidth allows you to synchronize the source and the replica within a few minutes.

7. If none of the listed activities can be used, the application must be stopped to achieve a consistent state.

The mitigation activities for aggregate tightness are summarized in Table 8. Note that Provisioning Manager performs mitigation alternatives 3 to 6 for secondary storage online.


Table 8) Mitigation alternatives to control use within aggregates.

No. | Mitigation Activity | Repeatability | SLA Impact | Preparation Time | Time to Show Effect
1 | Increase aggregate capacity by adding disks | Repeatable (Data ONTAP 7.x: low limits; Data ONTAP 8: high limits) | None | HW procurement | Immediate (+ rebalancing)
2 | Decrease the aggregate's Snapshot copy reserve area, if possible | One time | None | None | Immediate
3 | Shrink other volumes in the aggregate if they have enough free space | One time | Low | None | Immediate
4 | Run deduplication and shrink volumes | Repeatable | Low | Time to execute dedupe | Immediate
5 | Migrate nomads (online) | Repeatable | Low | None | Minutes: vFiler migration time
6 | Migrate volumes to a different aggregate (offline) | Repeatable | Med-high | Next planned downtime window | Minutes: volume switch-over time
7 | Prevent application data loss and stop the application; then migrate (offline) | Repeatable | Low-high | Coordinate with app owner | Minutes: migration time
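The alternatives of Table 8 can also be encoded as a small data structure with a selector that filters them by what the operational team can currently do. This is our own sketch; the feasibility flags are simplifications of the table columns, not an official classification.

```python
# Table 8 as a small data structure plus a selector (illustrative sketch;
# the flags condense the "SLA Impact" and "Preparation Time" columns).

AGGREGATE_MITIGATIONS = [
    # (no, activity, repeatable, needs_downtime, needs_online_migration)
    (1, "increase aggregate capacity by adding disks",   True,  False, False),
    (2, "decrease the aggregate's Snapshot copy reserve", False, False, False),
    (3, "shrink other volumes in the aggregate",          False, False, False),
    (4, "run deduplication and shrink volumes",           True,  False, False),
    (5, "migrate nomads (online)",                        True,  False, True),
    (6, "migrate volumes to a different aggregate (offline)", True, True, False),
    (7, "stop the application, then migrate (offline)",   True,  True,  False),
]

def feasible(can_migrate_online, downtime_available):
    """Return the activities usable right now, in the order of Table 8."""
    return [activity
            for (_, activity, _, needs_dt, needs_om) in AGGREGATE_MITIGATIONS
            if (not needs_dt or downtime_available)
            and (not needs_om or can_migrate_online)]
```

Sample setting 1 in section 5.1, for instance, corresponds to `feasible(False, False)`: only the alternatives that need neither a downtime window nor online migration remain.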

MITIGATION ACTIVITIES FOR VOLUME TIGHTNESS

Mitigation activities for volume tightness are relevant for volumes that are at risk because their storage objects cannot grow to the committed size. When it is not possible to enable growth for storage objects contained in volumes, you need to perform an aggregate mitigation activity.

Note: Some of these mitigation alternatives depend on and affect used capacity in the aggregate.

Table 9) Mitigation activities for resource tightness within volumes.

No. | Mitigation Activity | Repeatability | SLA Impact | Prep Time | Time to Show Effect
1 | Reduce the volume's Snapshot copy reserve (if configured and not used) | One time | Low | None | Immediate
2 | Increase the volume if there is free space in the aggregate (see Table 8) | One time | Low | None | Immediate
3 | Delete Snapshot copies not needed or those skipped by the autodelete function | Limited | Low | None | Immediate
4 | Activate FAS deduplication for the volume (requires proper space guarantees) | One time | Low; possible performance impact | Wait for schedule | Hours
5 | If the volume contains more than a single LUN, migrate those objects to another volume or aggregate | Repeatable | High | Next planned downtime window | Minutes: volume migration time
6 | Stop application and migrate data | Repeatable | High | Coordinate with app owner | Minutes: migration time


5 REAL-LIFE SETTINGS

This section summarizes two different operational settings. The first does not make use of online data migration or the settled/nomad provisioning pattern; the second implements a settled/nomad provisioning pattern to maintain the flexibility for online data migrations.

The concrete threshold settings and approaches might be very customer and application specific. To exploit NetApp storage efficiency features in your own data center, NetApp recommends that you start conservatively. After you are familiar with the process, work toward the customer-specific optimum.

5.1 SAMPLE SETTING 1: REAL-LIFE SETTING

This section describes a real-life setting that a customer started with. It makes use of a limited set of mitigation alternatives. This is especially beneficial when the installed storage capacity should remain constant over a long time frame or when physical systems are already fully equipped. A settled/nomad setting is not considered. Thus, the thresholds that signal a transition between phases are set lower and more conservatively for this customer. Because online data migration and aggregate extension are not available as mitigation alternatives, sufficient available space is required to safely reach the next planned downtime window, as shown in Figure 21. In practice, refer to the aggregate "days to full" trend value to get an idea of the available days to full based on past data growth.

• All storage is provisioned using the zero fat option with growable FlexVol volumes. Only aggregate monitoring is used.
• Aggregate extension is not a mitigation alternative.
• Online migration is not a mitigation alternative.

Figure 21) Storage to enable organic data growth between planned downtime windows. (The figure plots data growth over time, in months, between two planned downtime windows.)

Note: Several months might fall between planned downtime windows in which to perform major mitigation alternatives.

The primary concern is preventing the critical situation in which aggregates reach a utilization level too high to enable organic growth during the period between agreed planned downtime windows. To prevent this situation, sufficient space must be reserved to enable data growth. Second, the level of data consolidation is monitored to manage accumulated growth rates safely.

Provisioning of new data is stopped when one or both thresholds on the first and second metrics are reached. The operational teams are notified using an alarm on the Operations Manager event "aggregate nearly full" (configured to fire when the metric exceeds 50%) and the event "aggregate nearly overcommitted" (configured to fire when the metric exceeds 110%). These alarms stop the responsible entities from provisioning new storage; the aggregate is left for organic growth.

An assessment of the storage situation might then be performed. Depending on experience and knowledge of the application growth rates seen in the past, the thresholds may be adapted. After the upper threshold of the operational sweet spot corridor is crossed, an alarm based on the "aggregate full" threshold (set initially to 65%) is sent to the storage administrators, who decide whether to migrate data in the next planned downtime window. In the meantime, organic growth can take place in the yellow-marked area shown in Figure 22. The metrics used are:

• First metric: aggregate capacity used
• Second metric: aggregate space committed

Because all storage is provisioned using the zero fat option, no artificially limited storage container exists. Thus there is no need to consider a volume-based metric. Figure 22 shows the behavior depending on the metrics aggregate capacity used and aggregate committed space.

Figure 22) Transition of phases depending on the metrics aggregate capacity used and aggregate committed space. (While aggregate capacity used is at 0–50% and aggregate space committed is at 0–110%, new storage is provisioned. Beyond those thresholds, capacity is assessed and the thresholds are adapted. Above 65% capacity used or 120% committed space, mitigation takes place.)
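For illustration, the two metrics and the resulting phase decision can be sketched in a few lines of Python. The function names and the example aggregate are our own; the thresholds are the ones quoted above (50%/65% capacity used, 110%/120% committed space).

```python
# Illustrative calculation of the two metrics of sample setting 1 and the
# resulting action; names and the example sizes are our own.

def aggregate_metrics(used_tb, usable_tb, committed_tb):
    """Return (capacity_used_pct, space_committed_pct) for an aggregate."""
    return (100.0 * used_tb / usable_tb, 100.0 * committed_tb / usable_tb)

def action(capacity_used_pct, committed_pct):
    """Phase decision corresponding to Figure 22."""
    if capacity_used_pct > 65 or committed_pct > 120:
        return "mitigate"  # migrate data in the next planned downtime window
    if capacity_used_pct > 50 or committed_pct > 110:
        return "assess capacity, adapt thresholds, stop provisioning"
    return "provision new storage"

# A 40 TB aggregate holding 18 TB of data with 50 TB committed to thin
# volumes is only 45% full but 125% committed, so it must be mitigated:
used_pct, committed_pct = aggregate_metrics(18, 40, 50)
```

The example shows why the second metric matters: an aggregate can look comfortably empty while the committed thin-provisioned space already exceeds the mitigation threshold.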


5.2 SAMPLE SETTING 2: SETTLED/NOMAD

This section describes a setting that takes the settled/nomad provisioning pattern into account and allows migrating nomad data flexibly and in a timely manner, thanks to vFiler technology. This setting requires storage space at alternative locations to which nomads might be migrated. It is seen more often in larger environments with an emphasis on NFS-attached storage. It allows operating the NetApp storage infrastructure at very high use and within narrower operational sweet spot corridors. Figure 23 visualizes the effect of a mitigation alternative that can be performed online.

Figure 23) Narrower corridors due to the ability to perform mitigation alternatives in hours instead of months. (The figure shows settled and nomad (N) data on an aggregate over time, in hours: after the need to act is detected, the mitigation, e.g., a migration of a nomad, takes effect within hours.)

In this sample setting, as in sample setting 1, the critical situation to prevent is aggregates becoming too full. However, the flexibility gained with online data migration makes it unnecessary to take a further metric, such as storage overcommitment, into account.

• All storage is provisioned using the zero fat option with growable FlexVol volumes. Only aggregate monitoring is used.
• Storage is provisioned using the settled/nomad pattern, with the ability to perform online migration.
• The "days to full" aggregate trend was more than 200 days on average. Note that this value depends on the individual situation and is calculated against 100% capacity used.

The sole metric in this setting is aggregate capacity used. Table 10 contains the thresholds describing the transitions between phases.

Table 10) Phase transitions with the settled/nomad provisioning pattern and the online migration mitigation alternative.

Detection Threshold | Notify | Mitigation
> 70% | Storage operations | Stop provisioning of new storage
> 85% | Storage operations | Stop extending provisioned storage
> 90% | Storage operations | Relax the resource situation and migrate a nomad
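Table 10 translates directly into a small threshold-to-action mapping, sketched here with our own names:

```python
# Table 10 expressed as code: map aggregate capacity used to the phase
# actions of the settled/nomad setting (illustrative sketch).

THRESHOLDS = [  # (capacity used above %, mitigation)
    (90, "relax resource situation and migrate a nomad"),
    (85, "stop extending provisioned storage"),
    (70, "stop provisioning of new storage"),
]

def phase_action(capacity_used_pct):
    """Return the action for the given aggregate capacity used, or None."""
    for limit, mitigation in THRESHOLDS:
        if capacity_used_pct > limit:
            return mitigation  # in every phase, storage operations is notified
    return None                # inside the operational sweet spot corridor
```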


Figure 24) Visualization of phase transitions depending on the metric aggregate capacity used. (At 0–70% capacity used, new storage may be provisioned and already provisioned storage extended; between 70% and 85%, provisioning of new storage stops while extending is still possible; above 90%, utilization is relaxed by migrating a nomad with NetApp Data Motion.)

You can achieve very high data consolidation in this setting by using NetApp storage controllers. The served amount of logical data exceeds the physically usable capacity by factors.


6 STORAGE EFFICIENCY COOKBOOK

To increase consolidation, we propose the following steps to exploit the advantages of NetApp storage efficiency technologies.

Figure 25) Typical picture of aggregate capacity metrics while turning to zero fat configurations and dedupe. (The chart plots committed capacity and capacity used over elapsed time, together with the overall trend and the last 3-month trend; the markers 1, 2, and 3 correspond to the steps below, at roughly the 1-month and 3-month points.)

As a general rule, we don't introduce artificially limited container types. They increase monitoring effort and might prevent pooling of unused space. For an existing landscape, proceed as follows:

1. Install and configure Operations Manager; the earlier the better. From day one, Operations Manager collects data, and the more information it collects, the better the predictions and trending. The diagrams provided by Operations Manager give a good idea of growth rates and their steadiness. Make sure all NetApp storage controllers are monitored. Wait for one month. Define which mitigation alternatives your operational team is comfortable with. Check the boxes accompanying the provided list and identify the time your team needs to perform the actions. If you can perform online migrations for nomads, define the time needed to negotiate and approve a migration. For all other data, define the time to the next planned downtime window.

2. Change all volumes to the zero fat configuration with the autogrow feature set to on. Because there is no artificial space limitation for an autogrow volume, monitoring is restricted to aggregate monitoring. When using deduplication, set the volume to autogrow. During this period the capacity used diminishes, as shown in Figure 25. Usually each change in the volume configuration can be detected. So far only metadata has changed, and unused space in the volumes is now available from a common shared pool. The aggregated free space is available to the same applications storing the data. We recommend monitoring for three months to understand the growth rate of your environment.

3. Derive the growth trend of the aggregates. Note that the overall trend might still be negative. Use Operations Manager to help determine the trend. Make sure that it excludes the time frame when the volume configuration was changed to zero fat and that it includes relevant operations of your applications, such as month-end and year-end closing of business applications or regular software maintenance updates (for example, in virtualized environments).
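The trend derivation in step 3 can be sketched as a simple least-squares fit over daily capacity samples. This only illustrates the arithmetic behind the "days to full" trend that Operations Manager reports; all names are our own, and it is not an Operations Manager API.

```python
# Sketch of "days to full" trending from daily capacity samples
# (stdlib only; illustrative, not an Operations Manager API).

def daily_growth_rate(samples):
    """Least-squares slope (TB/day) of (day, used_tb) samples."""
    n = float(len(samples))
    sx = sum(d for d, _ in samples)
    sy = sum(u for _, u in samples)
    sxx = sum(d * d for d, _ in samples)
    sxy = sum(d * u for d, u in samples)
    return (n * sxy - sx * sy) / (n * sxx - sx * sx)

def days_to_full(samples, usable_tb):
    """Days until 100% of the aggregate is used, at the current trend."""
    rate = daily_growth_rate(samples)
    if rate <= 0:
        return None  # negative trend, e.g., right after enabling dedupe
    return (usable_tb - samples[-1][1]) / rate

# 30 days of samples: 10 TB used, growing 0.05 TB/day, in a 20 TB aggregate.
samples = [(d, 10 + 0.05 * d) for d in range(30)]
```

A declining trend (as expected right after the zero fat conversion in step 2) yields no meaningful "days to full" value, which is why the trend window must exclude that time frame.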


4. Work backward to determine the thresholds of the phases.

a. Define the aggregate use at a level where your operational team is comfortable. At first, do not exceed 80%. Add an attention area (yellow) depending on the mitigation alternatives and their time to show effect.

b. Determine the maximum distance between the planned downtimes, or the time to perform the intended mitigation alternatives.

c. Determine the growth rate. Operations Manager provides help in determining the trend of data growth.

d. Determine the minimum space required to comfortably allow organic growth in the period between agreed planned downtimes of the services provided. Operations Manager helps you understand the growth rate of the past.
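Steps a through d amount to simple arithmetic: from the growth rate and the distance between downtime windows, derive how much free space must remain and hence the red threshold. A sketch, with our own names and a hypothetical safety factor:

```python
# Working backward from growth rate and downtime interval to a threshold,
# as in steps a-d (illustrative sketch; the safety factor is our own choice).

def red_threshold_pct(usable_tb, growth_tb_per_day, days_between_downtimes,
                      safety_factor=1.5):
    """Highest 'capacity used' percentage at which organic growth still fits
    until the next planned downtime window (with a safety margin)."""
    needed_tb = growth_tb_per_day * days_between_downtimes * safety_factor
    return max(0.0, 100.0 * (usable_tb - needed_tb) / usable_tb)

# A 40 TB aggregate growing 0.05 TB/day with downtime windows 120 days
# apart must keep 0.05 * 120 * 1.5 = 9 TB free, so the red threshold sits
# at 77.5% capacity used.
threshold = red_threshold_pct(40, 0.05, 120)
```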

To provision storage, follow these steps:

1. Create big aggregates to enable shared storage in your data center. We recommend sizing them so that the aggregate can be extended once for eventual aggregate mitigation. Very few situations exist in which a silo-centric approach with dedicated aggregates for applications makes sense. Free space and performance in an aggregate can be shared. A few big aggregates reduce the monitoring effort. Also, build aggregates in a limited number of standardized configurations and sizes.

2. Create volumes in the zero fat configuration with the autogrow feature set to on. Because there is no artificial space limitation for an autogrow volume, monitoring is restricted to aggregate monitoring. When using deduplication, set the volume to autogrow. Whenever possible, use Provisioning Manager for convenience and for repeating configurations.

a. Classify your data and provision for flexibility. Give NFS a preference and make use of vFiler entities.

b. Turn on deduplication. Even in situations where deduplication rates are expected to be low, there is sometimes a big surprise. If you prefer to try deduplication on the storage controller, then create a clone of the intended volume and deduplicate it to estimate the effect. Use Performance Advisor to identify a repeating time frame of low activity in which to schedule the deduplication job. Also consider scheduling deduplication by change rate. Mind the maximum volume sizes, which depend on the storage controller.

c. Initially size volumes to the expected size of the data you are going to store. Thus the aggregate overcommitment metric in Operations Manager represents the data consolidation more precisely.

d. Trim existing volumes provisioned in full or low fat to the zero fat configuration. Use the following commands on the console of the storage controller to configure zero fat without Snapshot autodelete for NAS environments:

vol options <volume> guarantee none
vol options <volume> try_first volume_grow
vol autosize <volume> -m <maximum size> -i <increment size> on
snap autodelete <volume> off

Use the following command sequence to configure zero fat with Snapshot autodelete for NAS environments:

vol options <volume> guarantee none
vol options <volume> try_first volume_grow
vol autosize <volume> -m <maximum size> -i <increment size> on
snap autodelete <volume> trigger volume
snap autodelete <volume> delete_order oldest_first
snap autodelete <volume> on


Use the following commands to configure zero fat without Snapshot autodelete for SAN environments:

vol options <volume> guarantee none
vol options <volume> try_first volume_grow
vol autosize <volume> -m <maximum size> -i <increment size> on
snap reserve -V <volume> 0
snap autodelete <volume> off
lun set reservation <lun> disable

Use the following command sequence to configure zero fat for SAN environments with autodelete set to on:

vol options <volume> guarantee none
vol options <volume> try_first volume_grow
vol autosize <volume> -m <maximum size> -i <increment size> on
snap reserve -V <volume> 0
snap autodelete <volume> trigger volume
snap autodelete <volume> delete_order oldest_first
snap autodelete <volume> on
lun set reservation <lun> disable
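Because these sequences differ in only a few lines, it can be convenient to generate them per volume. The following sketch merely emits the command strings shown above so they can be pasted into the storage controller console; it does not talk to the controller, and the function name is our own.

```python
# Convenience sketch: generate the zero fat command sequences for a volume.
# This only builds strings; it does not execute anything on the controller.

def zero_fat_commands(volume, max_size, increment, san_lun=None, autodelete=False):
    """Return the console commands for a zero fat configuration.
    san_lun: LUN path for SAN environments (None for NAS)."""
    cmds = [
        "vol options %s guarantee none" % volume,
        "vol options %s try_first volume_grow" % volume,
        "vol autosize %s -m %s -i %s on" % (volume, max_size, increment),
    ]
    if san_lun is not None:
        cmds.append("snap reserve -V %s 0" % volume)  # SAN: no Snapshot reserve
    if autodelete:
        cmds += [
            "snap autodelete %s trigger volume" % volume,
            "snap autodelete %s delete_order oldest_first" % volume,
            "snap autodelete %s on" % volume,
        ]
    else:
        cmds.append("snap autodelete %s off" % volume)
    if san_lun is not None:
        cmds.append("lun set reservation %s disable" % san_lun)
    return cmds
```

For example, `zero_fat_commands("vol1", "100g", "1g", san_lun="/vol/vol1/lun0", autodelete=True)` reproduces the last sequence above for volume vol1.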

e. Identify storage holding inactive data. Such storage is most often perfectly suited to provide nomad candidates that can be migrated.

f. Identify storage that is close to deprovisioning. Deprovisioning storage relaxes use and can act as a mitigation alternative.

g. Turn already provisioned volumes into the zero fat configuration.

3. Let Operations Manager monitor the landscape. Use the aggregate daily growth rates and "days to full" trending reported by Operations Manager to adapt the thresholds. Remember that "days to full" trending reports against 100% capacity used of the aggregate.


7 REFERENCES

• TR-3505: "NetApp Deduplication for FAS and V-Series Deployment and Implementation Guide," www.netapp.com/us/library/technical-reports/tr-3505.html
• TR-3563: "NetApp Thin Provisioning: Improving Storage Utilization and Reducing TCO," www.netapp.com/us/library/technical-reports/tr-3563.html
• TR-3710: "Operations Manager, Provisioning Manager, and Protection Manager Best Practices Guide," www.netapp.com/us/library/technical-reports/tr-3710.html
• TR-3786: "A Thorough Introduction to 64-Bit Aggregates," www.netapp.com/us/library/technical-reports/tr-3786.html
• TR-3814: "NetApp Data Motion," www.netapp.com/us/library/technical-reports/tr-3814.html
• TR-3827: "If You're Doing This, Then Your Storage Could Be Underutilized," www.netapp.com/us/library/technical-reports/tr-3827.html
• TR-3881: "DataMotion for Volumes for Enterprise Applications," www.netapp.com/us/library/technical-reports/tr-3881.html
• NetApp Operations Manager Efficiency Dashboard Installation and User Guide, now.netapp.com/NOW/download/tools/omsed_plugin/InstallUserGuide.pdf


8 ACKNOWLEDGMENTS

This report was developed in concert with the Field Centers for Innovation and covers field best practices and product group expertise. It would not have been possible without the input of many experts. Significant contributions were made by Matthew Agoni, Carlos Alvarez, Jeff Berks, Manfred Buchmann, Hans Deuerlein, Erik Dybwad, Niels Reker, Oliver Dziuba, Larry Freeman, Gary Garcia, Pretoom Goswami, Naveen Harsani, George John, Nigel Maddock, Andreas Martinovsky, Holger Niermann, Cesar Orosco, Christian Ott, Shiva Raja, Michael Reusch, Maurice Skubski, John Tyrrell, Oliver Walsdorf, and Allen Wang.

NetApp provides no representations or warranties regarding the accuracy, reliability, or serviceability of any information or recommendations provided in this publication, or with respect to any results that may be obtained by the use of the information or observance of any recommendations provided herein. The information in this document is distributed AS IS, and the use of this information or the implementation of any recommendations or techniques herein is a customer's responsibility and depends on the customer's ability to evaluate and integrate them into the customer's operational environment. This document and the information contained herein may be used solely in connection with the NetApp products discussed in this document.

© Copyright 2010 NetApp, Inc. All rights reserved. No portions of this document may be reproduced without prior written consent of NetApp, Inc. Specifications are subject to change without notice. NetApp, the NetApp logo, Go further, faster, Data ONTAP, FlexClone, FlexVol, MultiStore, RAID-DP, SnapDrive, SnapMirror, Snapshot, SyncMirror, and vFiler are trademarks or registered trademarks of NetApp, Inc. in the United States and/or other countries. Windows is a registered trademark of Microsoft Corporation. Oracle is a registered trademark of Oracle Corporation. VMware is a registered trademark and VMotion is a trademark of VMware, Inc. All other brands or products are trademarks or registered trademarks of their respective holders and should be treated as such. RA-0007-1010

Page 37: Lun Provision

37 Storage Efficiency Every Day How to Achieve and Manage Best-in-Class Storage Use

44 MITIGATE STORAGE USE

Uncontrolled storage use can limit operational flexibility and might put SLA fulfillment at risk This section focuses on mitigation activities to preserve flexibility by controlling use within its defined corridor The effect of a mitigation activity should return the usage to its defined corridor

Storage tightness might occur in aggregates or volumes depending on their configuration When all volumes in an aggregate are thin provisioned with the zero fat configuration they use the shared pool of free blocks of the aggregate to deal with data growth To solve this situation a mitigation activity on the aggregate level is necessary

When storage objects in a fixed size volume cannot grow to the committed space a mitigation activity on the volume level is necessary to solve upcoming volume tightness

MITIGATION ACTIVITIES FOR AGGREGATES

Aggregates are the coarsest storage object within a NetApp storage controller Finer-grained storage objects such as FlexVol volumes and their content are usually thin provisioned using zero fat configuration They might grow on demand however because they live within an aggregate of physically limited size the growth of the storage object itself is also limited As described in the following list providing usable space in the aggregate automatically allows contained storage objects to grow

1. Increase the aggregate. You can add drives to aggregates during operation, and you can repeat this mitigation activity. The maximum aggregate size depends on the Data ONTAP version, the type of aggregate, and the type of storage controller; 64-bit aggregates, supported with Data ONTAP 8, have very high limits. Additional drives can be used immediately; however, their procurement lead time needs to be taken into account. Rebalancing data between existing and new drives results in uniformly distributed use of the drives.

2. Decrease the aggregate Snapshot copy reserve. This reserve is needed in MetroCluster and SyncMirror® configurations. In other configurations, you can decrease this reserve or set it to zero.

3. Shrink preallocated volumes. Volumes with preallocated space reserve available aggregate free space. Where possible, these volumes can be shrunk, returning the freed space to the aggregate so that other storage objects can make use of it.

4. Enable deduplication and shrink the volume.

5. If available, migrate a nomad online to a different storage controller. Doing this at the NetApp storage controller level requires storage provisioning based on vFiler units and MultiStore and SnapMirror licenses. Adequate free space on the aggregates of the target storage controller is required. This mitigation activity is not limited in its applicability.

6. Migrate a volume from one aggregate to another, within the same or another storage controller. SnapMirror replicates the data while it is still being served. To switch over to the replicated data, the client needs to detach from the source and reattach to the replica; after completion, the replica is considered the new source. This operation incurs client downtime. Typically, inter-data center bandwidth allows you to synchronize the source and the replica within a few minutes.

7. If none of the listed activities can be used, the application must be stopped to achieve a consistent state before the data is migrated offline.

The mitigation activities for aggregate tightness are summarized in Table 8. Note that Provisioning Manager performs mitigation alternatives 3 to 6 for secondary storage online.

Table 8) Mitigation alternatives to control use within aggregates.

No. | Mitigation Activity | Repeatability | SLA Impact | Preparation Time | Time to Show Effect
1 | Increase aggregate capacity by adding disks | Data ONTAP 7.x: low limits; Data ONTAP 8: high limits | None | HW procurement | Immediate (plus rebalancing)
2 | Decrease the aggregate's Snapshot copy reserve, if possible | One time | None | None | Immediate
3 | Shrink other volumes in the aggregate if they have enough free space | One time | Low | None | Immediate
4 | Run deduplication and shrink volumes | Repeatable | Low | Time to execute dedupe | Immediate
5 | Migrate nomads (online) | Repeatable | Low | None | Minutes (vFiler migration time)
6 | Migrate volumes to a different aggregate (offline) | Repeatable | Med–high | Next planned downtime window | Minutes (volume switchover time)
7 | Prevent application data loss and stop the application, then migrate (offline) | Repeatable | Low–high | Coordinate with app owner | Minutes (migration time)

MITIGATION ACTIVITIES FOR VOLUME TIGHTNESS

Mitigation activities for volume tightness are relevant for volumes that are at risk because their storage objects cannot grow to the committed size. When it is not possible to enable growth for storage objects contained in volumes, you need to perform an aggregate mitigation activity.

Note: Some of these mitigation alternatives depend on, and affect, used capacity in the aggregate.

Table 9) Mitigation activities for resource tightness within volumes.

No. | Mitigation Activity | Repeatability | SLA Impact | Prep Time | Time to Show Effect
1 | Reduce the volume's Snapshot copy reserve (if configured and not used) | One time | Low | None | Immediate
2 | Increase the volume if there is free space in the aggregate (see Table 8) | One time | Low | None | Immediate
3 | Delete Snapshot copies that are not needed or were skipped by the autodelete function | Limited | Low | None | Immediate
4 | Activate FAS deduplication for the volume (requires proper space guarantees) | One time | Low; possible performance impact | Wait for schedule | Hours
5 | If the volume contains more than a single LUN, migrate those objects to another volume or aggregate | Repeatable | High | Next planned downtime window | Minutes (volume migration time)
6 | Stop the application and migrate the data | Repeatable | High | Coordinate with app owner | Minutes (migration time)

5 REAL-LIFE SETTINGS

This section summarizes two different operational settings. The first does not make use of online data migration or the settled/nomad provisioning pattern; the second implements a settled/nomad provisioning pattern to maintain the flexibility for online data migrations.

The concrete threshold settings and approaches might be very customer and application specific. To exploit NetApp storage efficiency features in your own data center, NetApp recommends that you start conservatively. After you are familiar with the process, work toward the customer-specific optimum.

5.1 SAMPLE SETTING 1: REAL-LIFE SETTING

This section describes a real-life setting that a customer started with. It makes use of a limited set of mitigation alternatives, which is especially beneficial when the installed storage capacity should remain constant over a long time frame or when physical systems are already fully equipped. A settled/nomad setting is not considered. Thus, the thresholds that signal a transition between phases are set lower and more conservatively for this customer. Because online data migration and aggregate extension are not available as mitigation alternatives, sufficient available space is required to safely reach the next planned downtime window, as shown in Figure 21. In practice, refer to the aggregate days-to-full trend value to get an idea of the available days to full based on past data growth.

• All storage is provisioned using the zero fat option with growable FlexVol volumes. Only aggregate monitoring is used.
• Aggregate extension is not a mitigation alternative.
• Online migration is not a mitigation alternative.

Figure 21) Storage to enable organic data growth between planned downtime windows

[Figure: organic data growth plotted over a time axis of months, with two planned downtime windows marked.]

Note: Several months might fall between the planned downtime windows in which major mitigation alternatives can be performed.

The primary concern is preventing the critical situation in which aggregates reach a utilization level too high to enable organic growth during the period between the agreed planned downtime windows. To prevent this situation, sufficient space must be reserved to enable data growth. Second, the level of data consolidation is monitored to manage accumulated growth rates safely.

Provisioning of new data is stopped when one or both thresholds on the first and second metrics are reached. The operational teams are notified using an alarm on the Operations Manager event "aggregate nearly full" (configured to fire when the metric exceeds 50%) and the event "aggregate nearly overcommitted" (configured to fire when the metric exceeds 110%). These alarms stop the responsible entities from provisioning new storage; the aggregate is left for organic growth.

An assessment of the storage situation might then be performed. Depending on experience and knowledge of the application growth rates seen in the past, the thresholds may be adapted. After the upper threshold of the operational sweet spot corridor is crossed, an alarm based on the "aggregate full" threshold (set initially to 65%) is sent to the storage administrators, who decide on migrating data in the next planned downtime window. In the meantime, organic growth can take place in the yellow-marked area shown in Figure 22. The metrics used are:

• First metric: aggregate capacity used
• Second metric: aggregate space committed

Because all storage is provisioned using the zero fat option, no artificially limited storage container exists. Thus, there is no need to consider a volume-based metric. Figure 22 shows the behavior depending on the metrics aggregate capacity used and aggregate committed space.
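The two-metric decision logic of this setting can be sketched as a small function. This is a hypothetical illustration using the initial thresholds from the text (50%/65% for capacity used, 110%/120% for space committed); the function name and return strings are invented for the example.

```python
def aggregate_phase(capacity_used_pct: float, committed_pct: float) -> str:
    """Classify an aggregate per sample setting 1 (initial thresholds).

    Inside the operational sweet spot corridor (used 0-50%, committed
    0-110%) new storage may be provisioned.  Past the "nearly full" or
    "nearly overcommitted" events, provisioning stops and capacity is
    assessed.  Past the upper corridor bounds (65% used or 120%
    committed), migration is planned for the next downtime window.
    """
    if capacity_used_pct > 65 or committed_pct > 120:
        return "mitigate"
    if capacity_used_pct > 50 or committed_pct > 110:
        return "stop provisioning, assess capacity"
    return "provision new storage"
```

For example, an aggregate at 55% used and 100% committed would stop new provisioning but still allow organic growth within the reserved space.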

Figure 22) Transition of phases depending on the metrics aggregate capacity used and aggregate committed space.

[Figure: decision flow; while aggregate capacity used is 0–50% and aggregate space committed is 0–110%, new storage is provisioned; above those thresholds, capacity is assessed and thresholds are adapted; above 65% used or 120% committed, mitigation is performed.]

5.2 SAMPLE SETTING 2: SETTLED/NOMAD

This section describes a setting that takes the settled/nomad provisioning pattern into account and allows migrating nomad data flexibly and in a timely manner thanks to vFiler technology. This setting requires storage space at alternative locations to which nomads might be migrated. It is seen more often in larger environments with an emphasis on NFS-attached storage. It allows operating the NetApp storage infrastructure at very high use and in narrower operational sweet spot corridors. Figure 23 visualizes the effect of a mitigation alternative that can be performed online.

Figure 23) Narrower corridors due to the ability to perform mitigation alternatives in hours instead of months.

[Figure: settled and nomad (N) units on a time axis of hours; the need to act is detected and the effect of mitigation (e.g., migration) appears within hours, allowing a narrow operational sweet spot corridor.]

In this sample setting, as in sample setting 1, the critical situation to prevent is aggregates becoming too full. However, with the flexibility gained through online data migration, there is no need to take a further metric into account, such as storage overcommitment.

• All storage is provisioned using the zero fat option with growable FlexVol volumes. Only aggregate monitoring is used.
• Storage is provisioned using the settled/nomad pattern, with the ability to perform online migration.
• Days-to-full aggregate trending was more than 200 days on average. Note that this value depends on the individual situation and is calculated against 100% capacity used.

The sole metric in this setting is aggregate capacity used. Table 10 contains the thresholds describing the transition of phases.

Table 10) Phase transitions with the settled/nomad provisioning pattern and the online migration mitigation alternative.

Detection Threshold | Notify | Mitigation
> 70% | Storage operations | Stop provisioning of new storage
> 85% | Storage operations | Stop extending provisioned storage
> 90% | Storage operations | Relax the resource situation and migrate a nomad
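The single-metric transitions of Table 10 can be expressed as a small lookup. This is an illustrative sketch; the function name and action strings are invented, and in practice the thresholds drive Operations Manager alarms rather than code.

```python
def settled_nomad_action(capacity_used_pct: float) -> str:
    """Map aggregate capacity used to the Table 10 mitigation step.

    With online migration available, a single metric suffices and the
    corridor can sit at much higher utilization than in setting 1.
    """
    if capacity_used_pct > 90:
        return "relax resource situation: migrate a nomad"
    if capacity_used_pct > 85:
        return "stop extending provisioned storage"
    if capacity_used_pct > 70:
        return "stop provisioning new storage"
    return "normal operation"
```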

Figure 24) Visualization of phase transitions depending on the metric aggregate capacity used.

[Figure: decision flow; at 0–70% aggregate capacity used, new storage is provisioned; at 70–85%, already provisioned storage may still be extended but no new storage is provisioned; above 90%, utilization is relaxed by migrating a nomad with NetApp Data Motion.]

You can achieve very high data consolidation in this setting by using NetApp storage controllers: the amount of logical data served exceeds the physically usable capacity by factors.

6 STORAGE EFFICIENCY COOKBOOK

To increase consolidation, we propose the following steps to exploit the advantages of NetApp storage efficiency technologies.

Figure 25) Typical picture of aggregate capacity metrics while turning to zero fat configurations and dedupe.

[Figure: committed capacity and capacity used plotted over elapsed time; capacity used dips during the first month while volumes are converted, after which the overall trend and the last-3-month trend are derived over roughly three months; markers 1, 2, and 3 correspond to the steps below.]

As a general rule, we don't introduce artificially limited container types: they increase monitoring effort and might prevent pooling of unused space. For an existing landscape, proceed as follows:

1. Install and configure Operations Manager; the earlier the better. From day one, Operations Manager collects data, and the more information it collects, the better the predictions and trending become. The diagrams provided by Operations Manager give a good idea of growth rates and their steadiness. Make sure all NetApp storage controllers are monitored. Wait for one month. Define which mitigation alternatives your operational team is comfortable with; check the boxes accompanying the provided list and identify the time your team needs to perform the actions. If you can perform online migrations for nomads, define the time to negotiate and approve a migration. For all other data, define the time to the next planned downtime window.

2. Change all volumes to the zero fat configuration with the autogrow feature set to on. Because there is no artificial space limitation for an autogrow volume, monitoring is restricted to aggregate monitoring. When using deduplication, set the volume to autogrow. During this period, the capacity used diminishes, as shown in Figure 25, and usually each change in the volume configuration can be detected. So far only metadata has changed, and the space that was unused within the volumes is now available from a common shared pool; the aggregated free space remains available to the same applications storing the data. We recommend monitoring for three months to understand the growth rate of your environment.

3. Derive the growth trend of the aggregates. Note that the overall trend might still be negative. Use Operations Manager to help determine the trend. Make sure that it excludes the time frame of changing the volume configuration to zero fat and that it includes relevant operations of your applications, such as month- and year-end closing of business applications or regular software maintenance updates (for example, in virtualized environments).

Work backward to determine the thresholds of the phases:

a. Define the aggregate use at a level your operational team is comfortable with; at first, do not exceed 80%. Add an attention area (yellow) depending on the mitigation alternatives and their time to show effect.

b. Determine the maximum distance between the planned downtimes, or the time to perform the intended mitigation alternatives.

c. Determine the growth rate. Operations Manager provides help in determining the trend of data growth.

d. Determine the minimum space required to comfortably allow organic growth in the period between the agreed planned downtimes of the provided services. Operations Manager helps you understand the growth rate of the past.
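Steps b–d can be combined into a back-of-the-envelope estimate of the highest safe utilization level. This is a simplified linear-growth sketch with invented names; real trending data comes from Operations Manager, and the extra headroom percentage is an assumption of the example.

```python
def max_safe_used_pct(aggregate_size_gb: float,
                      daily_growth_gb: float,
                      days_between_downtimes: int,
                      headroom_pct: float = 5.0) -> float:
    """Estimate the highest 'capacity used' level at which organic
    growth still fits until the next planned downtime window,
    assuming linear growth plus a safety headroom."""
    growth_gb = daily_growth_gb * days_between_downtimes
    growth_pct = 100.0 * growth_gb / aggregate_size_gb
    return max(0.0, 100.0 - growth_pct - headroom_pct)
```

For a 10 TB aggregate growing 20 GB per day with 90 days between downtime windows, about 18% of capacity must stay free for growth, so utilization should stay below roughly 77% after the 5% headroom.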

To provision storage, follow these steps:

1. Create big aggregates to enable shared storage in your data center. We recommend sizing them so that the aggregate can be extended once for eventual aggregate mitigation. Very few situations exist in which a silo-centric approach with dedicated aggregates for applications makes sense; free space and performance in an aggregate can be shared, and a few big aggregates reduce the monitoring effort. Also, build aggregates in a limited number of standardized configurations and sizes.

2. Create volumes in the zero fat configuration with the autogrow feature set to on. Because there is no artificial space limitation for an autogrow volume, monitoring is restricted to aggregate monitoring. When using deduplication, set the volume to autogrow. Whenever possible, use Provisioning Manager for convenience and for repeating configurations.

a. Classify your data and provision for flexibility. Give NFS a preference and make use of vFiler entities.

b. Turn on deduplication. Even in situations where deduplication rates are expected to be low, there is sometimes a big surprise. If you prefer to try deduplication first, create a clone of the intended volume on the storage controller and deduplicate it to estimate the effect. Use Performance Advisor to identify a repeating time frame of low activity in which to schedule the deduplication job; also consider scheduling deduplication by change rate. Mind the maximum volume sizes, which depend on the storage controller.

c. Initially size volumes to the expected size of the data you are going to store. In this way, the aggregate overcommitment metric in Operations Manager represents the data consolidation more precisely.

d. Trim existing volumes provisioned in full or low fat to the zero fat configuration. Use the following commands at the console of the storage controller to configure zero fat without Snapshot autodelete for NAS environments:

vol options <volume> guarantee none
vol options <volume> try_first volume_grow
vol autosize <volume> -m <maximum size> -i <increment size> on
snap autodelete <volume> off

Use the following command sequence to configure zero fat with Snapshot autodelete for NAS environments:

vol options <volume> guarantee none
vol options <volume> try_first volume_grow
vol autosize <volume> -m <maximum size> -i <increment size> on
snap autodelete <volume> trigger volume
snap autodelete <volume> delete_order oldest_first
snap autodelete <volume> on

45 Storage Efficiency Every Day How to Achieve and Manage Best-in-Class Storage Use

Use the following commands to configure zero fat without Snapshot autodelete for SAN environments:

vol options <volume> guarantee none
vol options <volume> try_first volume_grow
vol autosize <volume> -m <maximum size> -i <increment size> on
snap reserve -V <volume> 0
snap autodelete <volume> off
lun set reservation <lun> disable

Use the following command sequence to configure zero fat for SAN environments with autodelete set to on:

vol options <volume> guarantee none
vol options <volume> try_first volume_grow
vol autosize <volume> -m <maximum size> -i <increment size> on
snap reserve -V <volume> 0
snap autodelete <volume> trigger volume
snap autodelete <volume> delete_order oldest_first
snap autodelete <volume> on
lun set reservation <lun> disable

e. Identify storage holding inactive data. Such storage is most often perfectly suited to act as a nomad candidate that could be migrated.

f. Identify storage that is close to deprovisioning. Deprovisioning storage relaxes use and can act as a mitigation alternative.

g. Turn already provisioned volumes into the zero fat configuration.

3. Let Operations Manager monitor the landscape. Use the aggregate daily growth rates and the days-to-full trending reported by Operations Manager to adapt the thresholds. Remember that days-to-full trending reports against 100% capacity used of the aggregate.
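As a rough cross-check of the reported trend, the days-to-full figure against 100% capacity used can be approximated from recent growth. This is an illustrative linear sketch with invented names, not the algorithm Operations Manager uses.

```python
def days_to_full(capacity_used_gb: float,
                 aggregate_size_gb: float,
                 daily_growth_gb: float) -> float:
    """Linear projection of days until the aggregate reaches 100% used."""
    if daily_growth_gb <= 0:
        # Shrinking or flat usage: never full under this simple model.
        return float("inf")
    return (aggregate_size_gb - capacity_used_gb) / daily_growth_gb
```

For example, a 10 TB aggregate at 6 TB used and growing 20 GB per day projects to 200 days to full, in line with the 200-day average quoted in sample setting 2.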

7 REFERENCES

• TR-3505, "NetApp Deduplication for FAS and V-Series Deployment and Implementation Guide": www.netapp.com/us/library/technical-reports/tr-3505.html
• TR-3563, "NetApp Thin Provisioning: Improving Storage Utilization and Reducing TCO": www.netapp.com/us/library/technical-reports/tr-3563.html
• TR-3710, "Operations Manager, Provisioning Manager, and Protection Manager Best Practices Guide": www.netapp.com/us/library/technical-reports/tr-3710.html
• TR-3786, "A Thorough Introduction to 64-Bit Aggregates": www.netapp.com/us/library/technical-reports/tr-3786.html
• TR-3814, "NetApp Data Motion": www.netapp.com/us/library/technical-reports/tr-3814.html
• TR-3827, "If You're Doing This, Then Your Storage Could Be Underutilized": www.netapp.com/us/library/technical-reports/tr-3827.html
• TR-3881, "DataMotion for Volumes for Enterprise Applications": www.netapp.com/us/library/technical-reports/tr-3881.html
• NetApp Operations Manager Efficiency Dashboard Installation and User Guide: now.netapp.com/NOW/download/tools/omsed_plugin/InstallUserGuide.pdf

8 ACKNOWLEDGMENTS

This report was developed in concert with the Field Centers for Innovation and covers field best practices and product group expertise. It would not have been possible without the input of many experts. Significant contributions were made by Matthew Agoni, Carlos Alvarez, Jeff Berks, Manfred Buchmann, Hans Deuerlein, Erik Dybwad, Niels Reker, Oliver Dziuba, Larry Freeman, Gary Garcia, Pretoom Goswami, Naveen Harsani, George John, Nigel Maddock, Andreas Martinovsky, Holger Niermann, Cesar Orosco, Christian Ott, Shiva Raja, Michael Reusch, Maurice Skubski, John Tyrrell, Oliver Walsdorf, and Allen Wang.

NetApp provides no representations or warranties regarding the accuracy, reliability, or serviceability of any information or recommendations provided in this publication, or with respect to any results that may be obtained by the use of the information or observance of any recommendations provided herein. The information in this document is distributed AS IS, and the use of this information or the implementation of any recommendations or techniques herein is a customer's responsibility and depends on the customer's ability to evaluate and integrate them into the customer's operational environment. This document and the information contained herein may be used solely in connection with the NetApp products discussed in this document.

© Copyright 2010 NetApp, Inc. All rights reserved. No portions of this document may be reproduced without prior written consent of NetApp, Inc. Specifications are subject to change without notice. NetApp, the NetApp logo, Go further, faster, Data ONTAP, FlexClone, FlexVol, MultiStore, RAID-DP, SnapDrive, SnapMirror, Snapshot, SyncMirror, and vFiler are trademarks or registered trademarks of NetApp, Inc. in the United States and/or other countries. Windows is a registered trademark of Microsoft Corporation. Oracle is a registered trademark of Oracle Corporation. VMware is a registered trademark and VMotion is a trademark of VMware, Inc. All other brands or products are trademarks or registered trademarks of their respective holders and should be treated as such. RA-0007-1010


3 Derive the growth trend of the aggregates Note that the overall trend might still be negative Use Operations Manager to help determine the trend Make sure that it excludes the time frame when changing the volume configuration to zero fat and that it includes relevant operations of your applications such as month- and year-end closing of business applications or regular software maintenance updates (for example in virtualized environments)

44 Storage Efficiency Every Day How to Achieve and Manage Best-in-Class Storage Use

Work backward to determine the thresholds of the phases a Define the aggregate use at a level where your operational team is comfortable At first do not

exceed 80 Add an attention area (yellow) depending on the mitigation alternatives and their time to show effect

b Determine the maximum distance between the planned downtimes or the time to perform the intended mitigation alternatives

c Determine the growth rate Operations Manager provides help in determining the trend of data growth

d Determine the minimum space required to comfortably allow organic growth in the period between agreed planned downtimes of the services provided Operations Manager helps you to understand the growth rate of the past

To provision storage following these steps

1 Create big aggregates to enable shared storage in your data center We recommend to size in such a way that the aggregate can be extended once for eventual aggregate mitigation Very few situations exist where a silo-centric approach with dedicated aggregates for applications makes sense Free space and performance in an aggregate can be shared Few big aggregates reduce the monitoring effort Also build aggregates in a limited number of standardized configurations and sizes

2 Create volumes in zero fat configuration with autogrow feature set to on Because there is no artificial space limitation for the autogrow volume monitoring is restricted to aggregate monitoring When using deduplication set the volume to autogrow Whenever possible use Provisioning Manager for convenience and for repeating configurations a Classify your data and provision for flexibility Give NFS a preference and make use of vFiler

entities b Turn on deduplication Even in situations where deduplication rates are expected to be low there

is sometimes a big surprise If you prefer to try deduplication on the storage controller then create a clone of the intended volume and deduplicate it to estimate the effect Use Performance Advisor to identify a repeating time frame of low activity to schedule the deduplication job Also use deduplication scheduled by change rate Mind the maximum sizes depending on the storage controller

c Initially size volumes to the expected size of the data you are going to store Thus the aggregate over-commitment metric in Operations Manager represents the data consolidation more precisely

d Trim existing volumes provisioned in fulllow fat to zero fat configuration Use the following commands of the console of the storage controller to configure zero fat without Snapshot autodelete for NAS environments

vol options ltvolumegt guarantee none vol options ltvolumegt try_first volume_grow vol autosize ltvolumegt -m ltmaximum sizegt -i ltincrement sizegt on snap autodelete ltvolumegt off

Use the following command sequence to configure zero fat with Snapshot autodelete for NAS environments

vol options ltvolumegt guarantee none vol options ltvolumegt try_first volume_grow vol autosize ltvolumegt -m ltmaximum sizegt -i ltincrement sizegt on snap autodelete ltvolumegt trigger volume snap autodelete ltvolumegt delete_order oldest_first snap autodelete ltvolumegt on

45 Storage Efficiency Every Day How to Achieve and Manage Best-in-Class Storage Use

Use the following commands to configure zero fat without Snapshot autodelete for SAN environments

vol options ltvolumegt guarantee none vol options ltvolumegt try_first volume_grow vol autosize ltvolumegt -m ltmaximum sizegt -i ltincrement sizegt on snap reserve -V ltvolumegt 0 snap autodelete ltvolumegt off lun set reservation ltlungt disable

Use the following command sequence to configure zero fat for SAN environments with autodelete set to on

vol options ltvolumegt guarantee none vol options ltvolumegt try_first volume_grow vol autosize ltvolumegt -m ltmaximum sizegt -i ltincrement sizegt on snap reserve -V ltvolumegt 0 snap autodelete ltvolumegt trigger volume snap autodelete ltvolumegt delete_order oldest_first snap autodelete ltvolumegt on lun set reservation ltlungt disable

e Identify storage of inactive data Storage keeping inactive data is most often perfectly suited to act as nomad candidates that could be migrated

f Identify storage that is close to deprovisioning Deprovisioning of storage relaxes use and can act as a mitigation alternative

g Turn already provisioned volumes in zero fat configuration 3 Let Operations Manager monitor the landscape Use reported aggregate daily growth rates and days

to full trending reported by Operations Manager to adapt the thresholds Remember that days to full trending reports against 100 capacity used of aggregate

46 Storage Efficiency Every Day How to Achieve and Manage Best-in-Class Storage Use

7 REFERENCES bull TR-3505 ldquoNetApp Deduplication for FAS and V-Series Deployment and Implementation Guiderdquo

wwwnetappcomuslibrarytechnical-reportstr-3505html bull TR-3563 ldquoNetApp Thin Provisioning Improving Storage Utilization and Reducing TCOrdquo

wwwnetappcomuslibrarytechnical-reportstr-3563html bull TR-3710 ldquoOperations Manager Provisioning Manager and Protection Manager Best Practices

Guiderdquo wwwnetappcomuslibrarytechnical-reportstr-3710html

bull TR-3786 ldquoA Thorough Introduction to 64-Bit Aggregatesrdquo wwwnetappcomuslibrarytechnical-reportstr-3786html

bull TR-3814 ldquoNetApp Data Motionrdquo wwwnetappcomuslibrarytechnical-reportstr-3814html

bull TR-3827 ldquoIf Yoursquore Doing This Then Your Storage Could Be Underutilizedrdquo wwwnetappcomuslibrarytechnical-reportstr-3827html

bull TR-3881 rdquoDataMotion For Volumes For Enterprise Applicationsrdquo httpwwwnetappcomuslibrarytechnical-reportstr-3881html

bull NetApp Operations Manager Efficiency Dashboard Installation and User Guide httpnownetappcomNOWdownloadtoolsomsed_pluginInstallUserGuidepdf

47 Storage Efficiency Every Day How to Achieve and Manage Best-in-Class Storage Use

8 ACKNOWLEDGMENTS This report was developed in concert with the Field Centers for Innovation and covers field best practices and product group expertise It would not have been possible without the input of many experts Significant contributions were made by Matthew Agoni Carlos Alvarez Jeff Berks Manfred Buchmann Hans Deuerlein Erik Dybwad Niels Reker Oliver Dziuba Larry Freeman Gary Garcia Pretoom Goswami Naveen Harsani George John Nigel Maddock Andreas Martinovsky Holger Niermann Cesar Orosco Christian Ott Shiva Raja Michael Reusch Maurice Skubski John Tyrrell Oliver Walsdorf and Allen Wang

NetApp provides no representations or warranties regarding the accuracy reliability or serviceability of any information or recommendations provided in this publication or with respect to any results that may be obtained by the use of the information or observance of any recommendations provided herein The information in this document is distributed AS IS and the use of this information or the implementation of any recommendations or techniques herein is a customerrsquos responsibility and depends on the customerrsquos ability to evaluate and integrate them into the customerrsquos operational environment This document and the information contained herein may be used solely in connection with the NetApp products discussed in this document

copy Copyright 2010 NetApp Inc All rights reserved No portions of this document may be reproduced without prior written consent of NetApp Inc Specifications are subject to change without notice NetApp the NetApp logo Go further faster Data ONTAP FlexClone FlexVol MultiStore RAID-DP SnapDrive SnapMirror Snapshot SyncMirror and vFiler are trademarks or registered trademarks of NetApp Inc in the United States andor other countries Windows is a registered trademark of Microsoft Corporation Oracle is a registered trademark of Oracle Corporation VMware is a registered trademark and VMotion is a trademark of VMware Inc All other brands or products are trademarks or registered trademarks of their respective holders and should be treated as such RA-0007-1010

  • EXECUTIVE SUMMARY
  • INTRODUCTION
    • 21 TERMINOLOGY
    • 22 GOAL OF THIS DOCUMENT
    • 23 AUDIENCE
    • 24 SCENARIO
    • 25 NETAPP TECHNOLOGIES FOR STORAGE EFFICIENCY AND FLEXIBILITY
      • PROVISIONING
        • 31 PROVISIONING FROM SCRATCH FULL FAT TO ZERO FAT PROVISIONING
        • 32 PROVISIONING FROM TEMPLATES VOLUME AND DEDUPE-CENTRIC LAYOUTS
        • 33 SETTLEDNOMAD PROVISIONING FOR NETAPP DATA MOTION
          • OPERATION
            • 41 PHASES AND TRANSITIONS
            • 42 MONITORING
            • 43 NOTIFICATION
            • 44 MITIGATE STORAGE USE
              • REAL-LIFE SETTINGS
                • 51 SAMPLE SETTING 1 REAL-LIFE SETTING
                • 52 SAMPLE SETTING 2 SETTLEDNOMAD
                  • STORAGE EFFICIENCY COOKBOOK
                  • REFERENCES
                  • ACKNOWLEDGMENTS
Page 39: Lun Provision

39 Storage Efficiency Every Day How to Achieve and Manage Best-in-Class Storage Use

5 REAL-LIFE SETTINGS

This section summarizes two different operational settings. The first does not make use of online data migration and the settled/nomad provisioning pattern; the second implements a settled/nomad provisioning pattern to maintain the flexibility for online data migrations.

The concrete threshold settings and approaches might be very customer and application specific. To exploit NetApp storage efficiency features in your own data center, NetApp recommends that you start conservatively. After you are familiar with the process, work toward the customer-specific optimum.

5.1 SAMPLE SETTING 1: REAL-LIFE SETTING

This section describes a real-life setting that a customer started with. It makes use of a limited set of mitigation alternatives. This is especially beneficial when the installed storage capacity should remain constant over a long time frame, or when physical systems are already fully equipped. A settled/nomad setting is not considered. Thus the thresholds that signal a phase transition are set lower and more conservatively for this customer. Because online data migration and aggregate extension are not available as mitigation alternatives, sufficient available space is required to safely reach the next planned downtime window, as shown in Figure 21. In practice, refer to the aggregate days-to-full trend value to get an idea of the available days to full based on past data growth.

• All storage is provisioned using the zero fat option with growable FlexVol volumes. Only aggregate monitoring is used.

• Aggregate extension is not a mitigation alternative.

• Online migration is not a mitigation alternative.
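The headroom reasoning above can be sketched as a small calculation. This is a hypothetical helper, not an Operations Manager API; in practice the days-to-full trend is derived from measured growth history rather than a single assumed daily rate:

```python
def days_to_full(total_gb, used_gb, daily_growth_gb):
    """Estimate days until an aggregate reaches 100% capacity used,
    assuming linear growth at the observed daily rate."""
    if daily_growth_gb <= 0:
        return float("inf")  # no observed growth: never fills at this rate
    return (total_gb - used_gb) / daily_growth_gb

def safe_until_downtime(total_gb, used_gb, daily_growth_gb, days_to_downtime):
    """True if organic growth fits into the time left before the next
    planned downtime window."""
    return days_to_full(total_gb, used_gb, daily_growth_gb) > days_to_downtime

# Example: 10 TB aggregate, 6 TB used, growing 20 GB per day;
# the next planned downtime window is 90 days away.
print(days_to_full(10240, 6144, 20))             # 204.8 days
print(safe_until_downtime(10240, 6144, 20, 90))  # True
```

If the check returns False, the aggregate cannot safely absorb organic growth until the next window, which is exactly the situation the conservative thresholds in this setting are meant to prevent.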

Figure 21) Storage to enable organic data growth between planned downtime windows. (The figure plots data growth over months, with reserved headroom between successive planned downtime windows.)

Note: Several months might fall between planned downtime windows to perform major mitigation alternatives.

The primary concern is preventing the critical situation where aggregates reach a utilization level that is too high to enable organic growth during the period between agreed planned downtime windows. To prevent this situation, sufficient space must be reserved to enable data growth. Second, the level of data consolidation is monitored to manage accumulated growth rates safely.

Provisioning new data is stopped when one or both thresholds on the first and second metrics are reached. The operational teams are notified using an alarm on the Operations Manager event "aggregate nearly full threshold" (configured to fire when the metric exceeds 50%) and the event "aggregate nearly overcommitted threshold" (configured to fire when the metric exceeds 110%). These alarms stop the responsible entities from provisioning new storage; the aggregate is left for organic growth.

An assessment of the storage situation might then be performed. Depending on experience and knowledge of the application growth rates seen in the past, the thresholds may be adapted. After the upper threshold of the operational sweet spot corridor is exceeded, an alarm based on the "aggregate full threshold" (set initially to 65%) is sent to the storage administrators so that they can decide to migrate data in the next planned downtime window. In the meantime, organic growth can take place in the yellow-marked area shown in Figure 22. The metrics used are:

• First metric: aggregate capacity used.
• Second metric: aggregate space committed.

Because all storage is provisioned using the zero fat option, no artificially limited storage container exists. Thus there is no need to consider a volume-based metric. Figure 22 shows the behavior depending on the metrics aggregate capacity used and aggregate space committed.

Figure 22) Transition of phases depending on the metrics aggregate capacity used and aggregate space committed. (The figure plots data growth against aggregate capacity with the operational sweet spot corridor: new storage is provisioned while capacity used is at 0–50% and space committed is at 0–110%; within the corridor, capacity is assessed and thresholds are adapted; above 65% capacity used or 120% space committed, mitigation is triggered.)
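The alarm logic of this setting can be summarized in a short sketch. The function name and return strings are hypothetical; the actual events are raised by Operations Manager when the thresholds described above are crossed:

```python
def setting1_actions(capacity_used_pct, space_committed_pct):
    """Sample setting 1: stop provisioning at >50% capacity used or
    >110% space committed; above 65% used, plan a migration for the
    next planned downtime window."""
    actions = []
    if capacity_used_pct > 50 or space_committed_pct > 110:
        actions.append("stop provisioning new storage")
    if capacity_used_pct > 65:
        actions.append("migrate data in next planned downtime window")
    return actions or ["normal operation: provision and assess capacity"]

print(setting1_actions(45, 100))  # normal operation
print(setting1_actions(55, 100))  # provisioning stopped, organic growth only
print(setting1_actions(70, 115))  # provisioning stopped and migration planned
```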


5.2 SAMPLE SETTING 2: SETTLED/NOMAD

This section describes a setting that takes the settled/nomad provisioning pattern into account and allows migrating nomad data flexibly and in a timely manner, thanks to vFiler technology. This setting requires storage space at alternative locations to which nomads might be migrated. It is seen more often in larger environments with an emphasis on NFS-attached storage. It allows operating the NetApp storage infrastructure at very high use and in narrower operational sweet spot corridors. Figure 23 visualizes the effect of a mitigation alternative that can be performed online.

Figure 23) Narrower corridors due to the ability to perform mitigation alternatives in hours instead of months. (The figure plots settled and nomad data on an aggregate over time: the need to act is detected, and the effect of mitigation, e.g., migration of a nomad, takes hold within hours.)

In this sample setting, as in sample setting 1, the critical situation to prevent is aggregates becoming too full. However, given the flexibility gained with online data migration, it is not necessary to take a further metric, such as storage overcommitment, into account.

• All storage is provisioned using the zero fat option with growable FlexVol volumes. Only aggregate monitoring is used.

• Storage is provisioned using the settled/nomad pattern, with the ability to perform online migration.

• Days-to-full aggregate trending was more than 200 days on average. Note that this value depends on the individual situation and is calculated against 100% capacity used.

The sole metric in this setting is aggregate capacity used. Table 10 contains the thresholds describing the phase transitions.

Table 10) Phase transitions with settled/nomad provisioning pattern and online migration as mitigation alternative.

Detection Threshold | Notify             | Mitigation
> 70%               | Storage operations | Stop provisioning of storage
> 85%               | Storage operations | Stop extending provisioned storage
> 90%               | Storage operations | Relax resource situation and migrate a nomad
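Table 10 can be read as a simple threshold lookup. A minimal sketch with hypothetical names (notification in all cases goes to storage operations):

```python
# (detection threshold in % aggregate capacity used, mitigation),
# ordered from the highest threshold down
PHASES = [
    (90, "relax resource situation and migrate a nomad"),
    (85, "stop extending provisioned storage"),
    (70, "stop provisioning of storage"),
]

def active_mitigations(capacity_used_pct):
    """Return every mitigation whose detection threshold is exceeded."""
    return [m for threshold, m in PHASES if capacity_used_pct > threshold]

print(active_mitigations(60))  # [] -> inside the operational sweet spot
print(active_mitigations(87))  # stop provisioning and stop extending
print(active_mitigations(95))  # all three mitigations apply
```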


Figure 24) Visualization of phase transitions depending on the metric aggregate capacity used (settled data growth against aggregate capacity, with the operational sweet spot corridor).

Aggregate Capacity Used                           | 0–70% | 70–85% | > 90%
Provisioning New Storage                          | Y     | N      | N
Extending Already Provisioned Storage             | Y     | Y      | N
Relax Utilization (NetApp Data Motion of a Nomad) | N     | N      | Y

You can achieve very high data consolidation in this setting by using NetApp storage controllers: the amount of logical data served exceeds the physically usable capacity by a multiple.


6 STORAGE EFFICIENCY COOKBOOK

To increase consolidation, we propose the following steps to exploit the advantages of NetApp storage efficiency technologies.

Figure 25) Typical picture of aggregate capacity metrics while turning to zero fat configurations and dedupe. (The figure plots committed capacity and capacity used over elapsed time in three phases, spanning roughly one to three months, showing the overall trend and the last three-month trend.)

As a general rule, we don't introduce artificially limited container types. They increase monitoring effort and might prevent pooling unused space. For an existing landscape, proceed as follows:

1. Install and configure Operations Manager; the earlier, the better. From day one, Operations Manager collects data, and the more information it collects, the better the predictions and trending. The diagrams provided by Operations Manager give a good idea of growth rates and their steadiness. Make sure all NetApp storage controllers are monitored. Wait for one month. Define which mitigation alternatives your operational team is comfortable with. Check the boxes accompanying the provided list and identify the time your team needs to perform the actions. If you can perform online migrations for nomads, define the time to negotiate and approve the migration. For all other data, define the time to the next planned downtime window.

2. Change all volumes to zero fat configuration with the autogrow feature set to on. Because there is no artificial space limitation for an autogrow volume, monitoring is restricted to aggregate monitoring. When using deduplication, set the volume to autogrow. During this period, the capacity used diminishes, as shown in Figure 25. Usually each change in the volume configuration can be detected. So far, only metadata has changed, and unused space in the volumes is now available from a common shared pool. The aggregated free space remains available to the same applications storing the data. We recommend monitoring for three months to understand the growth rate of your environment.

3. Derive the growth trend of the aggregates. Note that the overall trend might still be negative. Use Operations Manager to help determine the trend. Make sure that it excludes the time frame when the volume configuration was changed to zero fat, and that it includes relevant operations of your applications, such as month-end and year-end closing of business applications or regular software maintenance updates (for example, in virtualized environments).
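Excluding the reconfiguration time frame from the trend can be sketched as follows. This is a hypothetical helper: Operations Manager derives the trend from its own history, and a least-squares fit over all samples would be more robust than the two-point estimate used here:

```python
def daily_growth_rate(samples, exclude=()):
    """Estimate average daily growth in GB from (day, used_gb) samples,
    skipping samples inside exclusion windows, e.g. the days when volumes
    were converted to zero fat and dedupe reclaimed space."""
    kept = [(day, used) for day, used in samples
            if not any(start <= day <= end for start, end in exclude)]
    (d0, u0), (d1, u1) = kept[0], kept[-1]
    return (u1 - u0) / (d1 - d0)

# Capacity used dips around day 10 while dedupe reclaims space.
samples = [(0, 6000), (10, 5400), (20, 5500), (120, 6500)]
print(daily_growth_rate(samples))                     # distorted by the dip
print(daily_growth_rate(samples, exclude=[(0, 15)]))  # 10.0 GB/day
```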


4. Work backward to determine the thresholds of the phases:

a. Define the aggregate use at a level where your operational team is comfortable. At first, do not exceed 80%. Add an attention area (yellow) depending on the mitigation alternatives and their time to show effect.

b. Determine the maximum distance between the planned downtimes, or the time to perform the intended mitigation alternatives.

c. Determine the growth rate. Operations Manager provides help in determining the trend of data growth.

d. Determine the minimum space required to comfortably allow organic growth in the period between agreed planned downtimes of the services provided. Operations Manager helps you to understand the growth rate of the past.
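Steps a through d can be combined into one back-of-the-envelope calculation. The helper below is a hypothetical sketch; the resulting threshold should still be validated against the operational team's comfort level:

```python
def detection_threshold_pct(total_gb, daily_growth_gb, days_between_downtimes,
                            comfort_cap_pct=80.0):
    """Work backward from growth rate and downtime spacing to an aggregate
    use threshold that leaves room for organic growth, capped at the
    initial 80% comfort level from step a."""
    reserve_gb = daily_growth_gb * days_between_downtimes  # step d headroom
    threshold = 100.0 * (total_gb - reserve_gb) / total_gb
    return min(threshold, comfort_cap_pct)

# 10 TB aggregate with planned downtimes every 90 days
print(detection_threshold_pct(10240, 20, 90))  # capped at 80.0
print(detection_threshold_pct(10240, 40, 90))  # about 64.8
```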

To provision storage, follow these steps:

1. Create big aggregates to enable shared storage in your data center. We recommend sizing them so that the aggregate can be extended once for eventual aggregate mitigation. Very few situations exist where a silo-centric approach with dedicated aggregates for applications makes sense. Free space and performance in an aggregate can be shared, and a few big aggregates reduce the monitoring effort. Also, build aggregates in a limited number of standardized configurations and sizes.

2. Create volumes in zero fat configuration with the autogrow feature set to on. Because there is no artificial space limitation for an autogrow volume, monitoring is restricted to aggregate monitoring. When using deduplication, set the volume to autogrow. Whenever possible, use Provisioning Manager for convenience and for repeating configurations.

a. Classify your data and provision for flexibility. Give NFS a preference and make use of vFiler entities.

b. Turn on deduplication. Even in situations where deduplication rates are expected to be low, there is sometimes a big surprise. If you prefer to try deduplication on the storage controller first, create a clone of the intended volume and deduplicate it to estimate the effect. Use Performance Advisor to identify a repeating time frame of low activity to schedule the deduplication job. Alternatively, use deduplication scheduled by change rate. Mind the maximum volume sizes, which depend on the storage controller.

c. Initially size volumes to the expected size of the data you are going to store. Thus the aggregate overcommitment metric in Operations Manager represents the data consolidation more precisely.

d. Trim existing volumes provisioned in full or low fat to zero fat configuration. Use the following commands on the console of the storage controller to configure zero fat without Snapshot autodelete for NAS environments:

vol options <volume> guarantee none
vol options <volume> try_first volume_grow
vol autosize <volume> -m <maximum size> -i <increment size> on
snap autodelete <volume> off

Use the following command sequence to configure zero fat with Snapshot autodelete for NAS environments:

vol options <volume> guarantee none
vol options <volume> try_first volume_grow
vol autosize <volume> -m <maximum size> -i <increment size> on
snap autodelete <volume> trigger volume
snap autodelete <volume> delete_order oldest_first
snap autodelete <volume> on


Use the following commands to configure zero fat without Snapshot autodelete for SAN environments:

vol options <volume> guarantee none
vol options <volume> try_first volume_grow
vol autosize <volume> -m <maximum size> -i <increment size> on
snap reserve -V <volume> 0
snap autodelete <volume> off
lun set reservation <lun> disable

Use the following command sequence to configure zero fat for SAN environments with autodelete set to on:

vol options <volume> guarantee none
vol options <volume> try_first volume_grow
vol autosize <volume> -m <maximum size> -i <increment size> on
snap reserve -V <volume> 0
snap autodelete <volume> trigger volume
snap autodelete <volume> delete_order oldest_first
snap autodelete <volume> on
lun set reservation <lun> disable
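The four command sequences differ only in two switches (SAN vs. NAS, Snapshot autodelete on or off), so they can be generated from one template. The sketch below only assembles the command strings, it does not talk to a controller; verify the exact syntax against your Data ONTAP release:

```python
def zero_fat_commands(volume, max_size, increment,
                      san=False, autodelete=False, lun=None):
    """Assemble the console commands for a zero fat configuration."""
    cmds = [
        f"vol options {volume} guarantee none",
        f"vol options {volume} try_first volume_grow",
        f"vol autosize {volume} -m {max_size} -i {increment} on",
    ]
    if san:
        cmds.append(f"snap reserve -V {volume} 0")
    if autodelete:
        cmds += [
            f"snap autodelete {volume} trigger volume",
            f"snap autodelete {volume} delete_order oldest_first",
            f"snap autodelete {volume} on",
        ]
    else:
        cmds.append(f"snap autodelete {volume} off")
    if san and lun:
        cmds.append(f"lun set reservation {lun} disable")
    return cmds

# SAN variant with autodelete, matching the last sequence above
for cmd in zero_fat_commands("vol1", "2t", "100g", san=True,
                             autodelete=True, lun="/vol/vol1/lun0"):
    print(cmd)
```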

e. Identify storage holding inactive data. Such storage is most often perfectly suited to act as a nomad candidate that could be migrated.

f. Identify storage that is close to deprovisioning. Deprovisioning of storage relaxes use and can act as a mitigation alternative.

g. Turn already provisioned volumes into zero fat configuration.

3. Let Operations Manager monitor the landscape. Use the aggregate daily growth rates and days-to-full trending reported by Operations Manager to adapt the thresholds. Remember that days-to-full trending reports against 100% capacity used of the aggregate.


7 REFERENCES

• TR-3505, "NetApp Deduplication for FAS and V-Series Deployment and Implementation Guide": www.netapp.com/us/library/technical-reports/tr-3505.html

• TR-3563, "NetApp Thin Provisioning: Improving Storage Utilization and Reducing TCO": www.netapp.com/us/library/technical-reports/tr-3563.html

• TR-3710, "Operations Manager, Provisioning Manager, and Protection Manager Best Practices Guide": www.netapp.com/us/library/technical-reports/tr-3710.html

• TR-3786, "A Thorough Introduction to 64-Bit Aggregates": www.netapp.com/us/library/technical-reports/tr-3786.html

• TR-3814, "NetApp Data Motion": www.netapp.com/us/library/technical-reports/tr-3814.html

• TR-3827, "If You're Doing This, Then Your Storage Could Be Underutilized": www.netapp.com/us/library/technical-reports/tr-3827.html

• TR-3881, "DataMotion for Volumes for Enterprise Applications": www.netapp.com/us/library/technical-reports/tr-3881.html

• NetApp Operations Manager Efficiency Dashboard Installation and User Guide: now.netapp.com/NOW/download/tools/omsed_plugin/InstallUserGuide.pdf


8 ACKNOWLEDGMENTS

This report was developed in concert with the Field Centers for Innovation and covers field best practices and product group expertise. It would not have been possible without the input of many experts. Significant contributions were made by Matthew Agoni, Carlos Alvarez, Jeff Berks, Manfred Buchmann, Hans Deuerlein, Erik Dybwad, Niels Reker, Oliver Dziuba, Larry Freeman, Gary Garcia, Pretoom Goswami, Naveen Harsani, George John, Nigel Maddock, Andreas Martinovsky, Holger Niermann, Cesar Orosco, Christian Ott, Shiva Raja, Michael Reusch, Maurice Skubski, John Tyrrell, Oliver Walsdorf, and Allen Wang.

NetApp provides no representations or warranties regarding the accuracy, reliability, or serviceability of any information or recommendations provided in this publication, or with respect to any results that may be obtained by the use of the information or observance of any recommendations provided herein. The information in this document is distributed AS IS, and the use of this information or the implementation of any recommendations or techniques herein is a customer's responsibility and depends on the customer's ability to evaluate and integrate them into the customer's operational environment. This document and the information contained herein may be used solely in connection with the NetApp products discussed in this document.

© Copyright 2010 NetApp, Inc. All rights reserved. No portions of this document may be reproduced without prior written consent of NetApp, Inc. Specifications are subject to change without notice. NetApp, the NetApp logo, Go further, faster, Data ONTAP, FlexClone, FlexVol, MultiStore, RAID-DP, SnapDrive, SnapMirror, Snapshot, SyncMirror, and vFiler are trademarks or registered trademarks of NetApp, Inc. in the United States and/or other countries. Windows is a registered trademark of Microsoft Corporation. Oracle is a registered trademark of Oracle Corporation. VMware is a registered trademark and VMotion is a trademark of VMware, Inc. All other brands or products are trademarks or registered trademarks of their respective holders and should be treated as such. RA-0007-1010

  • EXECUTIVE SUMMARY
  • INTRODUCTION
    • 21 TERMINOLOGY
    • 22 GOAL OF THIS DOCUMENT
    • 23 AUDIENCE
    • 24 SCENARIO
    • 25 NETAPP TECHNOLOGIES FOR STORAGE EFFICIENCY AND FLEXIBILITY
      • PROVISIONING
        • 31 PROVISIONING FROM SCRATCH FULL FAT TO ZERO FAT PROVISIONING
        • 32 PROVISIONING FROM TEMPLATES VOLUME AND DEDUPE-CENTRIC LAYOUTS
        • 33 SETTLEDNOMAD PROVISIONING FOR NETAPP DATA MOTION
          • OPERATION
            • 41 PHASES AND TRANSITIONS
            • 42 MONITORING
            • 43 NOTIFICATION
            • 44 MITIGATE STORAGE USE
              • REAL-LIFE SETTINGS
                • 51 SAMPLE SETTING 1 REAL-LIFE SETTING
                • 52 SAMPLE SETTING 2 SETTLEDNOMAD
                  • STORAGE EFFICIENCY COOKBOOK
                  • REFERENCES
                  • ACKNOWLEDGMENTS
Page 40: Lun Provision

40 Storage Efficiency Every Day How to Achieve and Manage Best-in-Class Storage Use

nearly full threshold (event configured when metric exceeds 50) and the event aggregate nearly over committed threshold (event configured when metric exceeds 110) These alarms stop the responsible entities from provisioning new storage the aggregate is left for organic growth

An assessment of the storage situation might be performed Depending on experiences and knowledge of the application growth rates seen in the past the thresholds may be adapted After the upper threshold of the operational sweet spot corridor is left an alarm based on aggregate full threshold (set initially to 65) is sent to the storage administrators to make the decision for migrating data in the next planned downtime window In the meantime organic growth can take place in the yellow-marked area shown in Figure 22 The metrics used are

bull First metric Aggregate capacity used bull Second metric Aggregate space committed

Because all storage is provisioned using the zero fat option no artificial limited storage container exists Thus there is no need to consider a volume-based metric Figure 22 shows the behavior depending on metrics aggregate capacity used and aggregate committed space

Figure 22) Transition of changes depending on the metrics aggregate capacity used and aggregate committed space

Data Data GrowthAggregate Capacity

Operational Sweet Spot Corridor

Aggregate Capacity Used

Aggregate Space Committed

0ndash50 gt 65

0ndash110 gt 120

Provisioning New Storage Y

Capacity Assessment Adapt Thresholds

Mitigate

Y Y

Y

Provisioning New Storage Y

Assess Capacity Y

41 Storage Efficiency Every Day How to Achieve and Manage Best-in-Class Storage Use

52 SAMPLE SETTING 2 SETTLEDNOMAD

This section describes a setting that takes the settlednomad provisioning pattern into account and allows migrating nomad data flexibly and in a timely manner thanks to vFiler technology This setting requires storage space at alternative locations where nomads might be migrated It is seen more often in larger environments with an emphasis on NFS-attached storage It allows operating the NetApp storage infrastructure at very high use and in narrower operational sweet spot corridors Figure 23 visualizes the effect of a mitigation alternative that can be performed online

Figure 23) Narrower corridors due to the ability to perform mitigation alternatives in hours instead of months.

[Figure: settled and nomad (N) data in an aggregate over time. Once the need to act is detected, the mitigation (e.g., migration of a nomad) takes effect within hours, allowing a narrower operational sweet spot corridor.]

In this sample setting, as in sample setting 1, the critical situation to prevent is that aggregates become too full. However, given the flexibility gained with online data migration, there is no need to take a further metric into account, such as storage overcommitment.

• All storage is provisioned using the zero fat option with growable FlexVol volumes. Only aggregate monitoring is used.

• Storage is provisioned using the settled/nomad pattern, with the ability to perform online migration.

• Days-to-full aggregate trending was more than 200 days on average. Note that this value depends on the individual situation and is calculated against 100%.

The sole metric in this setting is aggregate capacity used. Table 10 contains the thresholds describing the transition of phases.

Table 10) Phase transitions with the settled/nomad provisioning pattern and online migration as mitigation alternative.

Detection Threshold   Notify               Mitigation
> 70%                 Storage operations   Stop provisioning of storage
> 85%                 Storage operations   Stop extending provisioned storage
> 90%                 Storage operations   Relax resource situation and migrate nomad
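Table 10 amounts to a simple lookup on the single metric. A minimal sketch (the helper function and its return strings are hypothetical, not an Operations Manager API):

```python
def settled_nomad_action(used_pct):
    """Map aggregate capacity used (%) to the mitigation column of Table 10."""
    if used_pct > 90:
        return "relax resource situation and migrate nomad"
    if used_pct > 85:
        return "stop extending provisioned storage"
    if used_pct > 70:
        return "stop provisioning of storage"
    return "normal operation"
```

Note that each threshold notifies the same team (storage operations), so only the action changes as the aggregate fills.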


Figure 24) Visualization of phase transitions depending on the metric aggregate capacity used.

[Figure: at 0–70% aggregate capacity used, new storage is provisioned. Between 70% and 85%, no new storage is provisioned, but already provisioned storage can still be extended. Above 85%, extending stops as well, and above 90% utilization is relaxed by migrating a nomad with NetApp Data Motion.]

You can achieve very high data consolidation in this setting by using NetApp storage controllers. The amount of logical data served exceeds the physically usable capacity by multiples.


6 STORAGE EFFICIENCY COOKBOOK

To increase consolidation, we propose the following steps to exploit the advantages of NetApp storage efficiency technologies.

Figure 25) Typical picture of aggregate capacity metrics while turning to zero fat configurations and dedupe.

[Figure: committed capacity and capacity used plotted over elapsed time, together with the overall trend and the last 3-month trend. Markers 1–3 correspond to the numbered steps below: capacity used drops during the roughly one-month conversion phase, then the growth rate is observed for three months.]

As a general rule, we don't introduce artificially limited container types. They increase monitoring effort and might prevent pooling of unused space. For an existing landscape, proceed as follows:

1. Install and configure Operations Manager; the earlier, the better. From day one, Operations Manager collects data; the more information it collects, the better the predictions and trending. The diagrams provided by Operations Manager give a good idea of growth rates and their steadiness. Make sure all NetApp storage controllers are monitored. Wait for one month. Define which mitigation alternatives your operational team is comfortable with. Check the boxes accompanying the provided list and identify the time your team needs to perform the actions. If you can perform online migrations for nomads, define the time needed to negotiate and approve a migration. For all other data, define the time to the next planned downtime window.

2. Change all volumes to the zero fat configuration with the autogrow feature set to on. Because there is no artificial space limitation for an autogrow volume, monitoring is restricted to aggregate monitoring. When using deduplication, set the volume to autogrow. During this period the capacity used diminishes, as shown in Figure 25; usually each change in the volume configuration can be detected. So far only metadata has changed, and unused space in the volumes is now available from a common shared pool. The aggregated free space is available to the same applications storing the data. We recommend monitoring for three months to understand the growth rate of your environment.

3. Derive the growth trend of the aggregates. Note that the overall trend might still be negative. Use Operations Manager to help determine the trend. Make sure that it excludes the time frame when the volume configuration was changed to zero fat and that it includes relevant operations of your applications, such as month-end and year-end closing of business applications or regular software maintenance updates (for example, in virtualized environments).


4. Work backward to determine the thresholds of the phases:

a. Define the aggregate use at a level where your operational team is comfortable. At first, do not exceed 80%. Add an attention area (yellow) depending on the mitigation alternatives and the time they need to show effect.

b. Determine the maximum distance between the planned downtimes, or the time to perform the intended mitigation alternatives.

c. Determine the growth rate. Operations Manager provides help in determining the trend of data growth.

d. Determine the minimum space required to comfortably allow organic growth in the period between the agreed planned downtimes of the services provided. Operations Manager helps you understand the growth rate of the past.

To provision storage, follow these steps:

1. Create big aggregates to enable shared storage in your data center. We recommend sizing them so that an aggregate can be extended once for eventual aggregate mitigation. Very few situations exist where a silo-centric approach with dedicated aggregates for applications makes sense. Free space and performance in an aggregate can be shared, and a few big aggregates reduce the monitoring effort. Also, build aggregates in a limited number of standardized configurations and sizes.

2. Create volumes in the zero fat configuration with the autogrow feature set to on. Because there is no artificial space limitation for an autogrow volume, monitoring is restricted to aggregate monitoring. When using deduplication, set the volume to autogrow. Whenever possible, use Provisioning Manager for convenience and for repeating configurations.

a. Classify your data and provision for flexibility. Give NFS a preference and make use of vFiler entities.

b. Turn on deduplication. Even in situations where deduplication rates are expected to be low, there is sometimes a big surprise. If you prefer to try deduplication on the storage controller first, create a clone of the intended volume and deduplicate it to estimate the effect. Use Performance Advisor to identify a repeating time frame of low activity to schedule the deduplication job, or use deduplication scheduled by change rate. Mind the maximum volume sizes, which depend on the storage controller.

c. Initially size volumes to the expected size of the data you are going to store. Thus the aggregate overcommitment metric in Operations Manager represents the data consolidation more precisely.

d. Trim existing volumes provisioned in full or low fat to the zero fat configuration. Use the following commands on the console of the storage controller to configure zero fat without Snapshot autodelete for NAS environments:

vol options <volume> guarantee none
vol options <volume> try_first volume_grow
vol autosize <volume> -m <maximum size> -i <increment size> on
snap autodelete <volume> off

Use the following command sequence to configure zero fat with Snapshot autodelete for NAS environments:

vol options <volume> guarantee none
vol options <volume> try_first volume_grow
vol autosize <volume> -m <maximum size> -i <increment size> on
snap autodelete <volume> trigger volume
snap autodelete <volume> delete_order oldest_first
snap autodelete <volume> on


Use the following commands to configure zero fat without Snapshot autodelete for SAN environments:

vol options <volume> guarantee none
vol options <volume> try_first volume_grow
vol autosize <volume> -m <maximum size> -i <increment size> on
snap reserve -V <volume> 0
snap autodelete <volume> off
lun set reservation <lun> disable

Use the following command sequence to configure zero fat for SAN environments with autodelete set to on:

vol options <volume> guarantee none
vol options <volume> try_first volume_grow
vol autosize <volume> -m <maximum size> -i <increment size> on
snap reserve -V <volume> 0
snap autodelete <volume> trigger volume
snap autodelete <volume> delete_order oldest_first
snap autodelete <volume> on
lun set reservation <lun> disable

e. Identify storage holding inactive data. Storage keeping inactive data is most often perfectly suited to act as a nomad candidate that can be migrated.

f. Identify storage that is close to deprovisioning. Deprovisioning of storage relaxes use and can act as a mitigation alternative.

g. Turn already provisioned volumes into the zero fat configuration.

3. Let Operations Manager monitor the landscape. Use the aggregate daily growth rates and days-to-full trending reported by Operations Manager to adapt the thresholds. Remember that days-to-full trending reports against 100% capacity used of the aggregate.
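Because days-to-full is trended against 100%, it must be rescaled before being compared with a lower operational threshold. A minimal sketch assuming linear growth (the helper name is hypothetical):

```python
def days_to_threshold(used_pct, days_to_full, threshold_pct):
    """Rescale a days-to-full figure (trended against 100% used) to the days
    remaining until used_pct reaches threshold_pct, assuming linear growth."""
    rate_pct_per_day = (100.0 - used_pct) / days_to_full
    return max(0.0, (threshold_pct - used_pct) / rate_pct_per_day)

# An aggregate at 60% used with 200 days to full grows 0.2% per day,
# so it reaches an 80% threshold in about half that time.
remaining = days_to_threshold(60.0, 200.0, 80.0)
```

In other words, a comfortable-looking days-to-full value can hide a much shorter runway to the threshold at which you actually need to act.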


7 REFERENCES

• TR-3505, "NetApp Deduplication for FAS and V-Series Deployment and Implementation Guide": www.netapp.com/us/library/technical-reports/tr-3505.html

• TR-3563, "NetApp Thin Provisioning: Improving Storage Utilization and Reducing TCO": www.netapp.com/us/library/technical-reports/tr-3563.html

• TR-3710, "Operations Manager, Provisioning Manager, and Protection Manager Best Practices Guide": www.netapp.com/us/library/technical-reports/tr-3710.html

• TR-3786, "A Thorough Introduction to 64-Bit Aggregates": www.netapp.com/us/library/technical-reports/tr-3786.html

• TR-3814, "NetApp Data Motion": www.netapp.com/us/library/technical-reports/tr-3814.html

• TR-3827, "If You're Doing This, Then Your Storage Could Be Underutilized": www.netapp.com/us/library/technical-reports/tr-3827.html

• TR-3881, "DataMotion for Volumes for Enterprise Applications": www.netapp.com/us/library/technical-reports/tr-3881.html

• NetApp Operations Manager Efficiency Dashboard Installation and User Guide: now.netapp.com/NOW/download/tools/omsed_plugin/InstallUserGuide.pdf


8 ACKNOWLEDGMENTS

This report was developed in concert with the Field Centers for Innovation and covers field best practices and product group expertise. It would not have been possible without the input of many experts. Significant contributions were made by Matthew Agoni, Carlos Alvarez, Jeff Berks, Manfred Buchmann, Hans Deuerlein, Erik Dybwad, Niels Reker, Oliver Dziuba, Larry Freeman, Gary Garcia, Pretoom Goswami, Naveen Harsani, George John, Nigel Maddock, Andreas Martinovsky, Holger Niermann, Cesar Orosco, Christian Ott, Shiva Raja, Michael Reusch, Maurice Skubski, John Tyrrell, Oliver Walsdorf, and Allen Wang.

NetApp provides no representations or warranties regarding the accuracy, reliability, or serviceability of any information or recommendations provided in this publication, or with respect to any results that may be obtained by the use of the information or observance of any recommendations provided herein. The information in this document is distributed AS IS, and the use of this information or the implementation of any recommendations or techniques herein is a customer's responsibility and depends on the customer's ability to evaluate and integrate them into the customer's operational environment. This document and the information contained herein may be used solely in connection with the NetApp products discussed in this document.

© Copyright 2010 NetApp, Inc. All rights reserved. No portions of this document may be reproduced without prior written consent of NetApp, Inc. Specifications are subject to change without notice. NetApp, the NetApp logo, Go further, faster, Data ONTAP, FlexClone, FlexVol, MultiStore, RAID-DP, SnapDrive, SnapMirror, Snapshot, SyncMirror, and vFiler are trademarks or registered trademarks of NetApp, Inc. in the United States and/or other countries. Windows is a registered trademark of Microsoft Corporation. Oracle is a registered trademark of Oracle Corporation. VMware is a registered trademark and VMotion is a trademark of VMware, Inc. All other brands or products are trademarks or registered trademarks of their respective holders and should be treated as such. RA-0007-1010


vol options ltvolumegt guarantee none vol options ltvolumegt try_first volume_grow vol autosize ltvolumegt -m ltmaximum sizegt -i ltincrement sizegt on snap reserve -V ltvolumegt 0 snap autodelete ltvolumegt trigger volume snap autodelete ltvolumegt delete_order oldest_first snap autodelete ltvolumegt on lun set reservation ltlungt disable

e Identify storage of inactive data Storage keeping inactive data is most often perfectly suited to act as nomad candidates that could be migrated

f Identify storage that is close to deprovisioning Deprovisioning of storage relaxes use and can act as a mitigation alternative

g Turn already provisioned volumes in zero fat configuration 3 Let Operations Manager monitor the landscape Use reported aggregate daily growth rates and days

to full trending reported by Operations Manager to adapt the thresholds Remember that days to full trending reports against 100 capacity used of aggregate

46 Storage Efficiency Every Day How to Achieve and Manage Best-in-Class Storage Use

7 REFERENCES bull TR-3505 ldquoNetApp Deduplication for FAS and V-Series Deployment and Implementation Guiderdquo

wwwnetappcomuslibrarytechnical-reportstr-3505html bull TR-3563 ldquoNetApp Thin Provisioning Improving Storage Utilization and Reducing TCOrdquo

wwwnetappcomuslibrarytechnical-reportstr-3563html bull TR-3710 ldquoOperations Manager Provisioning Manager and Protection Manager Best Practices

Guiderdquo wwwnetappcomuslibrarytechnical-reportstr-3710html

bull TR-3786 ldquoA Thorough Introduction to 64-Bit Aggregatesrdquo wwwnetappcomuslibrarytechnical-reportstr-3786html

bull TR-3814 ldquoNetApp Data Motionrdquo wwwnetappcomuslibrarytechnical-reportstr-3814html

bull TR-3827 ldquoIf Yoursquore Doing This Then Your Storage Could Be Underutilizedrdquo wwwnetappcomuslibrarytechnical-reportstr-3827html

bull TR-3881 rdquoDataMotion For Volumes For Enterprise Applicationsrdquo httpwwwnetappcomuslibrarytechnical-reportstr-3881html

bull NetApp Operations Manager Efficiency Dashboard Installation and User Guide httpnownetappcomNOWdownloadtoolsomsed_pluginInstallUserGuidepdf

47 Storage Efficiency Every Day How to Achieve and Manage Best-in-Class Storage Use

8 ACKNOWLEDGMENTS This report was developed in concert with the Field Centers for Innovation and covers field best practices and product group expertise It would not have been possible without the input of many experts Significant contributions were made by Matthew Agoni Carlos Alvarez Jeff Berks Manfred Buchmann Hans Deuerlein Erik Dybwad Niels Reker Oliver Dziuba Larry Freeman Gary Garcia Pretoom Goswami Naveen Harsani George John Nigel Maddock Andreas Martinovsky Holger Niermann Cesar Orosco Christian Ott Shiva Raja Michael Reusch Maurice Skubski John Tyrrell Oliver Walsdorf and Allen Wang

NetApp provides no representations or warranties regarding the accuracy reliability or serviceability of any information or recommendations provided in this publication or with respect to any results that may be obtained by the use of the information or observance of any recommendations provided herein The information in this document is distributed AS IS and the use of this information or the implementation of any recommendations or techniques herein is a customerrsquos responsibility and depends on the customerrsquos ability to evaluate and integrate them into the customerrsquos operational environment This document and the information contained herein may be used solely in connection with the NetApp products discussed in this document

copy Copyright 2010 NetApp Inc All rights reserved No portions of this document may be reproduced without prior written consent of NetApp Inc Specifications are subject to change without notice NetApp the NetApp logo Go further faster Data ONTAP FlexClone FlexVol MultiStore RAID-DP SnapDrive SnapMirror Snapshot SyncMirror and vFiler are trademarks or registered trademarks of NetApp Inc in the United States andor other countries Windows is a registered trademark of Microsoft Corporation Oracle is a registered trademark of Oracle Corporation VMware is a registered trademark and VMotion is a trademark of VMware Inc All other brands or products are trademarks or registered trademarks of their respective holders and should be treated as such RA-0007-1010

  • EXECUTIVE SUMMARY
  • INTRODUCTION
    • 21 TERMINOLOGY
    • 22 GOAL OF THIS DOCUMENT
    • 23 AUDIENCE
    • 24 SCENARIO
    • 25 NETAPP TECHNOLOGIES FOR STORAGE EFFICIENCY AND FLEXIBILITY
      • PROVISIONING
        • 31 PROVISIONING FROM SCRATCH FULL FAT TO ZERO FAT PROVISIONING
        • 32 PROVISIONING FROM TEMPLATES VOLUME AND DEDUPE-CENTRIC LAYOUTS
        • 33 SETTLEDNOMAD PROVISIONING FOR NETAPP DATA MOTION
          • OPERATION
            • 41 PHASES AND TRANSITIONS
            • 42 MONITORING
            • 43 NOTIFICATION
            • 44 MITIGATE STORAGE USE
              • REAL-LIFE SETTINGS
                • 51 SAMPLE SETTING 1 REAL-LIFE SETTING
                • 52 SAMPLE SETTING 2 SETTLEDNOMAD
                  • STORAGE EFFICIENCY COOKBOOK
                  • REFERENCES
                  • ACKNOWLEDGMENTS
Page 43: Lun Provision

43 Storage Efficiency Every Day How to Achieve and Manage Best-in-Class Storage Use

6 STORAGE EFFICIENCY COOKBOOK
To increase consolidation, we propose the following steps to exploit the advantages of NetApp storage efficiency technologies.

Figure 25) Typical picture of aggregate capacity metrics while turning to zero fat configurations and dedupe. (Chart: committed capacity and capacity used plotted against elapsed time, with the overall trend and the last 3-month trend indicated; markers 1, 2, and 3 denote the phases, spanning roughly the first month and the following three months.)

As a general rule, we don't introduce artificially limited container types: they increase monitoring effort and might prevent pooling unused space. For an existing landscape, proceed as follows:

1. Install and configure Operations Manager; the earlier, the better. From day one, Operations Manager collects data, and the more information it collects, the better the predictions and trending. The diagrams provided by Operations Manager give a good idea of growth rates and their steadiness. Make sure all NetApp storage controllers are monitored. Wait for one month. Define which mitigation alternatives your operational team is comfortable with. Check the boxes accompanying the provided list and identify the time your team needs to perform the actions. If you can perform online migrations for nomads, define the time to negotiate and approve the migration. For all other data, define the time to the next planned downtime window.

2. Change all volumes to zero fat configuration with the autogrow feature set to on. Because there is no artificial space limitation for the autogrow volume, monitoring is restricted to aggregate monitoring. When using deduplication, set the volume to autogrow. During this period the capacity used diminishes, as shown in Figure 25. Usually each change in the volume configuration can be detected. So far only metadata has changed, and unused space in the volumes is now available from a common shared pool. The aggregated free space is available for the same applications storing the data. We recommend monitoring for three months to understand the growth rate of your environment.

3. Derive the growth trend of the aggregates. Note that the overall trend might still be negative. Use Operations Manager to help determine the trend. Make sure that it excludes the time frame when changing the volume configuration to zero fat, and that it includes relevant operations of your applications, such as month- and year-end closing of business applications or regular software maintenance updates (for example, in virtualized environments).


Work backward to determine the thresholds of the phases:
a. Define the aggregate use at a level where your operational team is comfortable. At first, do not exceed 80%. Add an attention area (yellow) depending on the mitigation alternatives and their time to show effect.
b. Determine the maximum distance between the planned downtimes or the time to perform the intended mitigation alternatives.
c. Determine the growth rate. Operations Manager provides help in determining the trend of data growth.
d. Determine the minimum space required to comfortably allow organic growth in the period between agreed planned downtimes of the services provided. Operations Manager helps you to understand the growth rate of the past.

1 Create big aggregates to enable shared storage in your data center We recommend to size in such a way that the aggregate can be extended once for eventual aggregate mitigation Very few situations exist where a silo-centric approach with dedicated aggregates for applications makes sense Free space and performance in an aggregate can be shared Few big aggregates reduce the monitoring effort Also build aggregates in a limited number of standardized configurations and sizes

2 Create volumes in zero fat configuration with autogrow feature set to on Because there is no artificial space limitation for the autogrow volume monitoring is restricted to aggregate monitoring When using deduplication set the volume to autogrow Whenever possible use Provisioning Manager for convenience and for repeating configurations a Classify your data and provision for flexibility Give NFS a preference and make use of vFiler

entities b Turn on deduplication Even in situations where deduplication rates are expected to be low there

is sometimes a big surprise If you prefer to try deduplication on the storage controller then create a clone of the intended volume and deduplicate it to estimate the effect Use Performance Advisor to identify a repeating time frame of low activity to schedule the deduplication job Also use deduplication scheduled by change rate Mind the maximum sizes depending on the storage controller

c Initially size volumes to the expected size of the data you are going to store Thus the aggregate over-commitment metric in Operations Manager represents the data consolidation more precisely

d Trim existing volumes provisioned in fulllow fat to zero fat configuration Use the following commands of the console of the storage controller to configure zero fat without Snapshot autodelete for NAS environments

vol options <volume> guarantee none
vol options <volume> try_first volume_grow
vol autosize <volume> -m <maximum size> -i <increment size> on
snap autodelete <volume> off

Use the following command sequence to configure zero fat with Snapshot autodelete for NAS environments:

vol options <volume> guarantee none
vol options <volume> try_first volume_grow
vol autosize <volume> -m <maximum size> -i <increment size> on
snap autodelete <volume> trigger volume
snap autodelete <volume> delete_order oldest_first
snap autodelete <volume> on


Use the following commands to configure zero fat without Snapshot autodelete for SAN environments:

vol options <volume> guarantee none
vol options <volume> try_first volume_grow
vol autosize <volume> -m <maximum size> -i <increment size> on
snap reserve -V <volume> 0
snap autodelete <volume> off
lun set reservation <lun> disable

Use the following command sequence to configure zero fat for SAN environments with autodelete set to on:

vol options <volume> guarantee none
vol options <volume> try_first volume_grow
vol autosize <volume> -m <maximum size> -i <increment size> on
snap reserve -V <volume> 0
snap autodelete <volume> trigger volume
snap autodelete <volume> delete_order oldest_first
snap autodelete <volume> on
lun set reservation <lun> disable

e. Identify storage of inactive data. Storage keeping inactive data is most often perfectly suited to act as nomad candidates that could be migrated.
f. Identify storage that is close to deprovisioning. Deprovisioning of storage relaxes use and can act as a mitigation alternative.
g. Turn already provisioned volumes into zero fat configuration.
3. Let Operations Manager monitor the landscape. Use the reported aggregate daily growth rates and days-to-full trending reported by Operations Manager to adapt the thresholds. Remember that days-to-full trending reports against 100% capacity used of the aggregate.
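Because the four command sequences in step 2d differ only in a few optional lines, they can be assembled consistently for many volumes. The following sketch simply builds the command strings listed above; the helper name and its parameters are illustrative, not a NetApp tool:

```python
def zero_fat_commands(volume, max_size, increment, san=False,
                      autodelete=False, lun=None):
    """Assemble the zero-fat command sequence of step 2d for one volume."""
    cmds = [
        f"vol options {volume} guarantee none",
        f"vol options {volume} try_first volume_grow",
        f"vol autosize {volume} -m {max_size} -i {increment} on",
    ]
    if san:
        # SAN volumes additionally drop the Snapshot reserve
        cmds.append(f"snap reserve -V {volume} 0")
    if autodelete:
        cmds += [
            f"snap autodelete {volume} trigger volume",
            f"snap autodelete {volume} delete_order oldest_first",
            f"snap autodelete {volume} on",
        ]
    else:
        cmds.append(f"snap autodelete {volume} off")
    if san and lun is not None:
        # and disable the space reservation of the contained LUN
        cmds.append(f"lun set reservation {lun} disable")
    return cmds

# NAS volume without Snapshot autodelete (first sequence above):
nas = zero_fat_commands("vol_nas1", "500g", "10g")
# SAN volume with autodelete set to on (fourth sequence above):
san = zero_fat_commands("vol_san1", "1t", "50g", san=True,
                        autodelete=True, lun="/vol/vol_san1/lun0")
```

Generating the sequences from one place keeps the repeated configurations uniform, which is the same motivation the text gives for using Provisioning Manager.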


7 REFERENCES
• TR-3505, "NetApp Deduplication for FAS and V-Series Deployment and Implementation Guide": www.netapp.com/us/library/technical-reports/tr-3505.html
• TR-3563, "NetApp Thin Provisioning: Improving Storage Utilization and Reducing TCO": www.netapp.com/us/library/technical-reports/tr-3563.html
• TR-3710, "Operations Manager, Provisioning Manager, and Protection Manager Best Practices Guide": www.netapp.com/us/library/technical-reports/tr-3710.html
• TR-3786, "A Thorough Introduction to 64-Bit Aggregates": www.netapp.com/us/library/technical-reports/tr-3786.html
• TR-3814, "NetApp Data Motion": www.netapp.com/us/library/technical-reports/tr-3814.html
• TR-3827, "If You're Doing This, Then Your Storage Could Be Underutilized": www.netapp.com/us/library/technical-reports/tr-3827.html
• TR-3881, "DataMotion for Volumes for Enterprise Applications": www.netapp.com/us/library/technical-reports/tr-3881.html
• NetApp Operations Manager Efficiency Dashboard Installation and User Guide: now.netapp.com/NOW/download/tools/omsed_plugin/InstallUserGuide.pdf


8 ACKNOWLEDGMENTS
This report was developed in concert with the Field Centers for Innovation and covers field best practices and product group expertise. It would not have been possible without the input of many experts. Significant contributions were made by Matthew Agoni, Carlos Alvarez, Jeff Berks, Manfred Buchmann, Hans Deuerlein, Erik Dybwad, Niels Reker, Oliver Dziuba, Larry Freeman, Gary Garcia, Pretoom Goswami, Naveen Harsani, George John, Nigel Maddock, Andreas Martinovsky, Holger Niermann, Cesar Orosco, Christian Ott, Shiva Raja, Michael Reusch, Maurice Skubski, John Tyrrell, Oliver Walsdorf, and Allen Wang.

NetApp provides no representations or warranties regarding the accuracy, reliability, or serviceability of any information or recommendations provided in this publication, or with respect to any results that may be obtained by the use of the information or observance of any recommendations provided herein. The information in this document is distributed AS IS, and the use of this information or the implementation of any recommendations or techniques herein is a customer's responsibility and depends on the customer's ability to evaluate and integrate them into the customer's operational environment. This document and the information contained herein may be used solely in connection with the NetApp products discussed in this document.

© Copyright 2010 NetApp, Inc. All rights reserved. No portions of this document may be reproduced without prior written consent of NetApp, Inc. Specifications are subject to change without notice. NetApp, the NetApp logo, Go further, faster, Data ONTAP, FlexClone, FlexVol, MultiStore, RAID-DP, SnapDrive, SnapMirror, Snapshot, SyncMirror, and vFiler are trademarks or registered trademarks of NetApp, Inc. in the United States and/or other countries. Windows is a registered trademark of Microsoft Corporation. Oracle is a registered trademark of Oracle Corporation. VMware is a registered trademark and VMotion is a trademark of VMware, Inc. All other brands or products are trademarks or registered trademarks of their respective holders and should be treated as such. RA-0007-1010

  • EXECUTIVE SUMMARY
  • INTRODUCTION
    • 21 TERMINOLOGY
    • 22 GOAL OF THIS DOCUMENT
    • 23 AUDIENCE
    • 24 SCENARIO
    • 25 NETAPP TECHNOLOGIES FOR STORAGE EFFICIENCY AND FLEXIBILITY
      • PROVISIONING
        • 31 PROVISIONING FROM SCRATCH FULL FAT TO ZERO FAT PROVISIONING
        • 32 PROVISIONING FROM TEMPLATES VOLUME AND DEDUPE-CENTRIC LAYOUTS
        • 33 SETTLEDNOMAD PROVISIONING FOR NETAPP DATA MOTION
          • OPERATION
            • 41 PHASES AND TRANSITIONS
            • 42 MONITORING
            • 43 NOTIFICATION
            • 44 MITIGATE STORAGE USE
              • REAL-LIFE SETTINGS
                • 51 SAMPLE SETTING 1 REAL-LIFE SETTING
                • 52 SAMPLE SETTING 2 SETTLEDNOMAD
                  • STORAGE EFFICIENCY COOKBOOK
                  • REFERENCES
                  • ACKNOWLEDGMENTS
Page 44: Lun Provision

44 Storage Efficiency Every Day How to Achieve and Manage Best-in-Class Storage Use

Work backward to determine the thresholds of the phases a Define the aggregate use at a level where your operational team is comfortable At first do not

exceed 80 Add an attention area (yellow) depending on the mitigation alternatives and their time to show effect

b Determine the maximum distance between the planned downtimes or the time to perform the intended mitigation alternatives

c Determine the growth rate Operations Manager provides help in determining the trend of data growth

d Determine the minimum space required to comfortably allow organic growth in the period between agreed planned downtimes of the services provided Operations Manager helps you to understand the growth rate of the past

To provision storage following these steps

1 Create big aggregates to enable shared storage in your data center We recommend to size in such a way that the aggregate can be extended once for eventual aggregate mitigation Very few situations exist where a silo-centric approach with dedicated aggregates for applications makes sense Free space and performance in an aggregate can be shared Few big aggregates reduce the monitoring effort Also build aggregates in a limited number of standardized configurations and sizes

2 Create volumes in zero fat configuration with autogrow feature set to on Because there is no artificial space limitation for the autogrow volume monitoring is restricted to aggregate monitoring When using deduplication set the volume to autogrow Whenever possible use Provisioning Manager for convenience and for repeating configurations a Classify your data and provision for flexibility Give NFS a preference and make use of vFiler

entities b Turn on deduplication Even in situations where deduplication rates are expected to be low there

is sometimes a big surprise If you prefer to try deduplication on the storage controller then create a clone of the intended volume and deduplicate it to estimate the effect Use Performance Advisor to identify a repeating time frame of low activity to schedule the deduplication job Also use deduplication scheduled by change rate Mind the maximum sizes depending on the storage controller

c Initially size volumes to the expected size of the data you are going to store Thus the aggregate over-commitment metric in Operations Manager represents the data consolidation more precisely

d Trim existing volumes provisioned in fulllow fat to zero fat configuration Use the following commands of the console of the storage controller to configure zero fat without Snapshot autodelete for NAS environments

vol options ltvolumegt guarantee none vol options ltvolumegt try_first volume_grow vol autosize ltvolumegt -m ltmaximum sizegt -i ltincrement sizegt on snap autodelete ltvolumegt off

Use the following command sequence to configure zero fat with Snapshot autodelete for NAS environments

vol options ltvolumegt guarantee none vol options ltvolumegt try_first volume_grow vol autosize ltvolumegt -m ltmaximum sizegt -i ltincrement sizegt on snap autodelete ltvolumegt trigger volume snap autodelete ltvolumegt delete_order oldest_first snap autodelete ltvolumegt on

45 Storage Efficiency Every Day How to Achieve and Manage Best-in-Class Storage Use

Use the following commands to configure zero fat without Snapshot autodelete for SAN environments

vol options ltvolumegt guarantee none vol options ltvolumegt try_first volume_grow vol autosize ltvolumegt -m ltmaximum sizegt -i ltincrement sizegt on snap reserve -V ltvolumegt 0 snap autodelete ltvolumegt off lun set reservation ltlungt disable

Use the following command sequence to configure zero fat for SAN environments with autodelete set to on

vol options ltvolumegt guarantee none vol options ltvolumegt try_first volume_grow vol autosize ltvolumegt -m ltmaximum sizegt -i ltincrement sizegt on snap reserve -V ltvolumegt 0 snap autodelete ltvolumegt trigger volume snap autodelete ltvolumegt delete_order oldest_first snap autodelete ltvolumegt on lun set reservation ltlungt disable

e Identify storage of inactive data Storage keeping inactive data is most often perfectly suited to act as nomad candidates that could be migrated

f Identify storage that is close to deprovisioning Deprovisioning of storage relaxes use and can act as a mitigation alternative

g Turn already provisioned volumes in zero fat configuration 3 Let Operations Manager monitor the landscape Use reported aggregate daily growth rates and days

to full trending reported by Operations Manager to adapt the thresholds Remember that days to full trending reports against 100 capacity used of aggregate

46 Storage Efficiency Every Day How to Achieve and Manage Best-in-Class Storage Use

7 REFERENCES bull TR-3505 ldquoNetApp Deduplication for FAS and V-Series Deployment and Implementation Guiderdquo

wwwnetappcomuslibrarytechnical-reportstr-3505html bull TR-3563 ldquoNetApp Thin Provisioning Improving Storage Utilization and Reducing TCOrdquo

wwwnetappcomuslibrarytechnical-reportstr-3563html bull TR-3710 ldquoOperations Manager Provisioning Manager and Protection Manager Best Practices

Guiderdquo wwwnetappcomuslibrarytechnical-reportstr-3710html

bull TR-3786 ldquoA Thorough Introduction to 64-Bit Aggregatesrdquo wwwnetappcomuslibrarytechnical-reportstr-3786html

bull TR-3814 ldquoNetApp Data Motionrdquo wwwnetappcomuslibrarytechnical-reportstr-3814html

bull TR-3827 ldquoIf Yoursquore Doing This Then Your Storage Could Be Underutilizedrdquo wwwnetappcomuslibrarytechnical-reportstr-3827html

bull TR-3881 rdquoDataMotion For Volumes For Enterprise Applicationsrdquo httpwwwnetappcomuslibrarytechnical-reportstr-3881html

bull NetApp Operations Manager Efficiency Dashboard Installation and User Guide httpnownetappcomNOWdownloadtoolsomsed_pluginInstallUserGuidepdf

47 Storage Efficiency Every Day How to Achieve and Manage Best-in-Class Storage Use

8 ACKNOWLEDGMENTS This report was developed in concert with the Field Centers for Innovation and covers field best practices and product group expertise It would not have been possible without the input of many experts Significant contributions were made by Matthew Agoni Carlos Alvarez Jeff Berks Manfred Buchmann Hans Deuerlein Erik Dybwad Niels Reker Oliver Dziuba Larry Freeman Gary Garcia Pretoom Goswami Naveen Harsani George John Nigel Maddock Andreas Martinovsky Holger Niermann Cesar Orosco Christian Ott Shiva Raja Michael Reusch Maurice Skubski John Tyrrell Oliver Walsdorf and Allen Wang

NetApp provides no representations or warranties regarding the accuracy reliability or serviceability of any information or recommendations provided in this publication or with respect to any results that may be obtained by the use of the information or observance of any recommendations provided herein The information in this document is distributed AS IS and the use of this information or the implementation of any recommendations or techniques herein is a customerrsquos responsibility and depends on the customerrsquos ability to evaluate and integrate them into the customerrsquos operational environment This document and the information contained herein may be used solely in connection with the NetApp products discussed in this document

copy Copyright 2010 NetApp Inc All rights reserved No portions of this document may be reproduced without prior written consent of NetApp Inc Specifications are subject to change without notice NetApp the NetApp logo Go further faster Data ONTAP FlexClone FlexVol MultiStore RAID-DP SnapDrive SnapMirror Snapshot SyncMirror and vFiler are trademarks or registered trademarks of NetApp Inc in the United States andor other countries Windows is a registered trademark of Microsoft Corporation Oracle is a registered trademark of Oracle Corporation VMware is a registered trademark and VMotion is a trademark of VMware Inc All other brands or products are trademarks or registered trademarks of their respective holders and should be treated as such RA-0007-1010

  • EXECUTIVE SUMMARY
  • INTRODUCTION
    • 21 TERMINOLOGY
    • 22 GOAL OF THIS DOCUMENT
    • 23 AUDIENCE
    • 24 SCENARIO
    • 25 NETAPP TECHNOLOGIES FOR STORAGE EFFICIENCY AND FLEXIBILITY
      • PROVISIONING
        • 31 PROVISIONING FROM SCRATCH FULL FAT TO ZERO FAT PROVISIONING
        • 32 PROVISIONING FROM TEMPLATES VOLUME AND DEDUPE-CENTRIC LAYOUTS
        • 33 SETTLEDNOMAD PROVISIONING FOR NETAPP DATA MOTION
          • OPERATION
            • 41 PHASES AND TRANSITIONS
            • 42 MONITORING
            • 43 NOTIFICATION
            • 44 MITIGATE STORAGE USE
              • REAL-LIFE SETTINGS
                • 51 SAMPLE SETTING 1 REAL-LIFE SETTING
                • 52 SAMPLE SETTING 2 SETTLEDNOMAD
                  • STORAGE EFFICIENCY COOKBOOK
                  • REFERENCES
                  • ACKNOWLEDGMENTS
Page 45: Lun Provision

45 Storage Efficiency Every Day How to Achieve and Manage Best-in-Class Storage Use

Use the following commands to configure zero fat without Snapshot autodelete for SAN environments

vol options ltvolumegt guarantee none vol options ltvolumegt try_first volume_grow vol autosize ltvolumegt -m ltmaximum sizegt -i ltincrement sizegt on snap reserve -V ltvolumegt 0 snap autodelete ltvolumegt off lun set reservation ltlungt disable

Use the following command sequence to configure zero fat for SAN environments with autodelete set to on

vol options ltvolumegt guarantee none vol options ltvolumegt try_first volume_grow vol autosize ltvolumegt -m ltmaximum sizegt -i ltincrement sizegt on snap reserve -V ltvolumegt 0 snap autodelete ltvolumegt trigger volume snap autodelete ltvolumegt delete_order oldest_first snap autodelete ltvolumegt on lun set reservation ltlungt disable

e Identify storage of inactive data Storage keeping inactive data is most often perfectly suited to act as nomad candidates that could be migrated

f Identify storage that is close to deprovisioning Deprovisioning of storage relaxes use and can act as a mitigation alternative

g Turn already provisioned volumes in zero fat configuration 3 Let Operations Manager monitor the landscape Use reported aggregate daily growth rates and days

to full trending reported by Operations Manager to adapt the thresholds Remember that days to full trending reports against 100 capacity used of aggregate

46 Storage Efficiency Every Day How to Achieve and Manage Best-in-Class Storage Use

7 REFERENCES bull TR-3505 ldquoNetApp Deduplication for FAS and V-Series Deployment and Implementation Guiderdquo

wwwnetappcomuslibrarytechnical-reportstr-3505html bull TR-3563 ldquoNetApp Thin Provisioning Improving Storage Utilization and Reducing TCOrdquo

wwwnetappcomuslibrarytechnical-reportstr-3563html bull TR-3710 ldquoOperations Manager Provisioning Manager and Protection Manager Best Practices

Guiderdquo wwwnetappcomuslibrarytechnical-reportstr-3710html

bull TR-3786 ldquoA Thorough Introduction to 64-Bit Aggregatesrdquo wwwnetappcomuslibrarytechnical-reportstr-3786html

bull TR-3814 ldquoNetApp Data Motionrdquo wwwnetappcomuslibrarytechnical-reportstr-3814html

bull TR-3827 ldquoIf Yoursquore Doing This Then Your Storage Could Be Underutilizedrdquo wwwnetappcomuslibrarytechnical-reportstr-3827html

bull TR-3881 rdquoDataMotion For Volumes For Enterprise Applicationsrdquo httpwwwnetappcomuslibrarytechnical-reportstr-3881html

bull NetApp Operations Manager Efficiency Dashboard Installation and User Guide httpnownetappcomNOWdownloadtoolsomsed_pluginInstallUserGuidepdf

47 Storage Efficiency Every Day How to Achieve and Manage Best-in-Class Storage Use

7 REFERENCES

• TR-3505, "NetApp Deduplication for FAS and V-Series Deployment and Implementation Guide": www.netapp.com/us/library/technical-reports/tr-3505.html

• TR-3563, "NetApp Thin Provisioning: Improving Storage Utilization and Reducing TCO": www.netapp.com/us/library/technical-reports/tr-3563.html

• TR-3710, "Operations Manager, Provisioning Manager, and Protection Manager Best Practices Guide": www.netapp.com/us/library/technical-reports/tr-3710.html

• TR-3786, "A Thorough Introduction to 64-Bit Aggregates": www.netapp.com/us/library/technical-reports/tr-3786.html

• TR-3814, "NetApp Data Motion": www.netapp.com/us/library/technical-reports/tr-3814.html

• TR-3827, "If You're Doing This, Then Your Storage Could Be Underutilized": www.netapp.com/us/library/technical-reports/tr-3827.html

• TR-3881, "DataMotion for Volumes for Enterprise Applications": www.netapp.com/us/library/technical-reports/tr-3881.html

• NetApp Operations Manager Efficiency Dashboard Installation and User Guide: http://now.netapp.com/NOW/download/tools/omsed_plugin/InstallUserGuide.pdf

8 ACKNOWLEDGMENTS

This report was developed in concert with the Field Centers for Innovation and covers field best practices and product group expertise. It would not have been possible without the input of many experts. Significant contributions were made by Matthew Agoni, Carlos Alvarez, Jeff Berks, Manfred Buchmann, Hans Deuerlein, Erik Dybwad, Niels Reker, Oliver Dziuba, Larry Freeman, Gary Garcia, Pretoom Goswami, Naveen Harsani, George John, Nigel Maddock, Andreas Martinovsky, Holger Niermann, Cesar Orosco, Christian Ott, Shiva Raja, Michael Reusch, Maurice Skubski, John Tyrrell, Oliver Walsdorf, and Allen Wang.

NetApp provides no representations or warranties regarding the accuracy, reliability, or serviceability of any information or recommendations provided in this publication, or with respect to any results that may be obtained by the use of the information or observance of any recommendations provided herein. The information in this document is distributed AS IS, and the use of this information or the implementation of any recommendations or techniques herein is a customer's responsibility and depends on the customer's ability to evaluate and integrate them into the customer's operational environment. This document and the information contained herein may be used solely in connection with the NetApp products discussed in this document.

© Copyright 2010 NetApp, Inc. All rights reserved. No portions of this document may be reproduced without prior written consent of NetApp, Inc. Specifications are subject to change without notice. NetApp, the NetApp logo, Go further, faster, Data ONTAP, FlexClone, FlexVol, MultiStore, RAID-DP, SnapDrive, SnapMirror, Snapshot, SyncMirror, and vFiler are trademarks or registered trademarks of NetApp, Inc. in the United States and/or other countries. Windows is a registered trademark of Microsoft Corporation. Oracle is a registered trademark of Oracle Corporation. VMware is a registered trademark and VMotion is a trademark of VMware, Inc. All other brands or products are trademarks or registered trademarks of their respective holders and should be treated as such. RA-0007-1010
